I have downloaded a sample dataset from here. A record looks like this:
{
"id": "4cd223df721b722b1c40689caa52932a41fcc223",
"title": "Knowledge-rich, computer-assisted composition of Chinese couplets",
"paperAbstract": "Recent research effort in poem composition has focused on the use of automatic language generation...",
"entities": [
"Conformance testing",
"Natural language generation",
"Natural language processing",
"Parallel computing",
"Stochastic grammar",
"Web application"
],
"s2Url": "https://semanticscholar.org/paper/4cd223df721b722b1c40689caa52932a41fcc223",
"s2PdfUrl": "",
"pdfUrls": [
"https://doi.org/10.1093/llc/fqu052"
],
"authors": [
{
"name": "John Lee",
"ids": [
"3362353"
]
},
"..."
],
"inCitations": [
"c789e333fdbb963883a0b5c96c648bf36b8cd242"
],
"outCitations": [
"abe213ed63c426a089bdf4329597137751dbb3a0",
"..."
],
"year": 2016,
"venue": "DSH",
"journalName": "DSH",
"journalVolume": "31",
"journalPages": "152-163",
"sources": [
"DBLP"
],
"doi": "10.1093/llc/fqu052",
"doiUrl": "https://doi.org/10.1093/llc/fqu052",
"pmid": ""
}
Ultimately, I need paperAbstract and one other field from each record. I load the file like this:
import pandas as pd

# The file is in JSON Lines format: one JSON record per line
filename = "sample-S2-records"
df = pd.read_json(filename, lines=True)
df.head()
The output shows that the doi and doiUrl columns are entirely empty.
Also, if I select only the abstract column and look at its head, 2 of the 5 rows are empty:
abstract = df['paperAbstract']
abstract.head()
0
1 The search for new administrators in complex s...
2 The human N-formyl peptide receptor (FPR) is a...
3 Serum CA 19-9 (2-3 sialyl Le(a)) is a marker o...
4
Name: paperAbstract, dtype: object
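To rule out pandas as the cause, the raw file could be checked directly with the standard json module. The following is only a minimal sketch: count_empty_fields is a hypothetical helper, and the three inline records are made up for illustration rather than taken from the dataset.

```python
import io
import json

def count_empty_fields(lines, fields):
    """Count, per field, how many JSON Lines records have an empty or missing value."""
    counts = {field: 0 for field in fields}
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        total += 1
        for field in fields:
            # Treat a missing key, null, and "" all as "empty"
            if record.get(field) in ("", None):
                counts[field] += 1
    return total, counts

# Made-up records in the same shape as the sample file (assumption, not real data)
demo = io.StringIO(
    '{"id": "a", "paperAbstract": "", "doi": "", "doiUrl": ""}\n'
    '{"id": "b", "paperAbstract": "Some abstract text", "doi": "", "doiUrl": ""}\n'
    '{"id": "c", "paperAbstract": "", "doi": "10.1000/x", "doiUrl": "https://doi.org/10.1000/x"}\n'
)
total, empty = count_empty_fields(demo, ["paperAbstract", "doi", "doiUrl"])
print(total, empty)
```

On the real file, the same function could be called with an open file handle, for example `with open("sample-S2-records", encoding="utf-8") as fh: count_empty_fields(fh, ["paperAbstract", "doi", "doiUrl"])`. If the counts match what the DataFrame shows, the values are empty strings in the source data itself and not something lost while loading.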
What am I missing? Any suggestions?