
Converting a series of JSON objects to a DataFrame

pandas json python

nad · 6 years ago

I downloaded a sample dataset from here. Each record is a JSON object like this:

{
  "id": "4cd223df721b722b1c40689caa52932a41fcc223",
  "title": "Knowledge-rich, computer-assisted composition of Chinese couplets",
  "paperAbstract": "Recent research effort in poem composition has focused on the use of automatic language generation...",
  "entities": [
    "Conformance testing",
    "Natural language generation",
    "Natural language processing",
    "Parallel computing",
    "Stochastic grammar",
    "Web application"
  ],
  "s2Url": "https://semanticscholar.org/paper/4cd223df721b722b1c40689caa52932a41fcc223",
  "s2PdfUrl": "",
  "pdfUrls": [
    "https://doi.org/10.1093/llc/fqu052"
  ],
  "authors": [
    {
      "name": "John Lee",
      "ids": [
        "3362353"
      ]
    },
    "..."
  ],
  "inCitations": [
    "c789e333fdbb963883a0b5c96c648bf36b8cd242"
  ],
  "outCitations": [
    "abe213ed63c426a089bdf4329597137751dbb3a0",
    "..."
  ],
  "year": 2016,
  "venue": "DSH",
  "journalName": "DSH",
  "journalVolume": "31",
  "journalPages": "152-163",
  "sources": [
    "DBLP"
  ],
  "doi": "10.1093/llc/fqu052",
  "doiUrl": "https://doi.org/10.1093/llc/fqu052",
  "pmid": ""
}

Ultimately I only need a few fields, including `paperAbstract`. I load the file like this:

import pandas as pd

filename = "sample-S2-records"
df = pd.read_json(filename, lines=True)
df.head()

This shows the `doi` and `doiUrl` columns as empty in every row.

Also, if I select only the abstract column and check its head, 2 of the 5 rows are empty:

abstract = df['paperAbstract']
abstract.head()

0                                                     
1    The search for new administrators in complex s...
2    The human N-formyl peptide receptor (FPR) is a...
3    Serum CA 19-9 (2-3 sialyl Le(a)) is a marker o...
4                                                     
Name: paperAbstract, dtype: object

What am I missing? Any suggestions?

1 answer | last active 6 years ago

Liudvikas Akelis · 6 years ago

I looked at your data sample, and I think you are getting the correct result. If we parse the JSON by hand:

import json
filename = "sample-S2-records"
with open(filename, 'r') as f:
    d = [json.loads(x) for x in f]

>>> d[0]['paperAbstract']
''

So it looks like the `paperAbstract` field in the first line is simply empty in the source data.
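To check the same thing from the pandas side, here is a minimal sketch. It assumes the fields are empty strings rather than missing values, which is what the manual parse above suggests. The two records are made-up stand-ins for the real file, and the names `sample`, `empty_counts` and `cleaned` are only illustrative.

```python
import io
import pandas as pd

# Hypothetical two-record stand-in for the "sample-S2-records" file
# (one JSON object per line); the values are invented for illustration.
sample = io.StringIO(
    '{"id": "a1", "paperAbstract": "", "doi": ""}\n'
    '{"id": "b2", "paperAbstract": "Some abstract text.", "doi": "10.1000/x"}\n'
)

df = pd.read_json(sample, lines=True)

# Empty strings are not NaN, so isna() does not flag them.
n_missing_before = int(df["paperAbstract"].isna().sum())

# Count the empty strings in each column directly.
empty_counts = (df == "").sum()

# Optionally turn empty strings into missing values and drop those rows.
cleaned = df.replace("", pd.NA).dropna(subset=["paperAbstract"])
```

If `empty_counts` is large for a column such as `doi`, the blanks come from the dataset itself and not from `read_json`.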

Aside: I think this question should be closed, since I doubt it will be helpful to anyone else.
