
jq: collecting JSON data for each object

  •  0
  • CodeNoob  · Tech Community  · 6 years ago

    I realize my title is quite unclear, but I don't know how to phrase it better, so feel free to edit it!


    Data
    I have the following (simplified) JSON:

    [
      {
        "genes_id": "eco:b0002",
        "entry_id": "b0002",
        "division": "CDS",
        "organism": "Escherichia coli K-12 MG1655",
        "organism_code": "eco",
        "organism_id": "T00007",
        "name": "thrA",
        "names": [
          "thrA"
        ],
        "definition": "(RefSeq) Bifunctional aspartokinase/homoserine dehydrogenase 1",
        "eclinks": [
    
        ],
        "orthologs": {
          "K12524": "bifunctional aspartokinase / homoserine dehydrogenase 1 [EC:2.7.2.4 1.1.1.3]"
        },
        "pathways": {
          "eco00260": "Glycine, serine and threonine metabolism",
          "eco00261": "Monobactam biosynthesis",
          "eco00270": "Cysteine and methionine metabolism",
          "eco00300": "Lysine biosynthesis",
          "eco01100": "Metabolic pathways",
          "eco01110": "Biosynthesis of secondary metabolites",
          "eco01120": "Microbial metabolism in diverse environments",
          "eco01130": "Biosynthesis of antibiotics",
          "eco01230": "Biosynthesis of amino acids"
        },
        "modules": {
          "eco_M00016": "Lysine biosynthesis, succinyl-DAP pathway, aspartate => lysine",
          "eco_M00017": "Methionine biosynthesis, apartate => homoserine => methionine",
          "eco_M00018": "Threonine biosynthesis, aspartate => homoserine => threonine"
        },
        "classes": [
    
        ],
        "position": "337..2799",
        "chromosome": null,
        "gbposition": "337..2799",
        "motifs": {
          "Pfam": [
            "Homoserine_dh",
            "AA_kinase",
            "NAD_binding_3",
            "ACT_7",
            "ACT",
            "Sacchrp_dh_NADP"
          ]
        },
        "dblinks": {
          "NCBI-GeneID": [
            "945803"
          ],
          "NCBI-ProteinID": [
            "NP_414543"
          ],
          "Pasteur": [
            "thrA"
          ],
          "RegulonDB": [
            "ECK120000987"
          ],
          "ECOCYC": [
            "EG10998"
          ],
          "ASAP": [
            "ABE-0000008"
          ],
          "UniProt": [
            "P00561"
          ]
        }
      },
      {
        "genes_id": "eco:b0003",
        "entry_id": "b0003",
        "division": "CDS",
        "organism": "Escherichia coli K-12 MG1655",
        "organism_code": "eco",
        "organism_id": "T00007",
        "name": "thrB",
        "names": [
          "thrB"
        ],
        "definition": "(RefSeq) homoserine kinase",
        "eclinks": [
    
        ],
        "orthologs": {
          "K00872": "homoserine kinase [EC:2.7.1.39]"
        },
        "pathways": {
          "eco00260": "Glycine, serine and threonine metabolism",
          "eco01100": "Metabolic pathways",
          "eco01110": "Biosynthesis of secondary metabolites",
          "eco01120": "Microbial metabolism in diverse environments",
          "eco01230": "Biosynthesis of amino acids"
        },
        "modules": {
          "eco_M00018": "Threonine biosynthesis, aspartate => homoserine => threonine"
        },
        "classes": [
    
        ],
        "position": "2801..3733",
        "chromosome": null,
        "gbposition": "2801..3733",
        "motifs": {
          "Pfam": [
            "GHMP_kinases_N",
            "GHMP_kinases_C"
          ]
        },
        "dblinks": {
          "NCBI-GeneID": [
            "947498"
          ],
          "NCBI-ProteinID": [
            "NP_414544"
          ],
          "Pasteur": [
            "thrB"
          ],
          "RegulonDB": [
            "ECK120000988"
          ],
          "ECOCYC": [
            "EG10999"
          ],
          "ASAP": [
            "ABE-0000010"
          ],
          "UniProt": [
            "P00547"
          ]
        }
      }
    ]
    

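    Desired output
    This is an array of two objects. I am interested in the genes_id and pathways of both objects, and would like to obtain a tab-separated file with the following content: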

    eco:b0002   eco00260    Glycine, serine and threonine metabolism
    eco:b0002   eco00261    Monobactam biosynthesis
    eco:b0002   eco00270    Cysteine and methionine metabolism
    eco:b0002   eco00300    Lysine biosynthesis
    eco:b0002   eco01100    Metabolic pathways
    eco:b0002   eco01110    Biosynthesis of secondary metabolites
    eco:b0002   eco01120    Microbial metabolism in diverse environments
    eco:b0002   eco01130    Biosynthesis of antibiotics
    eco:b0002   eco01230    Biosynthesis of amino acids
    eco:b0003   eco00260    Glycine, serine and threonine metabolism
    eco:b0003   eco01100    Metabolic pathways
    eco:b0003   eco01110    Biosynthesis of secondary metabolites
    eco:b0003   eco01120    Microbial metabolism in diverse environments
    eco:b0003   eco01230    Biosynthesis of amino acids
    

    What I found
    I know it is possible to extract the data in a format like this:

    eco:b0002: list of pathways and ids
    eco:b0003: list of pathways and ids
    

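    However, I would like to spread the pathways across individual rows, as in the example above. I could not find any information on how to do this with jq, so I am not sure it is possible at all. If it is, how can this be achieved with jq?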

    1 answer  |  as of 6 years ago
  •  1
  •   peak    6 years ago

    Invocation: jq -rf totsv.jq input.json

    Program (totsv.jq):

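    .[]                      # each gene object in the array
    | .genes_id as $id       # remember the gene ID
    | .pathways
    | to_entries[]           # one {key, value} pair per pathway
    | [$id, .key, .value]
    | @tsv                   # one tab-separated line per array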
    

    TSV is a good choice (as is jq)!
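
    To make the steps of the filter easier to follow, here is a rough Python sketch of the same logic. It is not part of the original answer: the names SAMPLE, pathway_rows and to_tsv are made up for illustration, and SAMPLE is a trimmed stand-in for input.json that keeps only the two fields the filter reads. The comments map each Python line back to the corresponding jq step.

```python
import json

# Hypothetical, trimmed stand-in for input.json (only genes_id and pathways).
SAMPLE = """
[
  {"genes_id": "eco:b0002",
   "pathways": {"eco00260": "Glycine, serine and threonine metabolism",
                "eco00261": "Monobactam biosynthesis"}},
  {"genes_id": "eco:b0003",
   "pathways": {"eco00260": "Glycine, serine and threonine metabolism"}}
]
"""


def pathway_rows(genes):
    """Yield one (genes_id, pathway_id, pathway_name) tuple per pathway."""
    for gene in genes:                                # .[]
        gene_id = gene["genes_id"]                    # .genes_id as $id
        for key, value in gene["pathways"].items():   # .pathways | to_entries[]
            yield (gene_id, key, value)               # [$id, .key, .value]


def to_tsv(rows):
    """Join each row with tabs and the rows with newlines (rough @tsv analogue)."""
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    print(to_tsv(pathway_rows(json.loads(SAMPLE))))
```

    The gene ID has to be captured before descending into pathways, because after that step the current value no longer contains it; that is the role of "as $id" in the jq program. One difference to keep in mind: jq's @tsv also escapes tabs and newlines embedded in values, which this sketch does not attempt.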