
awk: split a field and reverse the order

awk
  •  0
  • user977828  · Tech Community  · 3 years ago

    I have the following tab-separated lines.

    NIATv7_g10470.t1    XP_019227081.1  100.0   878 0   0   1   878 1   878 0.0e+00 1599.7  99.9    MELKVSSPKPVFSTSDCNSDPEEKEISEDXXXXXXXXXXXXXTRSQSTETEALEPALRRPFRKRNKPFENGHPYQEGDSHSSDTRFGKRRGMGSFSRTPSDSYQMMRLNQSLSGHAAPGRGRGRESGAWGPCESRFSTIDIASQFVPQGPINPLLYTGRGPQNVSSGQGASWNAFGIVPGIPNGGLDTLHTLGLQGRLRTSLNPAMSMGIPRQRCRDFEERGFCLRGDMCPLEHGVNRIVVEDVQSLSKFNLPVSLPGAHTLGPATAQGPLPAISPSSSLANKALHNKSINPPVIDNGLGLTDTFGGGSVSGGADFYDPDQPLWSNDHPENSAALLDVNRSKIDDTGPMLDADSSDQDQVALCDGFKLERLVRDAGAASGSQSVWERTSRSKHKLQSFNSTQGINRHGKQTNVDTIDPQMVESSSEPQSSSGRNMRKPSQKALRTLFVSGVPQKDNKPEALLSHFQKFGEVIDIYIPMNGERAFVQFSKREEAEAALKAPDAVMGNRFIKLFWANRDSIMDNGTSGSSIFPLAPRGGTPSTVPPHLLFPHKRKDNLQTVAGKTAEQACGSVTVAPLATSDLPKPVAQNGLKTTPPLKKKLETLELLKEEMRXXXXXXXXXXXXXXXXXXXXXKQAVGVKDEAAPDQAMNKPKGGGTVSNSGXXXXXXXXXXXXXXXXXXXXXXXXXSRSTENAEPTCSKLSLTVAMHEASNLKQSIRPLAPVGAPFILNRYKLDNRPTTFKILPPLPSALANVDVLKEHFSTFGDPPSVELEDLEPKDCNDGSEVQNTSARISFRSRRSAERAFLNGKSWQGQILQLMWVQSSNPAKDVGVGENVTPASKQPSDANGQSNARNGVAGLPEGSVAGNHEPDNQGRREDE  MELKVSSPKPVFSTSDCNSDPEEKEISEDXXXXXXXXXXXXXTRSQSTETEALEPALRRPFRKRNKPFENGHPYQEGDSHSSDTRFGKRRGMGSFSRTPSDSYQMMRLNQSLSGHAAPGRGRGRESGAWGPCESRFSTIDIASQFVPQGPINPLLYTGRGPQNVSSGQGASWNAFGIVPGIPNGGLDTLHTLGLQGRLRTSLNPAMSMGIPRQRCRDFEERGFCLRGDMCPLEHGVNRIVVEDVQSLSKFNLPVSLPGAHTLGPATAQGPLPAISPSSSLANKALHNKSINPPVIDNGLGLTDTFGGGSVSGGADFYDPDQPLWSNDHPENSAALLDVNRSKIDDTGPMLDADSSDQDQVALCDGFKLERLVRDAGAASGSQSVWERTSRSKHKLQSFNSTQGINRHGKQTNVDTIDPQMVESSSEPQSSSGRNMRKPSQKALRTLFVSGVPQKDNKPEALLSHFQKFGEVIDIYIPMNGERAFVQFSKREEAEAALKAPDAVMGNRFIKLFWANRDSIMDNGTSGSSIFPLAPRGGTPSTVPPHLLFPHKRKDNLQTVAGKTAEQACGSVTVAPLATSDLPKPVAQNGLKTTPPLKKKLETLELLKEEMRXXXXXXXXXXXXXXXXXXXXXKQAVGVKDEAAPDQAMNKPKGGGTVSNSGXXXXXXXXXXXXXXXXXXXXXXXXXSRSTENAEPTCSKLSLTVAMHEASNLKQSIRPLAPVGAPFILNRYKLDNRPTTFKILPPLPSALANVDVLKEHFSTFGDPPSVELEDLEPKDCNDGSEVQNTSARISFRSRRSAERAFLNGKSWQGQILQLMWVQSSNPAKDVGVGENVTPASKQPSDANGQSNARNGVAGLPEGSVAGNHEPDNQGRREDE  
MELKVSSPKPVFSTSDCNSDPEEKEISEDXXXXXXXXXXXXXTRSQSTETEALEPALRRPFRKRNKPFENGHPYQEGDSHSSDTRFGKRRGMGSFSRTPSDSYQMMRLNQSLSGHAAPGRGRGRESGAWGPCESRFSTIDIASQFVPQGPINPLLYTGRGPQNVSSGQGASWNAFGIVPGIPNGGLDTLHTLGLQGRLRTSLNPAMSMGIPRQRCRDFEERGFCLRGDMCPLEHGVNRIVVEDVQSLSKFNLPVSLPGAHTLGPATAQGPLPAISPSSSLANKALHNKSINPPVIDNGLGLTDTFGGGSVSGGADFYDPDQPLWSNDHPENSAALLDVNRSKIDDTGPMLDADSSDQDQVALCDGFKLERLVRDAGAASGSQSVWERTSRSKHKLQSFNSTQGINRHGKQTNVDTIDPQMVESSSEPQSSSGRNMRKPSQKALRTLFVSGVPQKDNKPEALLSHFQKFGEVIDIYIPMNGERAFVQFSKREEAEAALKAPDAVMGNRFIKLFWANRDSIMDNGTSGSSIFPLAPRGGTPSTVPPHLLFPHKRKDNLQTVAGKTAEQACGSVTVAPLATSDLPKPVAQNGLKTTPPLKKKLETLELLKEEMRXXXXXXXXXXXXXXXXXXXXXKQAVGVKDEAAPDQAMNKPKGGGTVSNSGXXXXXXXXXXXXXXXXXXXXXXXXXSRSTENAEPTCSKLSLTVAMHEASNLKQSIRPLAPVGAPFILNRYKLDNRPTTFKILPPLPSALANVDVLKEHFSTFGDPPSVELEDLEPKDCNDGSEVQNTSARISFRSRRSAERAFLNGKSWQGQILQLMWVQSSNPAKDVGVGENVTPASKQPSDANGQSNARNGVAGLPEGSVAGNHEPDNQGRREDE* MELKVSSPKPVFSTSDCNSDPEEKEISEDDDDDRNHKHRRKDTRSQSTETEALEPALRRPFRKRNKPFENGHPYQEGDSHSSDTRFGKRRGMGSFSRTPSDSYQMMRLNQSLSGHAAPGRGRGRESGAWGPCESRFSTIDIASQFVPQGPINPLLYTGRGPQNVSSGQGASWNAFGIVPGIPNGGLDTLHTLGLQGRLRTSLNPAMSMGIPRQRCRDFEERGFCLRGDMCPLEHGVNRIVVEDVQSLSKFNLPVSLPGAHTLGPATAQGPLPAISPSSSLANKALHNKSINPPVIDNGLGLTDTFGGGSVSGGADFYDPDQPLWSNDHPENSAALLDVNRSKIDDTGPMLDADSSDQDQVALCDGFKLERLVRDAGAASGSQSVWERTSRSKHKLQSFNSTQGINRHGKQTNVDTIDPQMVESSSEPQSSSGRNMRKPSQKALRTLFVSGVPQKDNKPEALLSHFQKFGEVIDIYIPMNGERAFVQFSKREEAEAALKAPDAVMGNRFIKLFWANRDSIMDNGTSGSSIFPLAPRGGTPSTVPPHLLFPHKRKDNLQTVAGKTAEQACGSVTVAPLATSDLPKPVAQNGLKTTPPLKKKLETLELLKEEMRKKQEMLEQKRNEFRRKLDKLEKQAVGVKDEAAPDQAMNKPKGGGTVSNSGKVENSSPVEPSNTVSSPPSEATPDSSRSTENAEPTCSKLSLTVAMHEASNLKQSIRPLAPVGAPFILNRYKLDNRPTTFKILPPLPSALANVDVLKEHFSTFGDPPSVELEDLEPKDCNDGSEVQNTSARISFRSRRSAERAFLNGKSWQGQILQLMWVQSSNPAKDVGVGENVTPASKQPSDANGQSNARNGVAGLPEGSVAGNHEPDNQGRREDE  XP_019227081.1 PREDICTED: zinc finger CCCH domain-containing protein 41-like [Nicotiana attenuata]
    

    I use this awk command to simplify the lines above:

    > awk 'BEGIN { FS = "\t" } ;{print $1","$18}' NIATT_r2.0.aa.combined_nr_XPonly.best_hit_addNoXP | head
    NIATv7_g10470.t1,XP_019227081.1 PREDICTED: zinc finger CCCH domain-containing protein 41-like [Nicotiana attenuata]
    

    I would like to split $18 at the first space.

    XP_019227081.1 PREDICTED: zinc finger CCCH domain-containing protein 41-like [Nicotiana attenuata]
    

    I would then like to swap the two parts of this split and separate them with a tab.

    PREDICTED: zinc finger CCCH domain-containing protein 41-like [Nicotiana attenuata]   XP_019227081.1 
    

    Finally, I would like to combine this with $1 to get the following output:

    NIATv7_g10470.t1,PREDICTED: zinc finger CCCH domain-containing protein 41-like [Nicotiana attenuata]       XP_019227081.1 
    

    How can this be done?

        1
  •  1
  •   Renaud Pacalet    3 years ago

    If you use GNU awk, which provides the gensub extension, you can try:

    awk -F'\t' '{print $1 "," gensub(/(\S+)\s+(.*)/, "\\2\t\\1", "1", $18)}'
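
    If GNU awk is not available, the same idea can be sketched with index() and substr(), which every POSIX awk provides. This is only an illustration: the helper name swap_at_first_space and the two-field sample line are made up, and the column number is passed as an argument (it would be 18 for the data in the question). Unlike the gensub version, it splits on a single literal space rather than on a run of whitespace.

```shell
# Hypothetical helper (not from the answer): swap the two halves of a
# tab-separated column around its first space, using only POSIX awk.
swap_at_first_space() {
  awk -F'\t' -v col="$1" '{
    i = index($col, " ")                      # position of the first space, 0 if none
    if (i == 0) { print $1 "," $col; next }   # no space: nothing to swap
    # text after the first space, a tab, then the text before it
    print $1 "," substr($col, i + 1) "\t" substr($col, 1, i - 1)
  }'
}

# Made-up sample with two tab-separated fields (column 2 stands in for $18)
out=$(printf 'g1\tXP_019227081.1 PREDICTED: zinc finger protein [Nicotiana attenuata]\n' |
      swap_at_first_space 2)
printf '%s\n' "$out"
```

    On the real file this would be called as swap_at_first_space 18 < NIATT_r2.0.aa.combined_nr_XPonly.best_hit_addNoXP.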
    
        2
  •  0
  •   tripleee    3 years ago

    The split function lets you split a value on an arbitrary regular expression.

    awk -F '\t' '{ split($18, fields, / /);
        print $1 "," substr($18, length(fields[1]) + 2) "\t" fields[1]  }' NIATT_r2.0.aa.combined_nr_XPonly.best_hit_addNoXP
    

    Alternatively, you can use sub() (or, in Awk dialects that support it, gsub()) to perform a regular-expression substitution on the value directly.
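
    A minimal sketch of that sub() route, assuming nothing beyond POSIX awk. The helper name swap_with_sub and the sample line are invented for illustration; the column number is an argument (18 for the question's file). Because sub() modifies its target in place, the field is copied into two variables first.

```shell
# Hypothetical helper (not from the answer): the sub()-based variant.
swap_with_sub() {
  awk -F'\t' -v col="$1" '{
    acc  = $col                  # will keep only the text before the first space
    desc = $col                  # will keep only the text after the first space
    sub(/ .*/, "", acc)          # delete from the first space to the end
    sub(/^[^ ]* /, "", desc)     # delete up to and including the first space
    print $1 "," desc "\t" acc
  }'
}

# Made-up sample with two tab-separated fields (column 2 stands in for $18)
out2=$(printf 'id1\tXP_1.1 PREDICTED: some protein [Genus species]\n' | swap_with_sub 2)
printf '%s\n' "$out2"
```

    Note that if the field contains no space at all, neither sub() call matches and the whole field is printed on both sides of the tab, so a check such as index($col, " ") may be worth adding for untidy input.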