
SeqIO.parse in Python: "Premature end of line during features table"

  • F.Lira  ·  7 years ago

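    Has anyone run into this problem before? Any suggestions about the cause?

    The script does create the file containing the genome sequences, but the error appears at the end of the process.

    The line in my script: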

    File "scripts/list_ncbi_download_genome_vs_02.py", line 97, in <module>
        SeqIO.write(SeqIO.parse(genbank_file, "genbank"), genome_file, "fasta")
    

    The traceback that appears:

      File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 481, in write
        count = writer_class(fp).write_file(sequences)
      File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/Interfaces.py", line 209, in write_file
        count = self.write_records(records)
      File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/Interfaces.py", line 193, in write_records
        for record in records:
      File "/usr/lib/python2.7/dist-packages/Bio/SeqIO/__init__.py", line 600, in parse
        for r in i:
      File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 478, in parse_records
        record = self.parse(handle, do_features)
      File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 462, in parse
        if self.feed(handle, consumer, do_features):
      File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 434, in feed
        self._feed_feature_table(consumer, self.parse_features(skip=False))
      File "/usr/lib/python2.7/dist-packages/Bio/GenBank/Scanner.py", line 159, in parse_features
        raise ValueError("Premature end of line during features table")
    

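    I can live with this, but it is not nice for a process to finish and then have this error show up afterwards.

    The file can be downloaded from: https://github.com/felipelira/files_to_test/blob/master/GCF_000302915.1_Pav631_1.0_genomic.gbff

    The block of my script that calls the command is: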

    ## rename and move files to the output directory created in the command line:
    genome_dict = {}
    genome_list = []
    for genbank_file in list_uncompressed:
        organism = genbank_file.split('/')[0]
        file_name = genbank_file.split('/')[-1]
        genome_file = organism +'_'+ file_name.split('_')[0] +'_'+ file_name.split('_')[1]+'.fna'
        genome_list.append(genome_file)
        genome_dict[genome_file.replace('.fna', '')] = organism
    #print genome_dict
        print "Dealing with GenBank record %s" % genome_file
        SeqIO.write(SeqIO.parse(genbank_file, "genbank"), os.path.join(outdir, genome_file), "fasta")
        print "Genome saved %s" % genome_file
    
    1 reply
  •   F.Lira    7 years ago

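    Solved the problem by following the suggestions posted on biostars.org: https://www.biostars.org/p/289314/#289407

    Suggestion from Philipp Bayer: https://www.biostars.org/u/4678/

    Normally this should work (and it does on my system). Are you writing to genbank_file earlier in the script? Maybe you have not closed the file handle, so the written file has not been flushed to disk yet?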
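To illustrate the point about unclosed handles, here is a minimal sketch. It assumes the script uncompresses downloaded `.gbff.gz` files before parsing them (the name `list_uncompressed` hints at this, but that step is not shown in the question). The helpers `gunzip` and `looks_complete` are hypothetical names, not part of Biopython. The `with` block guarantees the output handle is closed, and therefore flushed, before anything reads the file. The second helper checks that the file ends with `//`, the GenBank record terminator.

```python
import gzip
import os
import shutil
import tempfile


def gunzip(src_path, dst_path):
    """Decompress src_path into dst_path; both handles are closed on return."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


def looks_complete(path):
    """Return True if the last non-blank line is the GenBank terminator '//'."""
    last = ""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                last = line.strip()
    return last == "//"


# Tiny demonstration with a made-up two-line record (not real GenBank data).
workdir = tempfile.mkdtemp()
gz_path = os.path.join(workdir, "example.gbff.gz")
with gzip.open(gz_path, "wb") as handle:
    handle.write(b"LOCUS       EXAMPLE\n//\n")

gbff_path = gunzip(gz_path, os.path.join(workdir, "example.gbff"))
complete = looks_complete(gbff_path)
```

If `looks_complete` returns False for a file right before it is passed to `SeqIO.parse`, the file on disk is probably truncated, which would be consistent with the error in the traceback.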

    And from a.zielezinski: https://www.biostars.org/u/4700/

    from Bio import SeqIO

    l = ['GCF_000302915.1_Pav631_1.0_genomic.gbff']
    for genbank_file in l:
        # Context managers close both handles even if parsing raises.
        with open(genbank_file) as fh, open(genbank_file + '.fasta', 'w') as oh:
            for seq_record in SeqIO.parse(fh, 'genbank'):
                oh.write(seq_record.format('fasta'))
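As a side note on why the FASTA file in the question was created even though the traceback appeared: the traceback shows `SeqIO.parse` being iterated inside `write_records`, so records are parsed and written one at a time, and everything before the truncated record reaches the output before the exception is raised. The sketch below reproduces that behaviour with the standard library only. `parse_records` is a hypothetical stand-in for a lazy parser, not Biopython's implementation, and the input text is made up.

```python
import io


def parse_records(handle):
    """Hypothetical lazy parser: yield one record per '//' terminator line."""
    lines = []
    for line in handle:
        if line.strip() == "//":
            yield "".join(lines)
            lines = []
        else:
            lines.append(line)
    if lines:
        # Leftover data without a closing '//' means the input was cut short.
        raise ValueError("Premature end of input")


def convert(handle, out):
    """Write each record as soon as it is parsed; return the number written."""
    count = 0
    for record in parse_records(handle):
        out.write(record)
        count += 1
    return count


# Two complete records followed by a truncated third one.
truncated = io.StringIO("record one\n//\nrecord two\n//\nrecord thr")
out = io.StringIO()
error = None
try:
    convert(truncated, out)
except ValueError as exc:
    error = str(exc)
```

After this runs, `out` holds the first two records and `error` holds the message, which mirrors an output file that looks finished followed by an exception at the end of the run.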