
Biopython: resseq doesn't match the PDB file

  •  2
  • GingerBadger  · 7 years ago

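    I have a PDB file and I need to extract its residue sequence numbers (resseq). Judging from the file itself (the first few lines are shown below), I would expect the resseqs to be [22, 23, ...]. However, Biopython's Bio.PDB module suggests otherwise (its output is also attached below). I am wondering whether this is a Biopython bug or whether I am misunderstanding the PDB format.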

    ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N  
    ATOM      2  CA  GLY A  22      79.174  88.827  58.999  1.00 20.87           C  
    ATOM      3  C   GLY A  22      80.438  89.415  58.391  1.00 21.89           C  
    ATOM      4  O   GLY A  22      80.362  90.202  57.440  1.00 23.18           O  
    ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N  
    ATOM      6  CA  LEU A  23      82.895  89.555  58.527  1.00 20.80           C  
    ATOM      7  C   LEU A  23      83.288  89.020  57.162  1.00 22.41           C  
    ATOM      8  O   LEU A  23      82.889  87.923  56.788  1.00 22.93           O  
    ATOM      9  CB  LEU A  23      83.973  89.232  59.560  1.00 20.97           C  
    ATOM     10  CG  LEU A  23      84.225  87.818  60.062  1.00 13.32           C  
    ATOM     11  CD1 LEU A  23      85.448  87.888  60.939  1.00 15.24           C  
    ATOM     12  CD2 LEU A  23      83.035  87.258  60.829  1.00 12.21           C
    

    Biopython:

    ...
    for i in chain:
        print(i.get_full_id())
    
    OUT:('pdb', 0, 'A', (' ', 2, ' '))
        ('pdb', 0, 'A', (' ', 3, ' '))
    ...
    
    1 answer  |  7 years ago
  •  4
  •   BioGeek    7 years ago

    From the documentation of Bio.PDB.Entity.get_full_id:

    def get_full_id(self):
        """Return the full id.
    
        The full id is a tuple containing all id's starting from
        the top object (Structure) down to the current object. A full id for
        a Residue object e.g. is something like:
    
        ("1abc", 0, "A", (" ", 10, "A"))
    
        This corresponds to:
    
        Structure with id "1abc"
        Model with id 0
        Chain with id "A"
        Residue with id (" ", 10, "A")
    
        The Residue id indicates that the residue is not a hetero-residue
        (or a water) because it has a blank hetero field, that its sequence
        identifier is 10 and its insertion code "A".
        """
        # The function implementation below here ...
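
    To make the docstring concrete, here is a minimal sketch that unpacks the example tuple it quotes. It uses plain Python only; the variable names are my own and are not part of the Biopython API.

```python
# Example full id copied from the docstring above.
full_id = ("1abc", 0, "A", (" ", 10, "A"))

# A Residue's full id has four levels: structure, model, chain, residue.
structure_id, model_id, chain_id, residue_id = full_id

# The residue id is itself a 3-tuple: hetero field, resseq, insertion code.
hetfield, resseq, icode = residue_id

print(resseq)  # 10
```

    The resseq is therefore the middle element of the last item in the full id, which is why `residue.id[1]` is enough when you already hold a Residue.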
    

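    I assume you are iterating over the atoms of the chain rather than over its residues, which gives you the full id of an Atom instead of that of a Residue.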

    If you save your example residues in a file named struct.pdb and then run the code below, you get the correct resseqs:

    >>> from Bio.PDB import PDBParser
    >>> structure = PDBParser().get_structure('test', 'struct.pdb')
    >>> for residue in structure.get_residues():
    ...    print(residue.get_full_id())
    ('test', 0, 'A', (' ', 22, ' '))
    ('test', 0, 'A', (' ', 23, ' '))
    >>> resseqs = [residue.id[1] for residue in structure.get_residues()]
    >>> print(resseqs)
    [22, 23]
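
    As an independent check on the file itself, the residue number can also be read straight from the fixed columns of each ATOM record: in the PDB format the chain ID sits in column 22, the residue sequence number in columns 23-26, and the insertion code in column 27. The sketch below does this in plain Python for two lines copied from the question. `resseqs_from_lines` is a hypothetical helper written for this illustration, not a Biopython function, and it assumes the lines have kept their original column alignment.

```python
# Two ATOM records from the question, kept column-for-column.
PDB_LINES = [
    "ATOM      1  N   GLY A  22      78.171  89.858  59.231  1.00 21.24           N  ",
    "ATOM      5  N   LEU A  23      81.588  89.069  58.972  1.00 21.51           N  ",
]


def resseqs_from_lines(lines):
    """Hypothetical helper: list distinct residue numbers in file order."""
    seen = []
    for line in lines:
        # Only coordinate records carry a residue number.
        if not line.startswith(("ATOM", "HETATM")):
            continue
        chain_id = line[21]        # column 22
        resseq = int(line[22:26])  # columns 23-26
        icode = line[26]           # column 27
        key = (chain_id, resseq, icode)
        if key not in seen:
            seen.append(key)
    return [resseq for _, resseq, _ in seen]


print(resseqs_from_lines(PDB_LINES))  # [22, 23]
```

    If this check and Bio.PDB disagree for a given file, it is worth looking at whether the records in that file have been shifted out of their columns, for example by copy-pasting or by an editor that changes whitespace.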