Sunday, 3 May 2015

structural biology - How can I pare down a PDB file in Python to only include specific residues?

ProDy works quite well, especially from within an existing Python script.

The following code parses an existing PDB file, applies a selection query to it, and saves the selected atoms to another file.

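import prody

def pdbsubset(inpdb, outpdb, selection):
    """Write the atoms of inpdb that match a ProDy selection string to outpdb."""
    with open(inpdb) as protf:
        prot = prody.parsePDBStream(protf)
    atoms = prot.select(selection)
    if atoms is None:
        # select() returns None when nothing matches, and writePDB cannot write that
        raise ValueError('selection matched no atoms: ' + selection)
    prody.writePDB(outpdb, atoms)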
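To see what a plain chain-plus-residue selection amounts to at the file level, here is a rough standard-library sketch. It is a swapped-in text filter, not ProDy, and it assumes standard fixed-width PDB records: on ATOM/HETATM lines the chain ID sits in column 22 and the residue number in columns 23-26. The names `filter_pdb_lines` and `pdb_subset_stdlib` are hypothetical, and the sketch ignores insertion codes, alternate locations, multiple models and any distance expansion.

```python
def filter_pdb_lines(lines, wanted):
    """Keep ATOM/HETATM records whose (chain, residue number) is in `wanted`.

    Assumes fixed-width PDB records: chain ID in column 22 and residue
    number in columns 23-26 (1-based), i.e. line[21] and line[22:26].
    Insertion codes and all non-coordinate records are ignored.
    """
    kept = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            chain = line[21]
            resnum = int(line[22:26])
            if (chain, resnum) in wanted:
                kept.append(line)
    return kept


def pdb_subset_stdlib(inpdb, outpdb, wanted):
    """Hypothetical file-to-file wrapper: copy only the wanted residues."""
    with open(inpdb) as infile:
        kept = filter_pdb_lines(infile, wanted)
    with open(outpdb, "w") as outfile:
        outfile.writelines(kept)
        outfile.write("END\n")
```

For example, `pdb_subset_stdlib('in.pdb', 'out.pdb', {('A', 12), ('A', 39)})` would keep only residues A12 and A39. Anything involving distances, such as the `within` query used below, is where a real structure library like ProDy earns its keep.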


An example selection query builder



  • residues is a list, e.g. ['A12', 'A39'], with each element in the form <chain><residue number> (the chain ID is taken to be the single first character). They were captured from the command line using argparse with

        parser.add_argument('-i', '--residues', nargs='+')

    • so on the command line you would pass something like -i A12 A39.


  • pdb and outpdb are file paths

  • radius is the distance in angstroms to expand the selection by.
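A fuller command-line parser might look like the sketch below. Only the `-i/--residues` option comes from the post; the positional `pdb`/`outpdb` arguments, the `-r/--radius` option and its 5.0 Å default are hypothetical additions chosen to match the variables described above.

```python
import argparse


def build_parser():
    # Only -i/--residues is from the original post; the positional file paths
    # and -r/--radius (with its default of 5.0) are hypothetical additions.
    parser = argparse.ArgumentParser(
        description="Pare down a PDB file around selected residues.")
    parser.add_argument("pdb", help="input PDB file path")
    parser.add_argument("outpdb", help="output PDB file path")
    parser.add_argument("-i", "--residues", nargs="+", required=True,
                        help="residues as <chain><residue number>, e.g. A12 A39")
    parser.add_argument("-r", "--radius", type=float, default=5.0,
                        help="distance in angstroms to expand the selection by")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
example = build_parser().parse_args(
    ["in.pdb", "out.pdb", "-i", "A12", "A39", "-r", "4.5"])
```

Because `nargs='+'` is greedy, put the positional file paths before `-i` (or follow the residue list with another option); otherwise argparse will swallow the file names as extra residues and then complain that the positionals are missing.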

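# one "(chid X and resid N)" clause per residue; assumes single-character chain IDs
reslist = ["(chid {0} and resid {1})".format(res[0], res[1:]) for res in residues]
# select every atom within `radius` angstroms of any of those residues
selector = 'within {0} of ({1})'.format(radius, ' or '.join(reslist))

# and running it:
pdbsubset(pdb, outpdb, selector)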
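The builder above can be wrapped in a function that rejects malformed tokens instead of silently producing a bad query. The sketch keeps the same single-character chain rule. Its `whole_residues` flag is a hypothetical option: it prefixes the query with `same residue as`, which ProDy's selection documentation describes for expanding atom selections to complete residues, because `within` on its own picks individual atoms and can cut side chains in half. Check that modifier against your ProDy version before relying on it.

```python
import re


def build_selector(residues, radius, whole_residues=False):
    """Build a ProDy selection string from tokens like 'A12'.

    The first character is taken as the chain ID and the remainder must be a
    plain residue number; insertion codes and negative numbers are rejected.
    `whole_residues` is a hypothetical option that adds 'same residue as'.
    """
    clauses = []
    for token in residues:
        chain, resnum = token[:1], token[1:]
        if not chain or re.fullmatch(r"\d+", resnum) is None:
            raise ValueError("cannot parse residue token {0!r}".format(token))
        clauses.append("(chid {0} and resid {1})".format(chain, resnum))
    selector = "within {0} of ({1})".format(radius, " or ".join(clauses))
    if whole_residues:
        selector = "same residue as " + selector
    return selector
```

The returned string can be passed to `pdbsubset` in the same way as `selector` above.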


The ProDy documentation on selection queries is not the most straightforward, but the syntax is fairly analogous to PyMOL's, so it is manageable.
