
Using SIFTS data for renumbering residues to match the Uniprot sequence resids #110

Open
mrauha opened this issue Aug 18, 2022 · 1 comment

Comments

@mrauha

mrauha commented Aug 18, 2022

Hi all,

I stumbled upon this paper describing the mapping of PDB residue IDs to the residue numbers of the sequence deposited in UniProt:

  • Choudhary, P.; Anyango, S.; Berrisford, J.; Varadi, M.; Tolchard, J.; Velankar, S. Unified Access to up-to-Date Residue-Level Annotations from UniProt and Other Biological Databases for PDB Data via PDBx/mmCIF Files. bioRxiv, 2022, 2022.08.10.503473. https://doi.org/10.1101/2022.08.10.503473.

Frustrated by the inconsistencies in numbering, I'm writing some code that outputs PDB files whose residue IDs match the UniProt sequence, using biopandas for the crunching.

The mmCIF files with the mapped residues can be downloaded from this URL:

https://www.ebi.ac.uk/pdbe/entry-files/download/{pdb_id}_updated.cif
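
A minimal sketch of filling in that URL template and fetching the file with the standard library. The helper names updated_cif_url and download_updated_cif are hypothetical, and lower-casing the PDB ID is an assumption about how PDBe names these files.

```python
from typing import Optional
from urllib.request import urlretrieve

# URL template from the issue; {pdb_id} is filled in per entry.
UPDATED_CIF_URL = "https://www.ebi.ac.uk/pdbe/entry-files/download/{pdb_id}_updated.cif"


def updated_cif_url(pdb_id: str) -> str:
    """Return the PDBe 'updated' mmCIF URL for a PDB ID.

    Assumption: PDBe expects lower-case IDs in this path.
    """
    return UPDATED_CIF_URL.format(pdb_id=pdb_id.strip().lower())


def download_updated_cif(pdb_id: str, dest: Optional[str] = None) -> str:
    """Download the updated mmCIF to `dest` (default: '<pdb_id>_updated.cif')."""
    dest = dest or f"{pdb_id.strip().lower()}_updated.cif"
    urlretrieve(updated_cif_url(pdb_id), dest)  # network access required
    return dest
```

The returned path can then be handed to the mmCIF parser (in biopandas, PandasMmcif().read_mmcif(path)).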

The CIF file is read nicely by the mmCIF parser. The residue ID matching UniProt is in the column pdbx_sifts_xref_db_num, which is None for residues without a mapping to the sequence, e.g. ligands and UNKs.
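
One way to turn the per-atom pdbx_sifts_xref_db_num column into a per-residue lookup, sketched with plain dicts so it does not depend on a particular DataFrame layout. It assumes the rows carry the mmCIF atom_site column names (auth_asym_id, auth_seq_id, pdbx_PDB_ins_code); with biopandas they could come from something like PandasMmcif().read_mmcif(path).df["ATOM"].to_dict("records"). Treating "?", "." and NaN as "unmapped" is an assumption about how missing values show up.

```python
from typing import Dict, Iterable, Optional, Tuple

# Residue key: (chain, author residue number, insertion code).
ResidueKey = Tuple[str, str, str]


def _is_missing(value) -> bool:
    """Treat None, NaN, '', '?' and '.' as 'no value' (assumed encodings)."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN check without numpy
        return True
    return str(value).strip() in {"", "?", "."}


def sifts_residue_map(atom_rows: Iterable[dict]) -> Dict[ResidueKey, Optional[int]]:
    """Map each residue to its UniProt number (None if unmapped).

    Each row is one atom; only the first atom seen for a residue is used,
    since all atoms of a residue share the same SIFTS mapping. Dict order
    follows the order of the atoms, i.e. chain order.
    """
    mapping: Dict[ResidueKey, Optional[int]] = {}
    for row in atom_rows:
        ins = row.get("pdbx_PDB_ins_code")
        key = (
            str(row["auth_asym_id"]),
            str(row["auth_seq_id"]),
            "" if _is_missing(ins) else str(ins),
        )
        if key in mapping:
            continue
        num = row.get("pdbx_sifts_xref_db_num")
        mapping[key] = None if _is_missing(num) else int(num)
    return mapping
```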

This paper (with accompanying Python code and a webserver) describes a similar approach using SIFTS:

  • Faezov, B.; Dunbrack, R. L., Jr. PDBrenum: A Webserver and Program Providing Protein Data Bank Files Renumbered according to Their UniProt Sequences. PLoS One 2021, 16 (7), e0253411. https://doi.org/10.1371/journal.pone.0253411.

Residues without a mapping are renumbered with an offset of 5,000/50,000 so that they don't overlap with the new residue IDs of the amino acids.
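
A sketch of the offset idea, under one possible reading: an unmapped residue becomes offset plus its original author residue number, while mapped residues take their UniProt number. The exact rule PDBrenum uses may differ. The split between 5,000 and 50,000 plausibly follows from the PDB format's residue number field (columns 23-26), which holds only 4 characters, so 50,000-range numbers fit only in mmCIF. The function and constant names are hypothetical.

```python
from typing import Dict, Optional, Tuple

ResidueKey = Tuple[str, str, str]  # (chain, author residue number, insertion code)

# Offsets mentioned in the issue. The PDB-format resSeq field is 4 characters
# wide, so numbers >= 10,000 (e.g. the 50,000 offset) only fit in mmCIF.
PDB_OFFSET = 5_000
MMCIF_OFFSET = 50_000


def renumber_with_offset(
    mapping: Dict[ResidueKey, Optional[int]], offset: int = PDB_OFFSET
) -> Dict[ResidueKey, int]:
    """Keep UniProt numbers for mapped residues; push unmapped ones past `offset`.

    Assumption: an unmapped residue becomes offset + its author residue number.
    """
    new_numbers: Dict[ResidueKey, int] = {}
    for key, uniprot_num in mapping.items():
        if uniprot_num is not None:
            new_numbers[key] = uniprot_num
        else:
            new_numbers[key] = offset + int(key[1])
    return new_numbers
```

This only avoids clashes while all UniProt numbers stay below the offset, which holds for most, but not all, proteins.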

However, occasionally part of a chain consists of UNKs, so I will implement a way to number these continuously with respect to the UniProt-mapped residue IDs.
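
A minimal sketch of continuous numbering for unmapped stretches within one chain. It assumes the caller passes only polymer residues of a single chain, in chain order, so ligands never enter the list. A run after a mapped residue p continues as p+1, p+2, ...; a run at the chain start counts back from the next mapped residue; a run that would collide with the next mapped number is left as None so the offset scheme can take over. The function name is hypothetical.

```python
from typing import List, Optional


def fill_unmapped_runs(numbers: List[Optional[int]]) -> List[Optional[int]]:
    """Number unmapped stretches (e.g. UNK runs) continuously from mapped neighbours.

    `numbers` holds the UniProt number (or None) for consecutive polymer
    residues of ONE chain. Runs that do not fit before the next mapped
    number stay None.
    """
    out = list(numbers)
    i = 0
    while i < len(out):
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < len(out) and out[i] is None:
            i += 1
        end = i  # the run is out[start:end]; it is maximal, so neighbours are mapped
        run_len = end - start
        prev_num = out[start - 1] if start > 0 else None
        next_num = out[end] if end < len(out) else None
        if prev_num is not None:
            first = prev_num + 1
        elif next_num is not None:
            first = next_num - run_len
        else:
            continue  # no mapped residue in the chain at all
        if next_num is not None and first + run_len > next_num:
            continue  # not enough room before the next mapped residue
        for k in range(run_len):
            out[start + k] = first + k
    return out
```

Anchoring runs to the preceding mapped residue is a design choice; centring them in the available gap would be an equally reasonable alternative.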

Work in progress - if there's an already existing way to do this, let me know :)

@mrauha mrauha changed the title from "Using" to "Using SIFTS data for renumbering residues to match the Uniprot sequence resids" Aug 18, 2022
@Ruibin-Liu
Contributor

The missing residues are not matched, which is a caveat for some uses.
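
A small sketch for locating where unmatched stretches may sit: report jumps larger than 1 between consecutive mapped UniProt numbers of one chain. A jump can indicate unmodelled residues, but it can also be a genuine deletion relative to UniProt, so each hit still needs checking. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def numbering_gaps(numbers: List[Optional[int]]) -> List[Tuple[int, int]]:
    """Return (last_before_gap, first_after_gap) pairs where consecutive mapped
    UniProt numbers of one chain jump by more than 1 (unmapped entries skipped)."""
    mapped = [n for n in numbers if n is not None]
    return [(a, b) for a, b in zip(mapped, mapped[1:]) if b - a > 1]
```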
