Biopandas PDB output formatting leads to a ton of segments when reading with MDAnalysis: reason and my quick fix #109

Open
mrauha opened this issue Aug 18, 2022 · 1 comment

Comments

mrauha commented Aug 18, 2022

Hi all,

I was a bit baffled when opening BioPandas PDB output with MDAnalysis: instead of a dozen or so segments, I got thousands. Here's why, and my hacky fix:

BioPandas writes the rows in the following way:

ATOM  50786  CB  ASP q  96     219.123 233.404 332.880  1.00 97.39           C
ATOM  50787  N   PRO q  97     222.483 233.701 332.586  1.00 100.66           N

while MDAnalysis expects this format:

ATOM  51419  O   UNK r 113     214.624 201.542 285.597  1.00 99.63           O
ATOM  51420  CB  UNK r 113     217.297 202.297 286.117  1.00100.32           C

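Because of the extra space before the B-factor, a B-factor with five digits (>99.99) ends up one column to the right of where MDAnalysis expects it. MDAnalysis then parses the last digit of the B-factor as the segid and uses those values instead of the chain IDs; see the parser code: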
Line 297:

                segids.append(line[66:76].strip())

Lines 304-306:

        # If segids not present, try to use chainids
        if not any(segids):
            segids = chainids
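To make the column shift concrete, here is a minimal sketch that rebuilds the PRO 97 record from above and applies the same `line[66:76]` slice as the parser. The helper names (`atom_line`, `segid_field`) are made up for this illustration, and the "space plus number" B-factor field only mimics the output shown above; it is an assumption, not BioPandas' actual formatting code.

```python
def atom_line(b_field):
    """Build an ATOM record whose first 60 columns follow the PDB layout."""
    head = "ATOM  {:>5} {:<4} {:>3} {}{:>4}".format(50787, " N", "PRO", "q", 97)
    head += " " * 4  # insertion code (column 27) plus three blank columns
    head += "{:8.3f}{:8.3f}{:8.3f}{:6.2f}".format(222.483, 233.701, 332.586, 1.00)
    # b_field is appended as given; ten blank columns and the element follow.
    return head + b_field + " " * 10 + "{:>2}".format("N")


def segid_field(line):
    """Same slice as the MDAnalysis parser line quoted above."""
    return line[66:76].strip()


# Assumed BioPandas-like field: a separating space, then the number.
shifted = atom_line(" " + "{:.2f}".format(100.66))
low_b = atom_line(" " + "{:.2f}".format(97.39))
# Fixed-width field: the B-factor occupies exactly columns 61-66.
fixed = atom_line("{:6.2f}".format(100.66))

segids = [segid_field(line) for line in (low_b, shifted)]
chainids = [line[21] for line in (low_b, shifted)]

print(repr(segid_field(shifted)))  # last digit of the B-factor
print(repr(segid_field(fixed)))    # empty string
print(any(segids))                 # True, so the chainids fallback is skipped
```

Only atoms with a B-factor of 100 or more get a non-empty segid, so the segid flips back and forth along the chain. That would be consistent with the large number of segments observed.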

As a quick fix, I commented out the last if statement in MDAnalysis.
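A workaround that leaves MDAnalysis untouched would be to realign the B-factor column in the written file before reading it. The sketch below is one possible approach, not something BioPandas or MDAnalysis provides: `realign_bfactor` is a hypothetical helper, and it assumes the only defect is the single extra space before B-factors of 100.00 to 999.99.

```python
def realign_bfactor(line):
    """Move a right-shifted B-factor back into columns 61-66 (hypothetical helper)."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    # Well-formed records are blank at column 67 (index 66). A shifted record
    # has a blank at column 61 (index 60) and a digit at column 67.
    if len(line) > 66 and line[60] == " " and line[66].isdigit():
        try:
            b_factor = float(line[61:67])
        except ValueError:
            return line  # not a number after all, leave the line alone
        # Dropping the extra space also pulls the rest of the line back by one.
        return line[:60] + "{:6.2f}".format(b_factor) + line[67:]
    return line


# The PRO 97 record from above, with whitespace runs written out explicitly.
head = "ATOM  50787  N   PRO q  97" + " " * 5 + "222.483 233.701 332.586  1.00"
shifted = head + " 100.66" + " " * 10 + " N"

repaired = realign_bfactor(shifted)
print(repr(repaired[60:66]))          # B-factor field
print(repr(repaired[66:76].strip()))  # what the parser would read as segid
```

Applied to every line of the file, this should leave records with B-factors below 100 unchanged, since their column 67 is already blank.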

Member

rasbt commented Sep 3, 2022

Thanks for the note, I think I've heard about this issue before. I think there was a similar issue with the atom number that I fixed many years ago, and this one can probably be fixed in a similar fashion. I just need to find the time for it. Thanks for pointing it out!
