Improve PDB formatting with incomplete Monomer info #7286
Introduction
By default, the PDB writer uses the residue name UNL, and sets very reasonable atom names:
We can also set the residue info manually. If this information is complete, everything works:
However, if the residue info is incomplete (e.g., we set the residue name but not the atom name), the PDB output will be invalid. Most importantly, the line is too short, so that the later columns get into the wrong place:
This happens if either the atom name or the residue name is missing. IMO, the more important case is the missing atom name, since enumerating the atom names is more difficult than setting a dummy residue name.
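To make the column shift concrete, here is a minimal standalone sketch. It is not the RDKit code: formatRecordStart is a hypothetical helper that only writes columns 1-26 of a HETATM record, with a fixed chain ID "A" chosen for illustration. It assumes the standard PDB layout, where the atom name occupies columns 13-16 and the residue name columns 18-20.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper for illustration only (not part of RDKit).
// Writes columns 1-26 of a HETATM record. When `pad` is true, the atom
// name and residue name are padded to their fixed column widths.
std::string formatRecordStart(int serial, const std::string &atomName,
                              const std::string &resName, int resSeq,
                              bool pad) {
  std::ostringstream ss;
  ss << "HETATM" << std::setw(5) << serial << ' ';  // columns 1-12
  if (pad) ss << std::setw(4);  // width applies to the next output only
  ss << atomName;               // columns 13-16
  ss << ' ';                    // column 17: alternate location
  if (pad) ss << std::setw(3);
  ss << resName;                // columns 18-20
  ss << ' ' << 'A';             // columns 21-22: blank, chain ID (assumed "A")
  ss << std::setw(4) << resSeq; // columns 23-26: residue sequence number
  return ss.str();
}
```

With an empty atom name, formatRecordStart(1, "", "LIG", 1, false) is only 22 characters long, so everything after the atom name lands four columns too far to the left. With pad set to true the prefix is always 26 characters, and a short name such as "C1" is right-justified within its four-character field.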
Changes
GetDefaultAtomNumber.
std::setw(4) before the atom name and std::setw(3) before the residue name, so that additional spaces are printed before the atom or residue name if it is too short. (See below: I should still add a unit test for that.) Again, this avoids incorrect column alignment. If the user specifies a short name like "C1", it will still not be aligned properly within the column, but the result will be parsed correctly by PyMOL. (The PDB format specifies that the alignment of atom names depends on the length of the element symbol. To do that perfectly, we would have to parse the user-specified atom name.)
To do / Questions
GetDefaultAtomNumber, since that code is now used in two different places. I only declared and defined this function in the PDBWriter.cpp file, since I don't believe it can be useful anywhere else, and since there is no corresponding .h file. Is that ok?
What do you think of this? Thanks in advance for your help :-)