Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding hydrogens with -p messes up the entire structure (not only residue numbers and names) #2677

Open
kalinni opened this issue Feb 28, 2024 · 15 comments · May be fixed by #2678
Open

Adding hydrogens with -p messes up the entire structure (not only residue numbers and names) #2677

kalinni opened this issue Feb 28, 2024 · 15 comments · May be fixed by #2678

Comments

@kalinni
Copy link

kalinni commented Feb 28, 2024

First of all thanks for working on and maintaining OpenBabel!

I want to add hydrogens to my pdb file. If I use obabel 3lcs.pdb -O 3lcs_prot.pdb -h everything looks great in general but the results don't match my expectations - for example one of the oxygens in the side chain of glutamic acid (e.g. GLU 1197) gets a hydrogen. In physiological conditions (from my understanding) it doesn't have a hydrogen there and is charged.

So I moved on to use -p instead of -h to get hydrogens at pH 7.4. Now from what I can see the hydrogens are added following my expectation, but the pdb file is messed up.

(I am using OpenBabel within a Python project, but as the behavior is the same from the command line I used the command line options for easier reproducibility.)

Am I missing a preprocessing step or misunderstanding how these options should be used?

Thanks for your time and help!

Environment Information

Open Babel version: 3.1.1
Operating system and version: Windows 10

Expected Behavior

running obabel 3lcs.pdb -O 3lcs_prot.pdb -p should give me my pdb file with hydrogens added but residue numbers and names unchanged.

Actual Behavior

running obabel 3lcs.pdb -O 3lcs_prot.pdb -p changes residue numbers to start at 1 and changes the residue names for everything it doesn't recognize as a standard amino acid or has some other issues with to UNK or UNL.
(From what I see alternative locations and missing atoms seem to cause issues and any ligands in the file also lose their name.)

Steps to Reproduce

download the pdb file and run the commands (also happens with other pdb files though)

Copy link

welcome bot commented Feb 28, 2024

Thanks for opening your first issue here! Be sure to follow the issue template!

@nbehrnd
Copy link
Contributor

nbehrnd commented Feb 28, 2024

@kalinni In an instance of Linux Debian 13/trixie with openbabel 3.1.1., I run the addition of hydrogens by

$ obabel 3lcs.pdb -h -O obabel_with_hydrogens.pdb
==============================
*** Open Babel Warning  in PerceiveBondOrders
  Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders (title is 3lcs.pdb)

1 molecule converted

Based on this, perhaps some ligands, or chains are a bit deformed to "fit" into Open Babel's patterns to recognize them well. This deformation could be by mere distance, or/and geometry. I then run grep, but didn't find any UNL, nor UNK

$ grep UNL ./obabel_with_hydrogens.pdb  -c
0
$ grep UNK ./obabel_with_hydrogens.pdb  -c
0

An alternative approach could be to use Jmol, to load the initial .pdb, and (File -> Console) to enter the following commands there (confirmed by enter):

calculate hydrogens;
write "Jmol_with_hydrogens.pdb";

for a new .pdb file. This equally didn't have a UNK or UNL

$ grep UNK Jmol_with_hydrogens.pdb -c
0
$ grep UNL Jmol_with_hydrogens.pdb -c
0

but a different processing program might provide a .pdb different enough to continue the work. (Contrasting to the input, lattice parameters are missing, though.)

Jmol_with_hydrogens.pdb.txt

@kalinni
Copy link
Author

kalinni commented Feb 28, 2024

Thank you so much for the quick response! The issue occurs with -p though, with -h the file is fine but the hydrogens are not what I expect.

@kalinni
Copy link
Author

kalinni commented Feb 29, 2024

To follow up a little - from what I understand, in both cases (using -h and using -p) eventually AddNewHydrogens(HydrogenType whichHydrogen, bool correctForPH, double pH) defined in mol.cpp is called.

While with -h the option correctForPH is set to false, with -p it is set to true, triggering the call of CorrectForPH(pH). I assume this is where the residue names and numbers then get messed up.

As mentioned, my actual use case is within a Python tool (PLIP), so I am looking for a solution that does not require additional programs, as I would like to avoid adding further dependencies.

@kalinni
Copy link
Author

kalinni commented Feb 29, 2024

I noticed I am getting the following warning when protonating with the -p flag:

>obabel 3lcs.pdb -O 3lcs_prot.pdb -p
==============================
*** Open Babel Warning  in OpenBabel::OBGlobalDataBase::Init
  Unable to open data file 'space-groups.txt'
==============================
*** Open Babel Warning  in OpenBabel::OBGlobalDataBase::Init
  Cannot initialize database 'space-groups.txt' which may cause further errors.
1 molecule converted

Maybe that could be related? Not sure what's wrong there, the file is present in the same directory as the phmodel.txt and definitely not missing.

@nbehrnd
Copy link
Contributor

nbehrnd commented Mar 1, 2024

In absence of working knowledge of C++, I lack insight how Openbabel's C++ code is organized, interacting, and eventually providing results.

Your log shared by 2029-02-29 might indicate "something is broken" in the setup of Open Babel accessible to you, because issuing the same command as you yields in my instance (OpenBabel 3.1.1. as provided by Debian) only the following:

$ obabel 3lcs.pdb -O 3lcs_prot.pdb -p
==============================
*** Open Babel Warning  in PerceiveBondOrders
  Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders (title is 3lcs.pdb)

1 molecule converted

Speculation: the PLIB tool's file requirements.txt lists Open Babel (line 4, openbabel~=3.0.0), which might interfere here. It might be a library missing, or a conflict of "which one to use" e.g., if you have a system wide installation of OpenBabel, then partially overwritten by PLIB. My recommendation consists of

  • to remove the current (local?) installation of Open Babel.
  • if you reinstall Open Babel for general use on the computer (e.g., because of obgui), use the most recent version (original source)
  • if you want to want to use Open Babel inside a Python project, use a virtual environment; isolated from the rest of Python libraries you have, an error in dependencies of the libraries were constrained to this compartment
  • edit this (or any other) requirements.txt file. The current fetch of openbabel's Python bindings via pypi is known to be problematic; perhaps because it lists version 3.1.1.1 (four levels) on its PyPi entry instead of typical three (e.g., 3.1.1). Anyway, substitute the line of openbabel by openbabel-wheel>=3.1.1.16 (as an example of application, see here) before running the usual pip install -r ./requirements.txt to amend the local Python setup. openbabel-wheel aims to offer a temporary bypass; however, it requires the absence of any "normal" openbabel installation (within the virtual environement) to prevent confusion "which version of library x to use". It works reasonably well both in Windows, and Linux. Most importantly: the edit required is only in file requirements.txt -- no change of the Python scripts called hereafter. Else, there equally is an Open Babel entry for Miniconda if this ecosystem more suitable for your work/more familiar to you, than normal cpython.

@kalinni
Copy link
Author

kalinni commented Mar 1, 2024

Hi @nbehrnd, thanks so much for trying to figure this out with me. I'm also limited by my rather limited C++knowledge. 😅

I already adapted the requirements file (on my machine PLIP is using OpenBabel 3.1.1 and there is only the one OpenBabel installation), fought my battle with the OpenBabel Python bindings 3.1.1.1 issue and have PLIP and OpenBabel running in a virtual environment.
I might try a reinstallation though.

When you run the command with -p in your setup - are you getting UNL or UNK records?
Earlier you said you are not getting them with -h but that option doesn't produce them for me either.

Thanks and have a lovely weekend!

@nbehrnd
Copy link
Contributor

nbehrnd commented Mar 2, 2024

@kalinni With -p I get records labeled either UNL or UNK:

$ obabel 3lcs.pdb -O 3lcs_prot.pdb -p
==============================
*** Open Babel Warning  in PerceiveBondOrders
  Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders (title is 3lcs.pdb)

1 molecule converted
$ grep UNK 3lcs_prot.pdb -c
457
$ grep UNL 3lcs_prot.pdb -c
1500

So I resort to AWK to extract them separately (attached below), i.e.

$ awk '{if ($0 ~ /UNK/) print}' 3lcs_prot.pdb  > UNK.pdb
$ awk '{if ($0 ~ /UNL/) print}' 3lcs_prot.pdb  > UNL.pdb

but their individual display e.g., in Jmol does not look pretty. For a protein structure I anticipate a couple of unbounded / unlinked molecules of water, perhaps one/a few small ligands; but there are bit too many atoms/too large ensembles to feel comfortable here.

In addition, (UNK_detail), there are a couple of motifs in the structure with added hydrogens which merit a check. E.g., the cyclopropane -- not that it is impossible, but for its ring strain less likely to be seen in a naturally occurring compound. The same frame equally features a carbon exceeding tetravalence .and. hydrogens seemingly sharing the same position; which is not good, chemically speaking. The small molecules the report mentions as independent from the protein are not clearly visible (yet).

So I thought one could split the original file (not yet protonated) into molecules Open Babel recognizes as separate:

$ obabel 3lcs.pdb --separate  -O fragment.pdb -m
==============================
*** Open Babel Warning  in PerceiveBondOrders
  Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders (title is 3lcs.pdb)

142 molecules converted
142 files output. The first is fragment1.pdb

There are some which are large:

$ wc -l fragment*.pdb | sort -k 1 -rn | head
 11135 total
  5517 fragment.pdb
  1422 fragment1.pdb
   894 fragment19.pdb
   832 fragment9.pdb
   502 fragment17.pdb
   468 fragment7.pdb
   274 fragment5.pdb
   224 fragment11.pdb
   200 fragment13.pdb

-- fragment13.pdb includes the/a cyclopropane -- yet quite a number which are small (water), too:

$ wc -l fragment*.pdb | sort -k 1 -n | grep "5 frag" -c
122
$ wc -l fragment*.pdb | sort -k 1 -n | head
     5 fragment100.pdb
     5 fragment101.pdb
     5 fragment102.pdb
     5 fragment103.pdb
     5 fragment104.pdb
     5 fragment105.pdb
     5 fragment106.pdb
     5 fragment107.pdb
     5 fragment108.pdb
     5 fragment109.pdb

There should be a better way to address the hydrogenation.

2024-03-02_check_run.zip

@nbehrnd
Copy link
Contributor

nbehrnd commented Mar 2, 2024 via email

@kalinni
Copy link
Author

kalinni commented Mar 4, 2024

Hi @nbehrnd, thanks for taking such a deep dive into this. Actually I am working specifically on PLIP and 3lcs.pdb is only an example file I am currently using for testing.

I am looking into some issues regarding protonation - PLIP currently calls AddPolarHydrogens().
I thought this was the Python Interface equivalent to the command line version with -h, actually it is probably the equivalent of --addpolarh. So yes, I am indeed only interested in polar hydrogens.

However, I am getting a hydrogen atom added to one of the oxygens in glutamic acids side chain which I don't think should be there in physiological conditions.

I was experimenting with using AddHydrogens(True, True, 7.4) instead to get polar hydrogens for a specific pH value. I thought this would be the Python interface equivalent of -p. Turns out, while using it via the Python interface messes up the file in a similar manner, the command line option in contrast to the Python interface option still leaves me with that unwanted hydrogen in glutamic acid's side chain. No idea how that happens...

Regarding that cyclopropane - it is a methionine in the original pdb file, I guess open babel messes it up even more than I first noticed.

@kalinni kalinni changed the title Adding hydrogens with -p messes up residue numbers and names Adding hydrogens with -p messes up the entire structure (not only residue numbers and names) Mar 4, 2024
@fredrikw fredrikw linked a pull request Mar 5, 2024 that will close this issue
@fredrikw
Copy link
Contributor

fredrikw commented Mar 5, 2024

I think the problem is that the function for correcting for pH reperceives the chains and residues. I made a PR that will change this, if you are comfortable with building from source you could try that one out and see if it helps.

@kalinni
Copy link
Author

kalinni commented Mar 5, 2024

Awesome! Sounds like this could easily explain the results I am getting. So far I have not been building OpenBabel from source, but I'll definitely give it a go in the upcoming days and let you know here how it went. Thanks!

@nbehrnd
Copy link
Contributor

nbehrnd commented Mar 5, 2024

@kalinni The page https://open-babel.readthedocs.io/en/latest/Installation/install.html compiles (...) some hints. I agree it merits to pull a chair to digest the steps required.

@kalinni
Copy link
Author

kalinni commented Mar 11, 2024

Hi all,
@fredrikw PR does indeed fix the issue! 🎉
Once his changes are merged and part of an OpenBabel release, this issue can be closed.
Thanks a bunch!

@nbehrnd
Copy link
Contributor

nbehrnd commented Mar 15, 2024

@kalinni A postscript: in my pane of «suggested projects», I just encountered a link to reduce by the Richardson group. It might be an interesting checkpoint for your reduction/addition of hydrogen atoms to .pdb for the options and report back to the CLI the program provides (including warnings of possible bumps). Running this once without one of the optional flags (cf. archive below) in the general pattern of

$ reduce input.pdb > output.pdb

in an instance of Linux Debian to yield a reduced.pdb, there was no occurrence of UNK or UNL to spot for grep in the result.

There is one instance of two carbon atoms pretty close / partially overlapping each other (cf. detail.gif enclosed), but possibly already present in the original dataset.

a_partager.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants