Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fewer charges than atoms #8

Closed
mbatgh opened this issue Jan 6, 2023 · 6 comments · May be fixed by #14
Closed

fewer charges than atoms #8

mbatgh opened this issue Jan 6, 2023 · 6 comments · May be fixed by #14

Comments

@mbatgh
Copy link

mbatgh commented Jan 6, 2023

Hi,

Thanks for this very useful tool!

I installed with pip, and the option 0 example you include on the github webpiage seems to work.
however, when i try larger molecules I keep getting charge arrays that have fewer entries
than my molecules have atoms

e.g.:

from rdkit import Chem; from espaloma_charge import charge
molecule = Chem.MolFromSmiles("CCOCCNC(=O)C[C@@h]1N=C(c2ccc(cc2)Cl)c2c(-n3c1nnc3C)sc(c2C)C")
charge(molecule)
/home/celeristx/miniconda3/lib/python3.10/site-packages/dgl/heterograph.py:72: DGLWarning: Recommend creating graphs by dgl.graph(data) instead of dgl.DGLGraph(data).
dgl_warning('Recommend creating graphs by dgl.graph(data)'
array([-0.18779773, 0.4208883 , -0.44761604, 0.419579 , 0.12307103,
-0.29645967, 0.51088446, -0.41038862, -0.13668227, 0.28527066,
-0.58568656, 0.5189817 , -0.11827499, 0.07983082, -0.0337906 ,
0.0861465 , -0.03379034, 0.07983054, -0.08033632, -0.18743275,
0.3917905 , -0.49400392, 0.46795568, -0.27843612, -0.21633013,
0.47469214, -0.29152048, -0.13895345, 0.03585355, -0.07050467,
0.01740551, 0.09582561], dtype=float32)

the molecule has 58 atoms, while above only 32 numbers are provided.
Turns out, if I only count non-hydrogens i get 32 atoms, it looks as if the charge
command only tells me the charges of the nonH atoms, or fails to infer the implicit
hydrogens in the smiles string.

also, even if this worked, I'd struggle to assign the charges. Using the tool as seen above
it is not clear which charge belongs to which atom.

I also tried to directly read a local sdf/mol file incuding explicit hydrogens with some rdkit reader but the result is the
same (charge command only gives an array with 32 entries while the molecule in the mol file has 58 atoms.

from rdkit import Chem; from espaloma_charge import charge
mol=Chem.rdmolfiles.MolFromMolFile('small.mol')
charge(mol)
/home/celeristx/miniconda3/lib/python3.10/site-packages/dgl/heterograph.py:72: DGLWarning: Recommend creating graphs by dgl.graph(data) instead of dgl.DGLGraph(data).
dgl_warning('Recommend creating graphs by dgl.graph(data)'
array([-0.14054929, 0.40022638, -0.44654357, 0.4333703 , 0.16091529,
-0.35090274, 0.61365485, -0.15334736, -0.49362743, 0.2768011 ,
-0.59814954, 0.4693746 , 0.5192658 , -0.26549223, -0.54531276,
-0.1246964 , -0.24202514, -0.29489523, 0.414872 , 0.531917 ,
0.07317285, 0.07317285, -0.09172964, -0.19458656, -0.21533589,
-0.04015996, -0.0401599 , 0.1066979 , 0.15635435, 0.07988627,
0.01111788, -0.0832857 ], dtype=float32)

As rdkit would not write mol2 files, I also see no way to directly write the result as a file to disk
for further use after this procedure.

Finally the command line tool you mention in Option 2 (espaloma_charge) is nowhere to
be found in the repository.

Any help very much appreciated!
cheers,
Michael

@mbatgh
Copy link
Author

mbatgh commented Jan 10, 2023

Just upgrade the package to 0.0.8.

Now a modified version of option 0 of the examples shown on the github page works. the other two options still don't
work for me, although option 2 could probably work with minor changes to the script (see below)

option 0

works for the given example, but for any molecules with hydrogens, instead of directly using a smiles string,
I need to load a molecule from a file on disk that includes hydrogens, and tell rdkit explicitly to keep the
hydrogens, as in the example below with ethanol:

python3
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

from rdkit import Chem; from espaloma_charge import charge
m = Chem.MolFromMolFile('e.mol',removeHs=False)
print(Chem.MolToMolBlock(m))
q=charge(m)
print(q)
[-0.1135485 0.1239773 -0.60537666 0.04428618 0.04428618 0.04428618
0.03098544 0.03098544 0.4001184 ]

option 1

python3
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:23:14) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

from openff.toolkit.topology import Molecule
from espaloma_charge.openff_wrapper import EspalomaChargeToolkitWrapper
toolkit_registry = EspalomaChargeToolkitWrapper()
molecule = Molecule.from_smiles("N#N")
molecule.assign_partial_charges('espaloma-am1bcc', toolkit_registry=toolkit_registry)
~/miniconda3/lib/python3.10/site-packages/dgl/heterograph.py:72: DGLWarning: Recommend creating graphs by dgl.graph(data) instead of dgl.DGLGraph(data).
dgl_warning('Recommend creating graphs by dgl.graph(data)'
Traceback (most recent call last):
File "", line 1, in
File "/home/celeristx/miniconda3/lib/python3.10/site-packages/openff/toolkit/topology/molecule.py", line 3408, in assign_partial_charges
toolkit.assign_partial_charges(
File "/home/celeristx/miniconda3/lib/python3.10/site-packages/espaloma_charge/openff_wrapper.py", line 128, in assign_partial_charges
molecule.partial_charges = unit.Quantity(
File "/home/celeristx/miniconda3/lib/python3.10/site-packages/openff/toolkit/topology/molecule.py", line 3982, in partial_charges
assert hasattr(charges, "unit")
AssertionError

option 2

seems to work at first, but if I use any molecule that contains hydrogens the corresponding charges
are not provided. Below e.mol2 contains ethanol including explicit hydrogens, and I get only 3 numbers.
It might be a good idea to modify the function charge_mol2 in antechamber_utils.py to keep hydrogens,
which, by default, rdkit removes.

prompt> espaloma_charge -i e.mol2 -o e.crg
~/miniconda3/lib/python3.10/site-packages/dgl/heterograph.py:72: DGLWarning: Recommend creating graphs by dgl.graph(data) instead of dgl.DGLGraph(data).
dgl_warning('Recommend creating graphs by dgl.graph(data)'

prompt> cat e.crg
-0.030346 0.399421 -0.369075

@yuanqing-wang
Copy link
Member

Hi @mbatgh ,

Many thanks for testing this! I do think there is a bug in the option 2 with RDKit removing hydrogens. I'll fix this shortly.

For option 1, however, I wasn't able to reproduce the error. Could you show me your openff.toolkit version?

Thanks!

@yuanqing-wang
Copy link
Member

I'll fix the hydrogen-removing behavior shortly!

@yuanqing-wang
Copy link
Member

I think this should fix the hydrogen-removing behavior: #14

@mbatgh
Copy link
Author

mbatgh commented Jan 18, 2023

yuanqing-wang wrote:
"For option 1, however, I wasn't able to reproduce the error. Could you show me your openff.toolkit version?"

here we go...
conda list | grep openff
openff-forcefields 2.0.0 pyh6c4a22f_0 conda-forge
openff-toolkit 0.10.6 pyhd8ed1ab_0 conda-forge
openff-toolkit-base 0.10.6 pyhd8ed1ab_0 conda-forge
openff-units 0.2.0 pyh1a96a4e_0 conda-forge
openff-utilities 0.1.7 pyh1a96a4e_0 conda-forge

if this turns out being a compatibility issues that can be solved with a different version of openff, please do let me know!

thanks!

@mbatgh
Copy link
Author

mbatgh commented Jan 19, 2023

just installed openff-toolkit version 0.12.0 ... now option 1 works!
thanks again for this useful tool!

@mbatgh mbatgh closed this as completed Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants