New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correct identification of atom types of ionized molecules #2691
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! |
@husakm Can you share a specific example? The assignment of bonds in a .cif file is not mandatory (let alone bond order), and accounting for molecules on special positions (where symmetry operators of the unit cell [most frequently a centre of inversion, a mirror plane, a proper axis of rotation]) allow a crystallographic motif to be smaller than the whole molecule described can be difficult. With varying success to recover a molecule as such from a .cif file, you could resort to GUIs established in the field e.g., CCDC's Mercury (already in the community edition), Jmol, Avogadro. If you have a stack of files to process, an automated conversion into a .mol/.sdf file (or/then further processed to yield an .xyz) could be an option, too. Examples include |
Sure, I will post a salt CIF file demonstrating the problem. |
To make the problem simple and clearly visible, I had separated the issue form CIF file usage and I will demonstrate it on .XYZ file. *** Open Babel Warning in OpenBabel::OBMol::PerceiveBondOrders The generated .bgf file contains incorrect Dreading ff atom type assignments: After removing the hydrogen on the N atom, all works OK (can be tested by following file): I woul like to be able to specify the fragment is ionized or let OB to identify this. |
The problem is clearly demonstrated on the image in attachment. The Dreading atom types on the right part of the image are correctly assigned by a commercial engine (Materials Studio) witch handle the fragment correctly automatically. The left part of the image shows OB results. OB was not able to identify the resonant bonds on the ionized aromatic cycle = atom types assignment is incorrect. I believe not only Dreading but the other force-fields atom type assignment will be incorrect as well. |
@husakm A bypass perhaps good enough in terms of maintenance and overall performance could consist of generating an intermediate .sdf by xyz2mol eventually processed by obabel to yield the Materials Studio file you seek. Meanwhile, someone with the necessary insight then could extend obabel's source code to provide the extended functionality you request. Addressing RDKit with Python, a MWE can be from rdkit import Chem
from rdkit.Chem import rdDetermineBonds
raw_mol = Chem.MolFromXYZFile('KAXWOV_cation_xyz.xyz')
mol = Chem.Mol(raw_mol)
rdDetermineBonds.DetermineBonds(mol,charge=-1)
print(Chem.MolToMolBlock(mol)) For some flexibility regarding the .xyz file to process and the (global) charge to assign, I wrote a small script around this (xyz2mol_b) -- as just tested in a virtual environment (Python 3.11.8, RDKit 2023.09.6 fetched with pip) still functional today. Thus $ python ./xyz2mol_b.py KAXWOV_cation_xyz.xyz -c 1 > cation.sdf
$ obabel cation.sdf -O cation.bgf
1 molecule converted
$ cat cation.bgf
BIOGRF 200
DESCRP
FORCEFIELD DREIDING
FORMAT ATOM (a6,1x,i5,1x,a5,1x,a3,1x,a1,1x,a5,3f10.5,1x,a5,i3,i2,1x,f8.5)
HETATM 1 N1 RES A 444 -2.09800 4.11920 11.28120 N_R 2 0 -0.21693
HETATM 2 C2 RES A 444 -2.23690 5.32490 10.71550 C_R 3 0 0.17487
HETATM 3 C3 RES A 444 -2.80460 6.36740 11.41490 C_R 3 0 0.04130
HETATM 4 C4 RES A 444 -2.97580 7.71300 10.76640 C_3 4 0 0.07478
HETATM 5 O5 RES A 444 -2.56850 7.64730 9.44780 O_3 2 0 -0.39022
HETATM 6 C6 RES A 444 -3.22420 6.11110 12.71340 C_R 3 0 -0.04518
HETATM 7 C7 RES A 444 -3.07610 4.86090 13.27370 C_R 3 0 0.00239
HETATM 8 C8 RES A 444 -2.50840 3.86280 12.52490 C_R 3 0 0.16949
HETATM 9 H9 RES A 444 -1.69690 3.45390 10.69820 H_ 1 0 0.45224
HETATM 10 H10 RES A 444 -2.69970 8.47580 9.02660 H_HB 1 0 0.20985
HETATM 11 H11 RES A 444 -1.89590 5.48170 9.70190 H_ 1 0 0.13792
HETATM 12 H12 RES A 444 -4.01710 7.99900 10.80510 H_ 1 0 0.06091
HETATM 13 H13 RES A 444 -2.38340 8.44830 11.29190 H_ 1 0 0.06091
HETATM 14 H14 RES A 444 -3.67620 6.90350 13.29250 H_ 1 0 0.06248
HETATM 15 H15 RES A 444 -3.39390 4.67230 14.28870 H_ 1 0 0.06767
HETATM 16 H16 RES A 444 -2.39890 2.87120 12.93810 H_ 1 0 0.13754
FORMAT CONECT (a6,12i6)
CONECT 1 2 8 9
ORDER 1 2 1 1
CONECT 2 1 3 11
ORDER 2 2 1 1
CONECT 3 2 4 6
ORDER 3 1 1 2
CONECT 4 3 5 12 13
ORDER 4 1 1 1 1
CONECT 5 4 10
ORDER 5 1 1
CONECT 6 3 7 14
ORDER 6 2 1 1
CONECT 7 6 8 15
ORDER 7 1 2 1
CONECT 8 7 1 16
ORDER 8 2 1 1
CONECT 9 1
ORDER 9 1
CONECT 10 5
ORDER 10 1
CONECT 11 2
ORDER 11 1
CONECT 12 4
ORDER 12 1
CONECT 13 4
ORDER 13 1
CONECT 14 6
ORDER 14 1
CONECT 15 7
ORDER 15 1
CONECT 16 8
ORDER 16 1
END The intermediate .sdf carries three double bonds, the visualization in Jmol now carries the cyclic array of \pi electrons: (While reading the .sdf, the computation of bonds by Jmol was disabled.) The labeling of |
So you suggest: |
One more comment: |
Point a) I would describe this task as "extend what obabel already can do" Point c) can one the tools detect if the .xyz is ionic? No, by now I do not think so. The method $ python ./xyz2mol_b.py KAXWOV_cation_xyz.xyz -c -1 > negative.sdf
$ obabel negative.sdf -O negative.bgf
1 molecule converted The earlier Python script above however can be edited; molecules which (overall) don't carry a charge are rewritten as .sdf obabel can process. Structure files which might carry a charge in excess will be left untouched and there will be a note about them back to the CLI. The prototype script below allows both the sequential / use-1-CPU-only mode as well as one to spread the work of xyz2mol all CPU available. #!/usr/bin/env python3
"""
name : filter.py
author : nbehrnd@yahoo.com
license: GPL2
date : 2024-05-03
edit :
purpose: perform xyz2mol on over all neutral molecules, inform abou the other
"""
import argparse
from multiprocessing import Pool # engage multiple CPU (Python standard library)
import rdkit
from rdkit import Chem
from rdkit.Chem import rdDetermineBonds
def get_args():
"""Get command-line arguments"""
parser = argparse.ArgumentParser(
description="""
One, or multiple .xyz files can be processed to yield a individual .sdf files.
A molecule carrying an overall charge will not yield a .sdf; instead, the name
of the .xyz file to check again will be reported to the CLI.""",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
)
parser.add_argument(
"file",
metavar="FILE",
type=argparse.FileType("rt"),
nargs="+",
help="""
Provide one, or multiple .xyz files to process (`*.xyz` is possible, too).""",
)
parser.add_argument(
"-p", "--parallel", help="process with all available CPU", action="store_true"
)
return parser.parse_args()
def apply_xyz2mol(input_xyz):
"""convert .xyz into a .sdf block if molecule is over all charge balanced"""
raw_mol = Chem.MolFromXYZFile(input_xyz)
mol = Chem.Mol(raw_mol)
try:
rdkit.Chem.rdDetermineBonds.DetermineBonds(mol, charge=0)
mol.SetProp("_Name", input_xyz) # add an internal tag to .sdf block
result = Chem.MolToMolBlock(mol)
output_file = "".join([input_xyz, ".sdf"])
with open(output_file, mode="w", encoding="utf-8") as new:
new.write(result)
except ValueError:
print(f"{input_xyz} possibly is not charge balanced. Check again.")
def main():
"""Join the functionalities"""
args = get_args()
if args.parallel:
# distribute the lis of work on all CPU available, start:
work_ahead = []
for file in args.file:
work_ahead.append(file.name)
pool = Pool()
pool.map(apply_xyz2mol, work_ahead)
pool.close()
pool.join()
else:
for file in args.file:
apply_xyz2mol(file.name)
if __name__ == "__main__":
main() |
Is your feature request related to a problem? Please describe.
Open Babel is not able to assign correct atom types (eg. for force field usage) of ionized molecules. This is typical for molecules in crystals (cif). Example is eg. protonated N atom in aromatic cycle. OB reports Can not Kekulize in such situation.
Describe the solution you'd like
Identify the fragment is ion. Find what atom is ionic. Remove/add hydrogen to make atom type assigment functional.
Describe alternatives you've considered
Implement ionized atom type ?
Give me hint how to implement this myself ?
Additional context
Without this functionality processing of most crystal structures of organic molecules is impossible. It can be useful to export in some way atom types externaly, becous OB can not minimize energy in periodic crystal potencial. We want to use external code (GULP) for energy minimization. But we need automatic atom type assigment (Drading, UFF, GAFFF e.t.c.) ...
The text was updated successfully, but these errors were encountered: