Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct identification of atom types of ionized molecules #2691

Open
husakm opened this issue Apr 26, 2024 · 9 comments
Open

Correct identification of atom types of ionized molecules #2691

husakm opened this issue Apr 26, 2024 · 9 comments

Comments

@husakm
Copy link

husakm commented Apr 26, 2024

Is your feature request related to a problem? Please describe.
Open Babel is not able to assign correct atom types (eg. for force field usage) of ionized molecules. This is typical for molecules in crystals (cif). Example is eg. protonated N atom in aromatic cycle. OB reports Can not Kekulize in such situation.

Describe the solution you'd like
Identify the fragment is ion. Find what atom is ionic. Remove/add hydrogen to make atom type assigment functional.

Describe alternatives you've considered
Implement ionized atom type ?
Give me hint how to implement this myself ?

Additional context
Without this functionality processing of most crystal structures of organic molecules is impossible. It can be useful to export in some way atom types externaly, becous OB can not minimize energy in periodic crystal potencial. We want to use external code (GULP) for energy minimization. But we need automatic atom type assigment (Drading, UFF, GAFFF e.t.c.) ...

Copy link

welcome bot commented Apr 26, 2024

Thanks for opening your first issue here! Be sure to follow the issue template!

@nbehrnd
Copy link
Contributor

nbehrnd commented Apr 30, 2024

@husakm Can you share a specific example? The assignment of bonds in a .cif file is not mandatory (let alone bond order), and accounting for molecules on special positions (where symmetry operators of the unit cell [most frequently a centre of inversion, a mirror plane, a proper axis of rotation]) allow a crystallographic motif to be smaller than the whole molecule described can be difficult.

With varying success to recover a molecule as such from a .cif file, you could resort to GUIs established in the field e.g., CCDC's Mercury (already in the community edition), Jmol, Avogadro. If you have a stack of files to process, an automated conversion into a .mol/.sdf file (or/then further processed to yield an .xyz) could be an option, too. Examples include codcif2sdf of the cod-tools, a Jmol script, or Jensen's xyz2mol extension to RDKit. As mentioned by Jensen himself (talk at the RDKit user meeting 2020, link to a youtube recording), organometallic structures are more difficult in assigning bonds and bond order.

@husakm
Copy link
Author

husakm commented Apr 30, 2024

Sure, I will post a salt CIF file demonstrating the problem.
Give me 2 days, I am out of office ...,
Generaly the issue is not CIF specific, the Kekulization will probably not work for any aromatic cycle with 1+ charge and protonated aromatic nitrogen.
We need OpenBabel primary as a tool for force field atom type assigment. We are crystallographers = we already implemented our CIF parser + space group handling engine on our side, so we can send to OB any format and molecules already fully completed by symetry operations + with special positions handled on our side. Vincent Favre-Nicolin who hed implemented OB CIF parser was working with me in the past ...
We only need correct handling of charged fragments in OB and posibility to export force field atom types detected (or enough output information so we can assign force fields atom types on our side).

@husakm
Copy link
Author

husakm commented May 2, 2024

To make the problem simple and clearly visible, I had separated the issue form CIF file usage and I will demonstrate it on .XYZ file.
KAXWOV_cation_xyz.txt
The file in attachment describes a cation separated from a CIF file.
I had proceed it by following command:
obabel.exe KAXWOV_cation.xyz -obgf -OKAXWOV_cation.bgf
The code had reported following issues:

*** Open Babel Warning in OpenBabel::OBMol::PerceiveBondOrders
Failed to kekulize aromatic bonds in OBMol::PerceiveBondOrders (title is KAXWO
V)

The generated .bgf file contains incorrect Dreading ff atom type assignments:
N_2 instead of N_R
C_2 instead of C_R
C_3 instead of C_R

After removing the hydrogen on the N atom, all works OK (can be tested by following file):
KAXWOV_neutral_xyz.txt

I woul like to be able to specify the fragment is ionized or let OB to identify this.
And correctly identify the resonant bonds in the ionized fragments.

@husakm
Copy link
Author

husakm commented May 2, 2024

The problem is clearly demonstrated on the image in attachment. The Dreading atom types on the right part of the image are correctly assigned by a commercial engine (Materials Studio) witch handle the fragment correctly automatically. The left part of the image shows OB results. OB was not able to identify the resonant bonds on the ionized aromatic cycle = atom types assignment is incorrect. I believe not only Dreading but the other force-fields atom type assignment will be incorrect as well.

KAXWOV_inocrrect-atom_types

@nbehrnd
Copy link
Contributor

nbehrnd commented May 2, 2024

@husakm A bypass perhaps good enough in terms of maintenance and overall performance could consist of generating an intermediate .sdf by xyz2mol eventually processed by obabel to yield the Materials Studio file you seek. Meanwhile, someone with the necessary insight then could extend obabel's source code to provide the extended functionality you request.

Addressing RDKit with Python, a MWE can be

from rdkit import Chem
from rdkit.Chem import rdDetermineBonds

raw_mol = Chem.MolFromXYZFile('KAXWOV_cation_xyz.xyz')
mol = Chem.Mol(raw_mol)
rdDetermineBonds.DetermineBonds(mol,charge=-1)
print(Chem.MolToMolBlock(mol))

For some flexibility regarding the .xyz file to process and the (global) charge to assign, I wrote a small script around this (xyz2mol_b) -- as just tested in a virtual environment (Python 3.11.8, RDKit 2023.09.6 fetched with pip) still functional today. Thus

$ python ./xyz2mol_b.py KAXWOV_cation_xyz.xyz -c 1 > cation.sdf
$ obabel cation.sdf -O cation.bgf
1 molecule converted
$ cat cation.bgf 
BIOGRF 200
DESCRP 
FORCEFIELD DREIDING  
FORMAT ATOM   (a6,1x,i5,1x,a5,1x,a3,1x,a1,1x,a5,3f10.5,1x,a5,i3,i2,1x,f8.5)
HETATM     1 N1    RES A   444  -2.09800   4.11920  11.28120 N_R    2 0 -0.21693
HETATM     2 C2    RES A   444  -2.23690   5.32490  10.71550 C_R    3 0  0.17487
HETATM     3 C3    RES A   444  -2.80460   6.36740  11.41490 C_R    3 0  0.04130
HETATM     4 C4    RES A   444  -2.97580   7.71300  10.76640 C_3    4 0  0.07478
HETATM     5 O5    RES A   444  -2.56850   7.64730   9.44780 O_3    2 0 -0.39022
HETATM     6 C6    RES A   444  -3.22420   6.11110  12.71340 C_R    3 0 -0.04518
HETATM     7 C7    RES A   444  -3.07610   4.86090  13.27370 C_R    3 0  0.00239
HETATM     8 C8    RES A   444  -2.50840   3.86280  12.52490 C_R    3 0  0.16949
HETATM     9 H9    RES A   444  -1.69690   3.45390  10.69820 H_     1 0  0.45224
HETATM    10 H10   RES A   444  -2.69970   8.47580   9.02660 H_HB   1 0  0.20985
HETATM    11 H11   RES A   444  -1.89590   5.48170   9.70190 H_     1 0  0.13792
HETATM    12 H12   RES A   444  -4.01710   7.99900  10.80510 H_     1 0  0.06091
HETATM    13 H13   RES A   444  -2.38340   8.44830  11.29190 H_     1 0  0.06091
HETATM    14 H14   RES A   444  -3.67620   6.90350  13.29250 H_     1 0  0.06248
HETATM    15 H15   RES A   444  -3.39390   4.67230  14.28870 H_     1 0  0.06767
HETATM    16 H16   RES A   444  -2.39890   2.87120  12.93810 H_     1 0  0.13754
FORMAT CONECT (a6,12i6)

CONECT     1     2     8     9
ORDER      1     2     1     1
CONECT     2     1     3    11
ORDER      2     2     1     1
CONECT     3     2     4     6
ORDER      3     1     1     2
CONECT     4     3     5    12    13
ORDER      4     1     1     1     1
CONECT     5     4    10
ORDER      5     1     1
CONECT     6     3     7    14
ORDER      6     2     1     1
CONECT     7     6     8    15
ORDER      7     1     2     1
CONECT     8     7     1    16
ORDER      8     2     1     1
CONECT     9     1
ORDER      9     1
CONECT    10     5
ORDER     10     1
CONECT    11     2
ORDER     11     1
CONECT    12     4
ORDER     12     1
CONECT    13     4
ORDER     13     1
CONECT    14     6
ORDER     14     1
CONECT    15     7
ORDER     15     1
CONECT    16     8
ORDER     16     1
END

The intermediate .sdf carries three double bonds, the visualization in Jmol now carries the cyclic array of \pi electrons:

cation_jmol

(While reading the .sdf, the computation of bonds by Jmol was disabled.)

The labeling of N1 as N_R in the .bgf file now matches your requirement, too. However, does the modification improve the file over all? For ease of replication, the files used/obtained in this limited test are attached in the .zip archive below.

2024-05-02_test_case_cation.zip

@husakm
Copy link
Author

husakm commented May 2, 2024

So you suggest:
A) Correct the OpenBabel engine (implement charged fragments support on our side)
or
B) Switch from OpenBabel to RDkit, witch alredy have the required functionality ?
Generaly the key is bond type asigment. We can encode the ff atom types assigmend based on molecule topologic information on our side ...

@husakm
Copy link
Author

husakm commented May 2, 2024

One more comment:
Can any of the suggested tools detect, the fragment is iont ?
I need to make the whole procedure automatic ...

@nbehrnd
Copy link
Contributor

nbehrnd commented May 3, 2024

Point a) I would describe this task as "extend what obabel already can do"
Point b) I'm not aware if RDKit offers an interface to Materials Studio / .bgf files. The suggest to use multiple tools (the .sdf reconstruction with xyz2mol/RDKit, the preparation of .bgf with obabel) might be unfortunate for splitting the attention on two tools instead of one for the overall .xyz -> .bgf task ahead, but I presume the preparation of .bgf has to run only once and hence "acceptable".

Point c) can one the tools detect if the .xyz is ionic? No, by now I do not think so.

The method xyz2mol can be ambiguous because the charge parameter i) is the overall charge, i.e. an amino acid with simultaneous presence of ammonium and carboxylate is seen as neutral. And ii) no error is reported if the same data set about the cation is processed with -c -1, i.e

$ python ./xyz2mol_b.py KAXWOV_cation_xyz.xyz -c -1 > negative.sdf
$ obabel negative.sdf -O negative.bgf
1 molecule converted

negative

The earlier Python script above however can be edited; molecules which (overall) don't carry a charge are rewritten as .sdf obabel can process. Structure files which might carry a charge in excess will be left untouched and there will be a note about them back to the CLI. The prototype script below allows both the sequential / use-1-CPU-only mode as well as one to spread the work of xyz2mol all CPU available.

#!/usr/bin/env python3
"""
name   : filter.py
author : nbehrnd@yahoo.com
license: GPL2
date   : 2024-05-03
edit   :
purpose: perform xyz2mol on over all neutral molecules, inform abou the other
"""

import argparse
from multiprocessing import Pool  # engage multiple CPU (Python standard library)

import rdkit
from rdkit import Chem
from rdkit.Chem import rdDetermineBonds


def get_args():
    """Get command-line arguments"""

    parser = argparse.ArgumentParser(
        description="""
One, or multiple .xyz files can be processed to yield a individual .sdf files.
A molecule carrying an overall charge will not yield a .sdf; instead, the name
of the .xyz file to check again will be reported to the CLI.""",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    parser.add_argument(
        "file",
        metavar="FILE",
        type=argparse.FileType("rt"),
        nargs="+",
        help="""
Provide one, or multiple .xyz files to process (`*.xyz` is possible, too).""",
    )

    parser.add_argument(
        "-p", "--parallel", help="process with all available CPU", action="store_true"
    )

    return parser.parse_args()


def apply_xyz2mol(input_xyz):
    """convert .xyz into a .sdf block if molecule is over all charge balanced"""
    raw_mol = Chem.MolFromXYZFile(input_xyz)
    mol = Chem.Mol(raw_mol)

    try:
        rdkit.Chem.rdDetermineBonds.DetermineBonds(mol, charge=0)
        mol.SetProp("_Name", input_xyz)  # add an internal tag to .sdf block
        result = Chem.MolToMolBlock(mol)

        output_file = "".join([input_xyz, ".sdf"])
        with open(output_file, mode="w", encoding="utf-8") as new:
            new.write(result)

    except ValueError:
        print(f"{input_xyz} possibly is not charge balanced.  Check again.")


def main():
    """Join the functionalities"""
    args = get_args()

    if args.parallel:
        # distribute the lis of work on all CPU available, start:
        work_ahead = []
        for file in args.file:
            work_ahead.append(file.name)

        pool = Pool()
        pool.map(apply_xyz2mol, work_ahead)
        pool.close()
        pool.join()
    else:
        for file in args.file:
            apply_xyz2mol(file.name)


if __name__ == "__main__":
    main()

2024-05-03_negative.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants