OpenBabel determines incorrect sum formula #2656

Open
schatzsc opened this issue Dec 13, 2023 · 11 comments

@schatzsc

When processing the attached molfile of a metal carbene complex, OpenBabel (stand-alone version 3.1.1) reports the sum formula as C13H14BrF6N5PPt instead of the correct C13H13BrF6N5PPt. The correct sum formula results from two CH3 groups at the periphery, two H on each of the two flanking carbene moieties, and three H on the central pyridine ring (two meta and one para H).
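A minimal sketch for comparing the two formulas element by element. It assumes flat Hill-style formulas without brackets or charges; the helper names `parse_formula` and `formula_difference` are made up for this illustration. The hydrogen count uses my reading of the description above: two H on each of the two carbene rings.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C13H14BrF6N5PPt'."""
    counts = Counter()
    # An element symbol is one capital letter plus an optional lowercase letter,
    # followed by an optional count (a missing count means 1).
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def formula_difference(reported: str, expected: str) -> dict:
    """Return {element: reported - expected} for every element that differs."""
    rep, exp = parse_formula(reported), parse_formula(expected)
    return {
        element: rep[element] - exp[element]
        for element in sorted(set(rep) | set(exp))
        if rep[element] != exp[element]
    }


# Hydrogens expected from the structure: 2 x CH3, 2 H on each of the
# 2 carbene rings (assumed reading), 3 H on the pyridine ring.
expected_h = 2 * 3 + 2 * 2 + 3

print(expected_h)  # 13
print(formula_difference("C13H14BrF6N5PPt", "C13H13BrF6N5PPt"))  # {'H': 1}
```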

Somehow, OpenBabel "hallucinates" an additional implicit hydrogen. This is a serious bug, first discovered in Chemotion, which relies on OpenBabel:

ComPlat/chemotion_ELN#1551

Converting the molfile to the "molreport" format allows this "surplus H" to be traced to the following parts of the output:

Apparently, OpenBabel breaks three of the four bonds the platinum has towards the pyridine N, the two carbene C and the bromido ligand, only retaining one of the two Pt-C(carbene) bonds, as there is only one bond involving the Pt atom #17:

ATOM: 3 C TYPE: C2 HYB: 2 CHARGE: 0.0000
ATOM: 4 N TYPE: N2 HYB: 2 CHARGE: 0.5000
ATOM: 8 N TYPE: Npl HYB: 2 CHARGE: 0.5000
ATOM: 10 N TYPE: Npl HYB: 2 CHARGE: 0.0000
ATOM: 12 N TYPE: Npl HYB: 2 CHARGE: 0.5000
ATOM: 15 N TYPE: Npl HYB: 2 CHARGE: 0.0000
ATOM: 17 Pt TYPE: Pt HYB: 3 CHARGE: 0.0000
ATOM: 20 Br TYPE: Br HYB: 1 CHARGE: 0.0000
BOND: 17 START: 3 END: 12 ORDER: 1

This in turn seems to trace back to Pt being assigned HYB: 3 (hybridization?) combined with a Pt charge of +1, which leads to it being allowed only one bond above the "standard valence" of two. This, in turn, results in a "lonely" Br, which is "fixed" by adding an H to the Br, as is evident from the InChI also generated by OpenBabel:

InChI=1S/C13H13N5.BrH.F6P.Pt/c1-15-6-8-17(10-15)12-4-3-5-13(14-12)18-9-7-16(2)11-18;;1-7(2,3,4,5)6;/h3-9H,1-2H3;1H;;/q;;-1;+1

Therefore, the origin of this bug is the incorrect breaking of some metal-ligand bonds although they are marked as regular single bonds in the molfile:

20 17 1 0 0 0

This leads not only to an incorrect sum formula. If the sum formula is processed further, it also leads to an incorrect molecular weight and thus an incorrect yield, for example in Chemotion. It is therefore considered a critical bug.
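A minimal sketch of how the surplus hydrogen propagates into molecular weight and yield. The atomic weights are rounded standard values that I assume here, not OpenBabel's internal table, and the experiment numbers (0.500 g isolated, 1.00 mmol theoretical) are purely hypothetical.

```python
import re

# Rounded standard atomic weights in g/mol (assumed, not OpenBabel's table).
ATOMIC_WEIGHT = {
    "C": 12.011, "H": 1.008, "N": 14.007, "F": 18.998,
    "P": 30.974, "Br": 79.904, "Pt": 195.084,
}


def molar_mass(formula: str) -> float:
    """Molar mass of a flat formula such as 'C13H13BrF6N5PPt'."""
    total = 0.0
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(number) if number else 1)
    return total


def percent_yield(isolated_g: float, molar_mass_g_mol: float, theoretical_mol: float) -> float:
    """Yield in percent from isolated mass and theoretical amount of substance."""
    return 100.0 * isolated_g / (molar_mass_g_mol * theoretical_mol)


wrong = molar_mass("C13H14BrF6N5PPt")  # formula reported by OpenBabel
right = molar_mass("C13H13BrF6N5PPt")  # formula expected from the structure

print(f"reported: {wrong:.3f} g/mol, expected: {right:.3f} g/mol")
print(f"difference: {wrong - right:.3f} g/mol")

# Hypothetical experiment: 0.500 g isolated, 1.00 mmol theoretical amount.
print(f"yield with reported formula: {percent_yield(0.500, wrong, 1.0e-3):.2f} %")
print(f"yield with expected formula: {percent_yield(0.500, right, 1.0e-3):.2f} %")
```

The two masses differ by exactly one hydrogen atomic weight, so every quantity derived from the molecular weight inherits that offset.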

carbene_complex.mol.txt

Molreport for above molfile:

TITLE:
FORMULA: C13H14BrF6N5PPt
MASS: 660.2299
TOTAL SPIN: 2
ATOM: 1 C TYPE: C2 HYB: 2 CHARGE: 0.0000
ATOM: 2 C TYPE: C2 HYB: 2 CHARGE: 0.0000
ATOM: 3 C TYPE: C2 HYB: 2 CHARGE: 0.0000
ATOM: 4 N TYPE: N2 HYB: 2 CHARGE: 0.5000
ATOM: 5 C TYPE: C2 HYB: 2 CHARGE: 0.0000
ATOM: 6 C TYPE: C2 HYB: 2 CHARGE: 0.0000
ATOM: 7 C TYPE: C2 HYB: 2 CHARGE: 0.0000
ATOM: 8 N TYPE: Npl HYB: 2 CHARGE: 0.5000
ATOM: 9 C TYPE: C3 HYB: 3 CHARGE: 0.0000
ATOM: 10 N TYPE: Npl HYB: 2 CHARGE: 0.0000
ATOM: 11 C TYPE: C2 HYB: 2 CHARGE: 0.0000
ATOM: 12 N TYPE: Npl HYB: 2 CHARGE: 0.5000
ATOM: 13 C TYPE: C2 HYB: 2 CHARGE: 0.0000
ATOM: 14 C TYPE: C2 HYB: 2 CHARGE: 0.0000
ATOM: 15 N TYPE: Npl HYB: 2 CHARGE: 0.0000
ATOM: 16 C TYPE: C3 HYB: 3 CHARGE: 0.0000
ATOM: 17 Pt TYPE: Pt HYB: 3 CHARGE: 0.0000
ATOM: 18 C TYPE: C3 HYB: 3 CHARGE: 0.0000
ATOM: 19 C TYPE: C3 HYB: 3 CHARGE: 0.0000
ATOM: 20 Br TYPE: Br HYB: 1 CHARGE: 0.0000
ATOM: 21 P TYPE: P HYB: 3 CHARGE: -1.0000
ATOM: 22 F TYPE: F HYB: 1 CHARGE: 0.0000
ATOM: 23 F TYPE: F HYB: 1 CHARGE: 0.0000
ATOM: 24 F TYPE: F HYB: 1 CHARGE: 0.0000
ATOM: 25 F TYPE: F HYB: 1 CHARGE: 0.0000
ATOM: 26 F TYPE: F HYB: 1 CHARGE: 0.0000
ATOM: 27 F TYPE: F HYB: 1 CHARGE: 0.0000
BOND: 0 START: 1 END: 2 ORDER: 1
BOND: 1 START: 2 END: 3 ORDER: 2
BOND: 2 START: 3 END: 4 ORDER: 1
BOND: 3 START: 4 END: 5 ORDER: 2
BOND: 4 START: 5 END: 6 ORDER: 1
BOND: 5 START: 6 END: 1 ORDER: 2
BOND: 6 START: 7 END: 8 ORDER: 1
BOND: 7 START: 8 END: 9 ORDER: 1
BOND: 8 START: 9 END: 10 ORDER: 1
BOND: 9 START: 10 END: 11 ORDER: 1
BOND: 10 START: 11 END: 7 ORDER: 2
BOND: 11 START: 12 END: 13 ORDER: 1
BOND: 12 START: 13 END: 14 ORDER: 2
BOND: 13 START: 14 END: 15 ORDER: 1
BOND: 14 START: 15 END: 16 ORDER: 1
BOND: 15 START: 16 END: 12 ORDER: 1
BOND: 16 START: 8 END: 5 ORDER: 1
BOND: 17 START: 3 END: 12 ORDER: 1
BOND: 18 START: 16 END: 17 ORDER: 1
BOND: 19 START: 9 END: 17 ORDER: 1
BOND: 20 START: 4 END: 17 ORDER: 1
BOND: 21 START: 15 END: 18 ORDER: 1
BOND: 22 START: 10 END: 19 ORDER: 1
BOND: 23 START: 20 END: 17 ORDER: 1
BOND: 24 START: 21 END: 22 ORDER: 1
BOND: 25 START: 21 END: 23 ORDER: 1
BOND: 26 START: 21 END: 24 ORDER: 1
BOND: 27 START: 21 END: 25 ORDER: 1
BOND: 28 START: 21 END: 26 ORDER: 1
BOND: 29 START: 21 END: 27 ORDER: 1
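A minimal sketch for listing the bonds that touch a given atom in molreport output. It assumes that the first number after `BOND:` is the running index of the bond itself (the 0 to 29 numbering above suggests this) and that `START` and `END` are atom indices; `bonds_of_atom` is a helper name made up for this illustration.

```python
import re

BOND_RE = re.compile(
    r"BOND:\s*(\d+)\s+START:\s*(\d+)\s+END:\s*(\d+)\s+ORDER:\s*(\d+)"
)


def bonds_of_atom(molreport_text: str, atom_index: int):
    """Return (bond_index, start, end, order) for every bond touching atom_index."""
    hits = []
    for match in BOND_RE.finditer(molreport_text):
        bond_index, start, end, order = map(int, match.groups())
        if atom_index in (start, end):
            hits.append((bond_index, start, end, order))
    return hits


# Lines copied from the molreport above.
EXCERPT = """\
BOND: 17 START: 3 END: 12 ORDER: 1
BOND: 18 START: 16 END: 17 ORDER: 1
BOND: 19 START: 9 END: 17 ORDER: 1
BOND: 20 START: 4 END: 17 ORDER: 1
BOND: 21 START: 15 END: 18 ORDER: 1
BOND: 22 START: 10 END: 19 ORDER: 1
BOND: 23 START: 20 END: 17 ORDER: 1
"""

for bond in bonds_of_atom(EXCERPT, 17):
    print(bond)
```

Under that reading, the excerpt yields bonds 18, 19, 20 and 23 as the entries with atom 17 as an endpoint, while the entry `BOND: 17` connects atoms 3 and 12.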

@nbehrnd
Contributor

nbehrnd commented Dec 13, 2023

@schatzsc Not a solution, only a complementary observation: by default, obabel clearly indicates that the input structure is (in comparison to a "more usual one") problematic e.g., during the conversion from .mol to .png:

$ obabel -imol carbene_complex.mol.txt -O na.png
==============================
*** Open Babel Warning  in ReadMolecule
  WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 3.

==============================
*** Open Babel Warning  in ReadMolecule
  WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 3.
WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 3.

==============================
*** Open Babel Warning  in ReadMolecule
  WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 3.
WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 3.
WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 4.

1 molecule converted
$ obabel -imol carbene_complex.mol.txt -O check.sdf
==============================
*** Open Babel Warning  in ReadMolecule
  WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 3.

==============================
*** Open Babel Warning  in ReadMolecule
  WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 3.
WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 3.

==============================
*** Open Babel Warning  in ReadMolecule
  WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 3.
WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 3.
WARNING: Problem interpreting the valence field of an atom
The valence field specifies a valence 2 that is
less than the observed explicit valence 4.

1 molecule converted

The illustration then carries two dots, one next to each imidazole, as well as a pentavalent nitrogen:

[image: check]

I speculate that some of the trouble would not occur if the hydrogens in the .mol file were explicit. Can you please check whether the program you use offers this option? When reading the .mol file, the automatic addition of hydrogens by Jmol (version 16.1.41 2023-09-15 17:39) likewise yields a problematic representation:

[image: jmol]

@nbehrnd
Contributor

nbehrnd commented Dec 13, 2023

@schatzsc Another plausible cause of the trouble could be the use of single bonds defined between the complex and the ion to be chelated. Besides single, double, and triple bonds, the .sdf/.mol syntax (I refer to the v3000 dialect!) also offers the dative bond of type 9 (cf. link to a pdf on archive here, page 11). Can you please check whether the program you use also supports this type?

It was an issue in a recent contribution to Avogadro addressed in a PR here.

Cross post to chemotion_ELN here.

@nbehrnd
Contributor

nbehrnd commented Dec 13, 2023

@schatzsc I redrew the structure on the test page of Marvin JS (link) using the coordinate/dative bond. Pt was assigned a positive charge, Br for compensation a negative one:

[image: test_in_marvin]

Besides saving the structure in the program's native format as a Marvin object (.mrv), I exported it as .mol, once in the v2000 dialect and once in the more recent v3000 dialect. The Hill formula matches your anticipated result:

$ obabel marvin_mol2000.mol -oreport | head -10
FILENAME: 
FORMULA: C13H13BrN5Pt
MASS: 514.2578
EXACT MASS: 513.0002234
INTERATOMIC DISTANCES

              C   1      C   2      C   3      N   4      C   5      C   6
              ------------------------------------------------------------------
   C   1    0.0000 
   C   2    0.8249     0.0000 
1 molecule converted

and

$ obabel marvin_mol3000.mol -oreport | head -10
FILENAME: 
FORMULA: C13H13BrN5Pt
MASS: 514.2578
EXACT MASS: 513.0002234
INTERATOMIC DISTANCES

              C   1      C   2      C   3      N   4      C   5      C   6
              ------------------------------------------------------------------
   C   1    0.0000 
   C   2    1.5399     0.0000 
1 molecule converted

For the visualization by obabel, -xu replaces the element colors with a black-and-white scheme, which is favourable here, for instance:

$ obabel marvin_mol2000.mol -O mol2000.png -xu
1 molecule converted

[image: mol2000]

To ease replication, see the archive attached below. For the test, I used obabel (version 3.1.1 -- Jan 4 2023 -- 09:58:24) as provided by Linux Debian 13/trixie (branch testing).

2023-12-13_test_in_marvin.zip

@schatzsc
Author

> @schatzsc Not a solution, only a complementary observation: by default, obabel clearly indicates that the input structure is (in comparison to a "more usual one") problematic e.g., during the conversion from .mol to .png:
>
> $ obabel -imol carbene_complex.mol.txt -O na.png
> ==============================
> *** Open Babel Warning  in ReadMolecule
>   WARNING: Problem interpreting the valence field of an atom
> The valence field specifies a valence 2 that is
> less than the observed explicit valence 3.

@nbehrnd Well, the carbene carbon atoms are indeed another problematic case: the "standard valence" of 2 had to be set here because otherwise the hydrogen count was too high by 3 rather than 1 (as you also observed with the Jmol picture). That, of course, is in conflict with the Lewis formula (and the molfile), which features 3 bonds: two C-N and one C-Pt.

The main problem seems to be that bonds which do not arise from the combination of "radical fragments" (A* + B* -> A-B), but from Lewis acid/base pairs, or which are otherwise not in line with the "octet rule", are not treated very well by most cheminformatics approaches.
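A minimal sketch of the kind of check behind the valence warnings quoted above: sum the bond orders drawn at each atom and compare them with a declared valence. This is not OpenBabel's code. The bond tuples are taken from the molreport in the first comment, and treating atoms 9 and 16 as the carbene carbons is my reading of that listing (each is bonded to two N atoms and to Pt).

```python
from collections import defaultdict


def explicit_valences(bonds):
    """Sum bond orders per atom from (start, end, order) tuples."""
    valence = defaultdict(int)
    for start, end, order in bonds:
        valence[start] += order
        valence[end] += order
    return dict(valence)


def valence_conflicts(declared, bonds):
    """Return {atom: (declared, observed)} where the observed explicit valence exceeds the declared one."""
    observed = explicit_valences(bonds)
    return {
        atom: (limit, observed.get(atom, 0))
        for atom, limit in declared.items()
        if observed.get(atom, 0) > limit
    }


# Bonds at the two carbene carbons, copied from the molreport (start, end, order).
BONDS = [
    (8, 9, 1), (9, 10, 1), (9, 17, 1),      # atom 9: two C-N bonds and one C-Pt bond
    (15, 16, 1), (16, 12, 1), (16, 17, 1),  # atom 16: two C-N bonds and one C-Pt bond
]

# Valence of 2 set on the carbene carbons, as described in the comment above.
DECLARED = {9: 2, 16: 2}

print(valence_conflicts(DECLARED, BONDS))  # {9: (2, 3), 16: (2, 3)}
```

Both carbons come out as "declared 2, observed 3", which has the same shape as the warning text "specifies a valence 2 that is less than the observed explicit valence 3".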

@schatzsc
Author

> @schatzsc I redrew the structure on the test page of Marvin JS (link) using the coordinate/dative bond. Pt was assigned a positive charge, Br for compensation a negative one:

@nbehrnd As also mentioned in the Chemotion issue, metal-ligand bonds MUST NOT be broken. This will also be the standard in the upcoming IUPAC InChI recommendations on organometallics.

Furthermore, "dative" bonds have no relevance, at least in the coordination and organometallic chemistry of the d- and f-elements: all those bonds are more or less polar covalent bonds.

The best solution would actually be for atoms and bonds to be marked with a "don't touch" attribute regarding bond order and the addition of implicit hydrogens, and for M-L "lines" simply to be taken as an indication of a bonding interaction without any implication for the electron count.

In the context of TUCAN we have called these "simple bonds".

Maybe a Zoom discussion between software developers and the IUPAC InChI organometallics team would be helpful to work towards general solutions?

@nbehrnd
Contributor

nbehrnd commented Dec 17, 2023

@schatzsc

> Maybe a Zoom discussion between software developers and the IUPAC InChI organometallics team would be helpful to work towards general solutions?

Yes, I do think it would be best if the relevant IUPAC commissions, the major creators of software (e.g., ChemDraw, Marvin, ChemDoodle; libraries like RDKit) and of databases (Reaxys, PubChem, etc.), and publishers got together on this.

On the one hand, Brecher's compiled rule GR 1.7.1 about coordination compounds[1] advocates plain single lines. The entry in IUPAC's Gold Book[2], with the example of the complex of ammonia and borane, however uses the arrow; possibly the difference in electronegativity in this example is larger than in your example of the carbene complex. However, the Gold Book also features an entry on coordination[2b] with the statement

«The synonym 'dative bond' is obsolete. (The origin of the bonding electrons has by itself no bearing on the character of the bond formed. Thus, the formation of methyl chloride from a methyl cation and a chloride ion involves coordination; the resultant bond obviously differs in no way from the C–Cl bond in methyl chloride formed by any other path, e.g. by colligation of a methyl radical and a chlorine atom.)»

which is closer to your argument to "use a plain line". I don't know why IUPAC's Gold Book then (still) retains an entry about dative bonds.

The test site of ChemDraw JS[3], which should be the easiest for its owner to update (CambridgeSoft, Brecher's affiliation in 2008, has meanwhile been acquired by PerkinElmer), uses dative bonds and again the arrow. A query of PubChem for ZnPc[4] yields a 2D model which uses dotted lines to the more electropositive metal rather than arrows. This seems to be the pattern they adopted, based on the 2018 publication «PubChem chemical structure standardization»[5] by authors involved in PubChem's work (figs 13, 20b, 29b, 29c).

Pauling electronegativities and their differences:

Et3N -> BH3: N: 3.04, B: 2.04, delta: 1.00
ZnPc (bonds between Zn and N): Zn: 1.65, delta: 1.39
CuPc (bonds between Cu and N): Cu: 1.90, delta: 1.14
but
imidazole ./. Pt (your carbene complex): C: 2.55, Pt: 2.28, delta: 0.27
(values by https://en.wikipedia.org/wiki/Electronegativity, version 2023-12-07)
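A minimal sketch that recomputes the differences from the Pauling values quoted above; the dictionary holds only the numbers given in this comment, and `delta` is a helper name made up for this illustration.

```python
# Pauling electronegativities as quoted in the comment above.
PAULING = {"N": 3.04, "B": 2.04, "Zn": 1.65, "Cu": 1.90, "C": 2.55, "Pt": 2.28}

# (label, donor atom, acceptor atom)
PAIRS = [
    ("amine -> borane", "N", "B"),
    ("ZnPc", "N", "Zn"),
    ("CuPc", "N", "Cu"),
    ("carbene complex", "C", "Pt"),
]


def delta(a: str, b: str) -> float:
    """Absolute electronegativity difference, rounded to two decimals."""
    return round(abs(PAULING[a] - PAULING[b]), 2)


for label, donor, acceptor in PAIRS:
    print(f"{label:16s} {donor}-{acceptor}: {delta(donor, acceptor):.2f}")
```

With the quoted values, the N/Cu pair comes out as 1.14 and the C/Pt pair as 0.27, the smallest difference of the four.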

[1] Graphical representation standards for chemical structure diagrams (IUPAC Recommendations 2008), https://doi.org/10.1351/pac200880020277
[2] https://goldbook.iupac.org/terms/view/D01523, revision February 24, 2014
[2b] https://goldbook.iupac.org/terms/view/C01329, revision February 24, 2014
[3] https://chemdrawdirect.perkinelmer.cloud/js/sample/index.html
[4] https://pubchem.ncbi.nlm.nih.gov/compound/518924
[5] https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0293-8

@nbehrnd
Contributor

nbehrnd commented Dec 18, 2023

@schatzsc For your information, there is an early note by Greg Landrum (RDKit project) here, as of today, [2023-12-18 Mon]. Through your contribution to chemotion_ELN, you possibly have a couple of «typical» carbenes and other complexes that one could submit to benchmark a possibly revised approach.

@schatzsc
Author

@nbehrnd

> Yes, I do think it would be best if the relevant IUPAC commissions, the major creators of software (e.g., ChemDraw, Marvin, ChemDoodle; libraries like RDKit) and of databases (Reaxys, PubChem, etc.), and publishers got together on this.

I will discuss this with Sonja Herres-Pawlis from RWTH Aachen, who is involved in the management committee of NFDI4Chem, but this will probably not happen before the winter holidays. One idea that was circulated was to organize such a meeting in the context of, or as a satellite meeting to, the EuChemS congress 2024.

> Brecher's compiled rule GR 1.7.1 about coordination compounds[1] advocates plain single lines.

Yes, this is in line with the community I am familiar with. Furthermore, this paper goes on with an argument

"In spite of the analogy of dative bonds with covalent bonds, in that both types imply sharing a common electron pair between two vicinal atoms, the former are distinguished by their significant polarity, lesser strength, and greater length. The distinctive feature of dative bonds is that their minimum-energy rupture in the gas phase or in inert solvent follows the heterolytic bond cleavage path."

that is focused on reactivity, not structure, which is something outside regular cheminformatics models.

@schatzsc
Author

@nbehrnd

Furthermore, Brecher makes an argument for explicit hydrogen atoms on all ligand atoms directly coordinated (bonded) to a metal center:

"Bonds representing coordination from one atom to a single other atom should be represented as normal plain single bonds. Any hydrogen atoms bonded to the atoms at either end of such a coordination bond must be shown clearly, even if that produces a diagram where some atoms appear to have non-standard valences, such as a nitrogen atom with four attached bonds."

This is particularly important with, for example, ammine (NH3) ligands: without explicit hydrogens, most software misrepresents cisplatin as [PtCl2(NH2)2] instead of [PtCl2(NH3)2]. Such software does not recognize that the free electron pair of the ammonia ligand is used for bonding, which goes along with an increase of the "valence" of N from 3 to 4. Instead, it tries to maintain the "standard valence" of 3 on N by removing one H, giving NH2 instead of NH3.
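A minimal sketch of the naive "fill up to the standard valence" rule that produces both effects discussed in this thread. It is a deliberately simplified model, not the algorithm of OpenBabel or any other toolkit; the table `STANDARD_VALENCE` and the function `implicit_hydrogens` are assumptions for illustration.

```python
# Simplified standard valences (assumed for this illustration only).
STANDARD_VALENCE = {"C": 4, "N": 3, "Br": 1}


def implicit_hydrogens(element: str, bond_order_sum: int) -> int:
    """Naive rule: add implicit H until the standard valence is reached."""
    return max(STANDARD_VALENCE[element] - bond_order_sum, 0)


# Ammine N in cisplatin drawn without explicit H: the N-Pt line counts as
# one bond, so only two H are added and the ligand becomes NH2.
print(implicit_hydrogens("N", 1))   # 2

# The same N if the N-Pt line is not counted towards the valence: NH3.
print(implicit_hydrogens("N", 0))   # 3

# Bromido ligand with its Pt-Br bond counted: no hydrogen is added.
print(implicit_hydrogens("Br", 1))  # 0

# Bromido ligand after the Pt-Br bond is dropped: one H is added, giving HBr.
print(implicit_hydrogens("Br", 0))  # 1
```

Explicit hydrogens on the coordinated atoms, as Brecher recommends, bypass this rule because nothing is left for the software to fill in.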

@schatzsc
Author

> @schatzsc For your information, there is an early note by Greg Landrum (RDKit project) here, as of today, [2023-12-18 Mon]. Through your contribution to chemotion_ELN, you possibly have a couple of «typical» carbenes and other complexes that one could submit to benchmark a possibly revised approach.

Thank you very much for pointing me to this additional line of discussion. I will take a look there.
