Atropisomeric stereo is removed during canonicalization when aryl ketone is present #7290

Open
pechersky opened this issue Mar 22, 2024 · 2 comments
@pechersky
Contributor

pechersky commented Mar 22, 2024

Describe the bug
When we canonicalize atropisomers, most keep their bond stereo, but some lose it. We've identified some minimal examples where it is lost.

To Reproduce

from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

enumerator = rdMolStandardize.TautomerEnumerator()

for mol in Chem.SDMolSupplier("min.sdf"):
    # Distinct bond stereo values present before tautomer canonicalization
    pre_canon = {bond.GetStereo() for bond in mol.GetBonds()}
    canonical = enumerator.Canonicalize(mol)
    # Distinct bond stereo values present afterwards
    post_canon = {bond.GetStereo() for bond in canonical.GetBonds()}
    # A smaller second number means a stereo value was lost
    print(len(pre_canon), len(post_canon))

Expected behavior
The atropisomer bond stereo should not be lost during canonicalization.

Screenshots
image

min.sdf
  ChemDraw03222416162D

  0  0  0     0  0              0 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 18 19 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C -2.492152 -0.616602 0.000000 0
M  V30 2 C -2.492152 -1.438737 0.000000 0
M  V30 3 C -1.779687 -1.849805 0.000000 0
M  V30 4 C -1.067987 -1.438737 0.000000 0
M  V30 5 C -1.067987 -0.616602 0.000000 0
M  V30 6 C -1.779723 -0.205597 0.000000 0
M  V30 7 N -1.779723 0.616539 0.000000 0
M  V30 8 C -0.355996 0.616602 0.000000 0
M  V30 9 C -0.355996 -0.205534 0.000000 0
M  V30 10 C 0.356352 -0.616808 0.000000 0
M  V30 11 C 1.068171 -0.205534 0.000000 0
M  V30 12 C 1.068171 0.616602 0.000000 0
M  V30 13 C 0.356471 1.027671 0.000000 0
M  V30 14 C 1.780162 1.027670 0.000000 0
M  V30 15 C 1.780162 1.849805 0.000000 0
M  V30 16 Cl -1.067987 1.027670 0.000000 0
M  V30 17 F 0.356352 -1.438945 0.000000 0
M  V30 18 O 2.492152 0.616602 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3 CFG=3
M  V30 3 2 3 4
M  V30 4 1 5 4 CFG=3
M  V30 5 2 5 6
M  V30 6 1 6 1
M  V30 7 1 6 7
M  V30 8 2 8 9
M  V30 9 1 9 10
M  V30 10 2 10 11
M  V30 11 1 11 12
M  V30 12 2 12 13
M  V30 13 1 13 8
M  V30 14 1 12 14
M  V30 15 1 14 15
M  V30 16 1 8 16
M  V30 17 1 10 17
M  V30 18 2 14 18
M  V30 19 1 5 9
M  V30 END BOND
M  V30 END CTAB
M  END
$$$$
min.sdf
  ChemDraw03222416162D

  0  0  0     0  0              0 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 18 19 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C -2.492100 -0.616993 0.000000 0
M  V30 2 C -2.492100 -1.439099 0.000000 0
M  V30 3 C -1.779659 -1.849384 0.000000 0
M  V30 4 C -1.067986 -1.439099 0.000000 0
M  V30 5 C -1.067986 -0.616993 0.000000 0
M  V30 6 C -1.779861 -0.205526 0.000000 0
M  V30 7 N -1.779861 0.616610 0.000000 0
M  V30 8 C -0.355996 0.616181 0.000000 0
M  V30 9 C -0.355996 -0.205926 0.000000 0
M  V30 10 C 0.356657 -0.617376 0.000000 0
M  V30 11 C 1.068119 -0.205926 0.000000 0
M  V30 12 C 1.068119 0.616181 0.000000 0
M  V30 13 C 0.356444 1.026468 0.000000 0
M  V30 14 C 1.780109 1.027249 0.000000 0
M  V30 15 C 1.780109 1.849384 0.000000 0
M  V30 16 Cl -1.067986 1.027249 0.000000 0
M  V30 17 F 0.356657 -1.439512 0.000000 0
M  V30 18 O 2.492100 0.616181 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3 CFG=1
M  V30 3 2 3 4
M  V30 4 1 5 4 CFG=1
M  V30 5 2 5 6
M  V30 6 1 6 1
M  V30 7 1 6 7
M  V30 8 2 8 9
M  V30 9 1 9 10
M  V30 10 2 10 11
M  V30 11 1 11 12
M  V30 12 2 12 13
M  V30 13 1 13 8
M  V30 14 1 12 14
M  V30 15 1 14 15
M  V30 16 1 8 16
M  V30 17 1 10 17
M  V30 18 2 14 18
M  V30 19 1 5 9
M  V30 END BOND
M  V30 END CTAB
M  END
$$$$
min.sdf
  ChemDraw03222416162D

  0  0  0     0  0              0 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 17 18 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C -2.136157 -0.616602 0.000000 0
M  V30 2 C -2.136157 -1.438737 0.000000 0
M  V30 3 C -1.423691 -1.849805 0.000000 0
M  V30 4 C -0.711991 -1.438737 0.000000 0
M  V30 5 C -0.711991 -0.616602 0.000000 0
M  V30 6 C -1.423727 -0.205597 0.000000 0
M  V30 7 N -1.423727 0.616538 0.000000 0
M  V30 8 C -0.000001 0.616602 0.000000 0
M  V30 9 C -0.000001 -0.205534 0.000000 0
M  V30 10 C 0.712348 -0.616809 0.000000 0
M  V30 11 C 1.424167 -0.205534 0.000000 0
M  V30 12 C 1.424167 0.616602 0.000000 0
M  V30 13 C 0.712467 1.027670 0.000000 0
M  V30 14 C 2.136157 1.027669 0.000000 0
M  V30 15 C 2.136157 1.849805 0.000000 0
M  V30 16 Cl -0.711991 1.027669 0.000000 0
M  V30 17 F 0.712348 -1.438945 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3 CFG=3
M  V30 3 2 3 4
M  V30 4 1 5 4 CFG=3
M  V30 5 2 5 6
M  V30 6 1 6 1
M  V30 7 1 6 7
M  V30 8 2 8 9
M  V30 9 1 9 10
M  V30 10 2 10 11
M  V30 11 1 11 12
M  V30 12 2 12 13
M  V30 13 1 13 8
M  V30 14 1 12 14
M  V30 15 1 14 15
M  V30 16 1 8 16
M  V30 17 1 10 17
M  V30 18 1 5 9
M  V30 END BOND
M  V30 END CTAB
M  END
$$$$
min.sdf
  ChemDraw03222416162D

  0  0  0     0  0              0 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 17 18 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C -2.136104 -0.616993 0.000000 0
M  V30 2 C -2.136104 -1.439097 0.000000 0
M  V30 3 C -1.423663 -1.849384 0.000000 0
M  V30 4 C -0.711992 -1.439097 0.000000 0
M  V30 5 C -0.711992 -0.616993 0.000000 0
M  V30 6 C -1.423865 -0.205526 0.000000 0
M  V30 7 N -1.423865 0.616610 0.000000 0
M  V30 8 C -0.000001 0.616180 0.000000 0
M  V30 9 C -0.000001 -0.205926 0.000000 0
M  V30 10 C 0.712652 -0.617376 0.000000 0
M  V30 11 C 1.424113 -0.205926 0.000000 0
M  V30 12 C 1.424113 0.616180 0.000000 0
M  V30 13 C 0.712440 1.026468 0.000000 0
M  V30 14 C 2.136104 1.027248 0.000000 0
M  V30 15 C 2.136104 1.849384 0.000000 0
M  V30 16 Cl -0.711992 1.027248 0.000000 0
M  V30 17 F 0.712652 -1.439512 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3 CFG=1
M  V30 3 2 3 4
M  V30 4 1 5 4 CFG=1
M  V30 5 2 5 6
M  V30 6 1 6 1
M  V30 7 1 6 7
M  V30 8 2 8 9
M  V30 9 1 9 10
M  V30 10 2 10 11
M  V30 11 1 11 12
M  V30 12 2 12 13
M  V30 13 1 13 8
M  V30 14 1 12 14
M  V30 15 1 14 15
M  V30 16 1 8 16
M  V30 17 1 10 17
M  V30 18 1 5 9
M  V30 END BOND
M  V30 END CTAB
M  END
$$$$
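
For readers checking the input records above, here is a minimal sketch (not part of the original report) that lists which V3000 bonds carry a wedge flag. It uses only the standard library; the function name `wedged_bonds` is hypothetical. In the V3000 format, CFG=1 marks a wedge (up) bond and CFG=3 a hash (down) bond.

```python
def wedged_bonds(molblock):
    """Return (bond_index, begin_atom, end_atom, cfg) for each V3000 bond with a CFG flag.

    Atom and bond indices are 1-based, as written in the file.
    """
    found = []
    in_bonds = False
    for raw in molblock.splitlines():
        tokens = raw.split()
        # Only "M  V30 ..." lines belong to the V3000 connection table
        if tokens[:2] != ["M", "V30"]:
            continue
        rest = tokens[2:]
        if rest == ["BEGIN", "BOND"]:
            in_bonds = True
            continue
        if rest == ["END", "BOND"]:
            in_bonds = False
            continue
        if not in_bonds or len(rest) < 4:
            continue
        # Bond line layout: index, bond type, begin atom, end atom, optional key=value fields
        idx, _bond_type, begin, end = (int(tok) for tok in rest[:4])
        for extra in rest[4:]:
            if extra.startswith("CFG="):
                found.append((idx, begin, end, int(extra[len("CFG="):])))
    return found


# A few bond lines copied from the first record above
example = """M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3 CFG=3
M  V30 4 1 5 4 CFG=3
M  V30 19 1 5 9
M  V30 END BOND"""
print(wedged_bonds(example))  # [(2, 2, 3, 3), (4, 5, 4, 3)]
```

In all four records the biaryl bond itself (bond 5-9) carries no flag; the wedges sit on ring bonds next to it.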

Configuration (please complete the following information):

  • RDKit version: 2024.3.1b1
  • OS: Ubuntu 20.04
  • Python version (if relevant): 3.11 (not sure if relevant)
  • Are you using conda? no
  • If you are using conda, which channel did you install the rdkit from? n/a
  • If you are not using conda: how did you install the RDKit?: built the wheels using the same workflow as in https://github.com/kuelumbus/rdkit-pypi

Additional context
I have other possible examples where atropisomerism is not identified. This is a minimized example that hopefully shares the same root cause as those.

pechersky added the bug label Mar 22, 2024
@greglandrum
Member

Hi @pechersky,
Thanks for testing out the beta of the new release!
Since the atropisomer support is new, this is new territory for us, but I believe that what happens in this case is correct.

Here are the tautomers which result from enumerating your first molecule:
image
In three of those tautomers, the atropisomeric bond has been converted to a double bond, which will scramble the atropisomerism.

The second molecule only has one tautomer (the starting structure), so this scrambling doesn't happen.

One can argue about the rules for enumerating tautomers (and people do!), but given the rules which are being used, I think this is the correct behavior. Seem reasonable?
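
One way to check this observation is sketched below (not code from the thread). The helper name `tautomers_with_double_axis` is hypothetical, and the sketch assumes that enumerated tautomers keep the input atom indices. It relies only on `GetBondBetweenAtoms` and `GetBondTypeAsDouble`, which RDKit molecules and bonds provide.

```python
def tautomers_with_double_axis(tautomers, begin_idx, end_idx):
    """Return positions of tautomers whose bond begin_idx-end_idx is a double bond.

    `tautomers` is any iterable of molecule-like objects; indices are 0-based.
    """
    hits = []
    for position, taut in enumerate(tautomers):
        bond = taut.GetBondBetweenAtoms(begin_idx, end_idx)
        # 2.0 is a formal double bond; aromatic bonds report 1.5
        if bond is not None and bond.GetBondTypeAsDouble() == 2.0:
            hits.append(position)
    return hits


# Possible usage with RDKit, assuming `enumerator` and `mol` from the report.
# The biaryl bond is 5-9 in the SDF, so 4 and 8 as 0-based indices:
#   tautomers_with_double_axis(enumerator.Enumerate(mol), 4, 8)
```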

@pechersky
Contributor Author

Hi Greg,

Thanks for the quick response. I completely agree with you that, given the current rules for enumerating tautomers, the stereo will get lost on conversion to the double bond. In our use case, we can choose to record the atropisomerism before canonicalization, so this isn't a blocker.

However, this behavior seems unexpected in two ways. First, when tautomerization during canonicalization involves E/Z-defined double bonds, SetRemoveBondStereo(False) retains E/Z in the canonicomer; here, toggling the flag has no effect. Second, and this is where it gets tricky, I think that in atropisomeric systems the orbitals are not conjugated, so shifting the bond orders around would not occur readily. With regard to tautomerization during canonicalization, I understand why this shouldn't be a factor in applying the existing rules: it would mean that either all biaryls need to be assessed first for potential atropisomerism (if drawn with a flat biaryl bond) or that stereo-indicated biaryls get treated differently. Once again, in our use case, we will bypass the issue by capturing stereo prior to canonicalization and hope not to have tautomeric versions of the same atropisomers around.

For a separate, upcoming issue, we have identified that atropisomerism isn't recognized in biaryl macrocyclic systems -- we're still minimizing the test cases. Thank you again for your quick response!
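
The workaround mentioned above (capturing stereo before canonicalization) could look roughly like this. It is a sketch under stated assumptions, not the reporter's code: the helper names `stereo_snapshot` and `lost_stereo` are hypothetical, atom indices are assumed to survive canonicalization, and `str(bond.GetStereo())` is assumed to yield the enum name (for example "STEREONONE").

```python
def stereo_snapshot(mol):
    """Map (low_atom_idx, high_atom_idx) -> stereo name for each bond with stereo set."""
    snapshot = {}
    for bond in mol.GetBonds():
        name = str(bond.GetStereo())
        if name != "STEREONONE":
            key = tuple(sorted((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx())))
            snapshot[key] = name
    return snapshot


def lost_stereo(before, after):
    """Return the entries of `before` that are missing or changed in `after`."""
    return {key: name for key, name in before.items() if after.get(key) != name}


# Possible usage with RDKit, assuming `enumerator` and `mol` from the report:
#   before = stereo_snapshot(mol)
#   canonical = enumerator.Canonicalize(mol)
#   print(lost_stereo(before, stereo_snapshot(canonical)))
```

Keying on sorted atom index pairs rather than bond indices keeps the comparison valid if the bond order in the molecule changes, though it still depends on the atom ordering being preserved.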
