Tautomer canonicalizer invariant violation #7044

Open
fioruggiu opened this issue Jan 12, 2024 · 4 comments · May be fixed by #7137
fioruggiu commented Jan 12, 2024

Bug description:
The tautomer canonicalizer raises a runtime error on one specific structure, `O=[N+]([O-])/C=C/c1ccccn1`. The error cannot be caught with try/except. It appears to be related to the stereochemistry, since it does not occur when the stereo markers are deleted from the SMILES string. It is also specific to recent RDKit versions: it does not happen with rdkit==2023.3.3. The error first occurred on a Google VM with Python 3.11.5 and a conda environment with rdkit==2023.9.4 installed, but the environment may not be relevant, since the error also reproduces in Google Colab using pip install.

Code to reproduce:

from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

try:
    mol = Chem.MolFromSmiles('O=[N+]([O-])/C=C/c1ccccn1')
    enumerator = rdMolStandardize.TautomerEnumerator()
    # This call triggers the invariant violation
    enumerator.Canonicalize(mol)
except Exception as err:
    print(err)

See also the Google Colab notebook here.

Expected behavior:
Canonicalization should not crash, since the molecule is valid. Ideally, any failure should also be catchable with try/except so that it does not interrupt the workflow.
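Until the underlying bug is fixed, one way to keep a batch workflow alive is to run the risky call in a child Python process, so that even a failure the parent cannot catch only takes down the child. The sketch below is a generic standard-library pattern, not an RDKit feature; the names `run_isolated` and `REPRO` are made up for illustration, and the snippet inside `REPRO` is simply the reproduction code from this report.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=60.0):
    """Run a Python snippet in a separate interpreter.

    Returns (returncode, stdout, stderr). A non-zero return code means the
    child failed, whether through an uncaught exception or a hard crash;
    the calling process keeps running either way.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


# The reproduction from this report, wrapped as a snippet (requires RDKit
# to be installed in the child interpreter's environment).
REPRO = """
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize
mol = Chem.MolFromSmiles('O=[N+]([O-])/C=C/c1ccccn1')
canon = rdMolStandardize.TautomerEnumerator().Canonicalize(mol)
print(Chem.MolToSmiles(canon))
"""
```

Calling `run_isolated(REPRO)` and checking the return code lets the caller log the failing SMILES and move on to the next molecule. The cost is one interpreter start-up per call, so for large batches it may be worth sending many molecules to each child.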

Configuration:

  • RDKit version: 2023.9.4
  • OS: [e.g. Ubuntu 20.04]
  • Python version (likely not relevant): 3.11.5
  • Are you using conda? Yes - miniconda
  • If you are using conda, which channel did you install the rdkit from? conda-forge
@fioruggiu fioruggiu added the bug label Jan 12, 2024
@greglandrum greglandrum changed the title from "Canonicalizer invariant violation in rdkit=2023.9.4" to "Canonicalizer invariant violation" Jan 16, 2024
@greglandrum greglandrum changed the title from "Canonicalizer invariant violation" to "Tautomer canonicalizer invariant violation" Jan 16, 2024
@greglandrum
Member

Confirmed.
Thanks for the detailed bug report @fioruggiu !

@pechersky
Contributor

Another example is an oxime. Note that there is no error with SetRemoveBondStereo(True), or if the oxime has no bond stereo to begin with.

$ docker run --rm -v $(pwd):/app python:3.11-slim sh -c 'pip install -q rdkit; python -c "import rdkit; import rdkit.Chem; from rdkit.Chem.MolStandardize import rdMolStandardize; enumerator = rdMolStandardize.TautomerEnumerator(); enumerator.SetRemoveBondStereo(False); canon = enumerator.Canonicalize(rdkit.Chem.MolFromSmiles(\"O/N=C/c1ccccn1\"))"'
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 23.2.1 -> 23.3.2
[notice] To update, run: pip install --upgrade pip
[20:33:25] 

****
Invariant Violation
could not find atom2
Violation occurred on line 228 in file /project/build/temp.linux-x86_64-cpython-311/rdkit/Code/GraphMol/Canon.cpp
Failed Expression: firstFromAtom2
----------
Stacktrace:
----------
****

Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: Invariant Violation
        could not find atom2
        Violation occurred on line 228 in file Code/GraphMol/Canon.cpp
        Failed Expression: firstFromAtom2
        RDKIT: 2023.09.4
        BOOST: 1_78
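The workaround mentioned in this comment can be written out as a short script. This is only a sketch, assuming an RDKit build where `SetRemoveBondStereo(True)` avoids the invariant violation for this oxime, as reported here. The helper name `canonical_tautomer_smiles` is hypothetical. The option is documented as removing stereo from double bonds involved in tautomerism, so the E/Z information on such bonds may be lost in the result.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def canonical_tautomer_smiles(smiles):
    """Canonicalize the tautomer of `smiles` with bond-stereo removal enabled."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    enumerator = rdMolStandardize.TautomerEnumerator()
    # Workaround from this thread: drop stereo on tautomeric double bonds
    # instead of trying to preserve it.
    enumerator.SetRemoveBondStereo(True)
    canon = enumerator.Canonicalize(mol)
    return Chem.MolToSmiles(canon)


if __name__ == "__main__":
    # The oxime from the comment above.
    print(canonical_tautomer_smiles("O/N=C/c1ccccn1"))
```

Whether losing the double-bond stereo is acceptable depends on the downstream use; for deduplication by tautomer it is often fine, while for stereo-sensitive work it is not.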

@pechersky
Contributor

pechersky commented Jan 31, 2024

Bisected to #6643

While debugging the issue (with VERBOSE_ENUMERATION), I saw that the canonicalizer generates a molecule that cannot be converted to SMILES. The graph looked like this:
[image]
I'm afraid the molecule still has "directions" on the double bond, something like (for lack of a better representation) c/C=C/=[N+](-[O-])O.

@greglandrum
Member

Here's a minimal reproducible example of this that does not require tautomer canonicalization:

In [1]: from rdkit import Chem

In [2]: m = Chem.MolFromSmiles('CC=C=CC')

In [3]: m.GetBondWithIdx(2).SetStereoAtoms(1,4)

In [4]: m.GetBondWithIdx(2).SetStereo(Chem.BondStereo.STEREOCIS)

In [5]: Chem.MolToSmiles(m)
[09:32:30] 

****
Invariant Violation
could not find atom1
Violation occurred on line 222 in file /localhome/glandrum/RDKit_git/Code/GraphMol/Canon.cpp
Failed Expression: firstFromAtom1
----------
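Both reproductions in this thread share one structural feature: a double bond carries stereo information, but one of its atoms has no single (or aromatic) bond left to hold a `/` or `\` direction in SMILES. The sketch below flags such bonds before writing SMILES. It is a diagnostic built on that reading of the thread, not the check that Canon.cpp itself performs, and the function name `suspicious_stereo_bonds` is made up for illustration.

```python
from rdkit import Chem


def suspicious_stereo_bonds(mol):
    """Return indices of stereo double bonds with no neighbouring bond that
    could carry a direction on at least one end."""
    unset = (Chem.BondStereo.STEREONONE, Chem.BondStereo.STEREOANY)
    carriers = (Chem.BondType.SINGLE, Chem.BondType.AROMATIC)
    flagged = []
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE:
            continue
        if bond.GetStereo() in unset:
            continue
        for atom in (bond.GetBeginAtom(), bond.GetEndAtom()):
            # Bonds on this atom other than the stereo double bond itself.
            others = [b for b in atom.GetBonds() if b.GetIdx() != bond.GetIdx()]
            if not any(b.GetBondType() in carriers for b in others):
                flagged.append(bond.GetIdx())
                break
    return flagged


if __name__ == "__main__":
    # Same setup as the session above, without the failing MolToSmiles call.
    m = Chem.MolFromSmiles('CC=C=CC')
    m.GetBondWithIdx(2).SetStereoAtoms(1, 4)
    m.GetBondWithIdx(2).SetStereo(Chem.BondStereo.STEREOCIS)
    print(suspicious_stereo_bonds(m))
```

For the allene, the begin atom of bond 2 is bonded only through double bonds, so the bond is flagged. An ordinary alkene such as `C/C=C/C` is not.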
