Is there any routine to check the 3D validity of a molecule #2651

Open
AlexisGeorgiou opened this issue Nov 29, 2023 · 3 comments

Comments

@AlexisGeorgiou

Hey, I am quite new to this and I have read the documentation.
I have a list of SMILES and I want to check whether they are valid in 3D space, as a kind of structure validation. That way I can get the percentage of 3D-valid molecules in my set.

I am converting my SMILES (.smi) to PDB or PDBQT and generating 3D coordinates, using either the command line or the Python bindings. I get a converted file with sane-looking coordinates for every atom, but some of the structures are not valid. The conversion still completes, so I have no way of knowing which ones.

Is there a feature for 3D structure validation?

Thanks for the help.


welcome bot commented Nov 29, 2023

Thanks for opening your first issue here! Be sure to follow the issue template!

@ghutchis
Member

First, I'm not quite sure I understand what you mean by "3D validity". Do you mean "are valid molecules", e.g. no carbons with five bonds?

But on a conceptual level, I'm not sure how you would check that other than by generating 3D coordinates and seeing whether that fails.
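If "3D validity" is taken to mean plain geometric plausibility of the generated coordinates (an assumption; the thread does not settle the definition), a crude check on the converted PDB/PDBQT text could look like the sketch below. It only uses the standard PDB fixed-width coordinate columns. The function names and the 0.5 Å clash threshold are hypothetical choices for illustration, not part of Open Babel.

```python
import math
from itertools import combinations

MIN_DISTANCE = 0.5  # angstrom; hypothetical clash threshold, adjust for your data


def read_coordinates(pdb_text):
    """Return (x, y, z) tuples from the ATOM/HETATM records of a PDB/PDBQT string."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # PDB is fixed-width: x, y, z occupy columns 31-38, 39-46, 47-54.
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def geometry_problems(pdb_text, min_distance=MIN_DISTANCE):
    """List crude geometric problems; an empty list means none were found."""
    coords = read_coordinates(pdb_text)
    problems = []
    if not coords:
        problems.append("no atoms found")
    # Flag NaN or infinite coordinates.
    for i, xyz in enumerate(coords, start=1):
        if not all(math.isfinite(v) for v in xyz):
            problems.append(f"atom {i}: non-finite coordinate")
    # Flag pairs of atoms that sit (almost) on top of each other.
    for (i, a), (j, b) in combinations(enumerate(coords, start=1), 2):
        if math.dist(a, b) < min_distance:
            problems.append(f"atoms {i} and {j}: closer than {min_distance} A")
    return problems


def percent_passing(pdb_texts):
    """Percentage of structures for which no problem was reported."""
    if not pdb_texts:
        return 0.0
    passed = sum(1 for text in pdb_texts if not geometry_problems(text))
    return 100.0 * passed / len(pdb_texts)
```

Passing the contents of each converted file to `percent_passing` would give a percentage over the set. This says nothing about valence or stereochemistry; it only catches missing, non-finite, or overlapping coordinates.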

@nbehrnd
Contributor

nbehrnd commented Nov 29, 2023

@AlexisGeorgiou There is a technique called sanitizing SMILES strings. It validates SMILES strings (as Geoffrey mentioned, to identify and sort out e.g. pentavalent carbons early), but is not limited to this. See the entry "Detect Chemistry Problems" in RDKit's cookbook as a starting point and as a means to fix the problems algorithmically. Keep in mind, though, that there often are multiple SMILES strings one can assign to a structure, e.g.

obabel -:'c1ccncc1' -osmi -xk
C1C=CN=CC=1	
1 molecule converted

Here pyridine, entered with its aromaticity described implicitly, is written out as a kekulized SMILES string. Open Babel can also write universal and inchified SMILES strings (link). As a sequence of characters, Open Babel's default SMILES for a structure need not be identical to the one from RDKit's default implementation, either. (If you are creating a database, make sure the data are generated consistently.)

Aside from valence, stereochemistry in a SMILES string can be an issue of its own. For instance, C/C=C/C for (E)-2-butene, C/C=C\C for (Z)-2-butene, and CC=CC, which can describe both, are all valid SMILES syntax (see their depiction e.g. by CDK depict). It is similarly ambiguous whether e.g. OC(c1ccccc1)C(=O)O is meant to describe the (R)-enantiomer, the (S)-enantiomer, or both enantiomers of mandelic acid.
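As a rough illustration of the stereochemistry point, the sketch below scans a SMILES string for the characters that encode stereo information (`/` and `\` for double-bond geometry, `@` for tetrahedral centres). It is a plain text scan, not a SMILES parser: it cannot tell whether a molecule has stereocentres at all, only whether any stereo is written down. The function name is hypothetical, and the last example string, with a `[C@H]` tag added to the mandelic acid SMILES, is an assumption made for illustration.

```python
def stereo_markers(smiles):
    """Report which kinds of stereo markers a SMILES string contains (text scan only)."""
    return {
        # '/' and '\' describe the arrangement around a double bond.
        "double_bond": "/" in smiles or "\\" in smiles,
        # '@' and '@@' describe tetrahedral chirality.
        "tetrahedral": "@" in smiles,
    }


examples = [
    "C/C=C/C",                  # (E)-2-butene
    "C/C=C\\C",                 # (Z)-2-butene
    "CC=CC",                    # double-bond geometry left open
    "OC(c1ccccc1)C(=O)O",       # mandelic acid, configuration left open
    "O[C@H](c1ccccc1)C(=O)O",   # mandelic acid with one configuration written out
]

for smi in examples:
    print(smi, stereo_markers(smi))
```

A string that comes back with both flags false is not necessarily wrong (ethanol, CCO, has no stereo to specify). Deciding whether stereo information is actually missing takes a cheminformatics toolkit that perceives stereocentres.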

3 participants