Hello,
First of all, I'd like to point out that I have very little knowledge of chemistry (I'm a programmer/ML engineer).
I want to decompose a molecule into rings and bonds in order to build a motif vocabulary (similar to those used by many SOTA techniques).
I have these two SMILES that represent the same molecule:
1. O=S1OCc2ccccc21
2. [cH:4]1[cH:5][cH:6][cH:7][c:8]2[c:9]1[S:10](=[O:11])[O:12][CH2:13]2
The second one is annotated with atom map numbers.
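As a sanity check that the two strings really describe the same molecule, here is a minimal sketch, assuming RDKit is installed. `canonical_without_maps` is a hypothetical helper name, not part of my extraction code: it clears the atom map numbers and compares the canonical SMILES.

```python
from rdkit import Chem


def canonical_without_maps(smiles: str) -> str:
    """Return the canonical SMILES of `smiles` with all atom map numbers cleared."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES string: {smiles!r}")
    for atom in mol.GetAtoms():
        atom.SetAtomMapNum(0)  # 0 means "no atom map number"
    return Chem.MolToSmiles(mol)


plain = "O=S1OCc2ccccc21"
mapped = "[cH:4]1[cH:5][cH:6][cH:7][c:8]2[c:9]1[S:10](=[O:11])[O:12][CH2:13]2"

# If both strings encode the same molecule, the canonical forms should match.
print(canonical_without_maps(plain) == canonical_without_maps(mapped))
```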
After extracting bonds and rings from each of them, the results differ in one sub-molecule (the first one in each list):
1. C1COSC1, O=S, c1ccccc1
2. C1=CSOC1, O=S, c1ccccc1
The code I'm using for extracting rings is:
    from typing import Set

    from rdkit import Chem


    def mol_from_smiles(smiles: str, sanitize=True, kekulize=True) -> Chem.Mol:
        mol = Chem.MolFromSmiles(smiles, sanitize=sanitize)
        if mol is None:
            raise Exception(f"Invalid SMILES string: \"{smiles}\"")
        if kekulize:
            Chem.Kekulize(mol)
        return mol


    def extract_mol_fragment(mol: Chem.Mol, atom_indices: Set[int]) -> Chem.Mol:
        """
        Extracts a fragment (subset) of the input molecule keeping only the specified atoms.
        The chemical validity of the output fragment is ensured by incrementally building it with RWMol.
        """
        frag_mol = Chem.RWMol()
        old_to_new_atom_idx = {}
        for i in atom_indices:
            # copy_atom is a helper that is not shown in this post
            new_idx = frag_mol.AddAtom(copy_atom(mol.GetAtomWithIdx(i)))
            old_to_new_atom_idx[i] = new_idx

        # Keep only the bonds whose two end atoms are both in the fragment
        for bond in mol.GetBonds():
            u = bond.GetBeginAtomIdx()
            v = bond.GetEndAtomIdx()
            if u in atom_indices and v in atom_indices:
                frag_mol.AddBond(
                    old_to_new_atom_idx[u],
                    old_to_new_atom_idx[v],
                    bond.GetBondType()
                )
        Chem.Kekulize(frag_mol)
        return frag_mol.GetMol()


    def canon_smiles(smiles: str) -> str:
        smiles = Chem.MolToSmiles(mol_from_smiles(smiles), kekuleSmiles=True)  # Kekulize
        return Chem.CanonSmiles(smiles)


    def extract_rings(mol_smiles: str) -> Set[str]:
        # Assumed: the posted snippet used `mol` without defining it
        mol = mol_from_smiles(mol_smiles)
        rings = set()
        for atom_indices in Chem.GetSymmSSSR(mol):
            ring_mol = extract_mol_fragment(mol, atom_indices)
            ring_smiles = Chem.MolToSmiles(ring_mol)
            rings.add(canon_smiles(ring_smiles))
        # Assumed: the posted snippet ended without returning the result
        return rings
Since I run the same extraction code on both 1. and 2., and the only difference between them is the atom map annotations, I can't see why I'm getting different results. Can anyone help me spot the cause?
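One thing that might be worth checking (a guess, not a confirmed diagnosis): the two inputs list the atoms in a different order, and a Kekulé structure for a fused aromatic system is not unique, so the bond shared by the two rings could end up single in one case and double in the other. That would be consistent with C1COSC1 versus C1=CSOC1. The sketch below assumes RDKit is installed, and `fused_bond_types` is a hypothetical helper name; it prints the bond type that `Chem.Kekulize` assigns to each ring-fusion bond.

```python
from rdkit import Chem


def fused_bond_types(smiles: str):
    """Return the kekulized bond types of all bonds shared by two rings."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES string: {smiles!r}")
    Chem.Kekulize(mol)  # aromatic bonds become explicit single/double bonds
    bond_rings = [set(ring) for ring in mol.GetRingInfo().BondRings()]
    shared = []
    for i in range(len(bond_rings)):
        for j in range(i + 1, len(bond_rings)):
            # Bonds that belong to both ring i and ring j are fusion bonds
            for bond_idx in sorted(bond_rings[i] & bond_rings[j]):
                shared.append(mol.GetBondWithIdx(bond_idx).GetBondType())
    return shared


plain = "O=S1OCc2ccccc21"
mapped = "[cH:4]1[cH:5][cH:6][cH:7][c:8]2[c:9]1[S:10](=[O:11])[O:12][CH2:13]2"

for smi in (plain, mapped):
    print(smi, fused_bond_types(smi))
```

If the two printed bond types differ, the difference comes from the Kekulé form chosen for the parent molecule rather than from the atom map numbers themselves.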