MolField hassubstruct performance for complex molecules #28

ivannnnnnnnnn · 2022-08-31T16:12:26Z

Hello! Sorry if this is off topic, and for my English. I am hoping the django-rdkit community can help me with a problem.

I am using django-rdkit to store mol objects, and my database holds 10 million molecules. At this scale I have trouble selecting the molecules that contain a target molecule as a substructure when the target molecule is complex.

For example, selecting molecules with hassubstruct=c1ccccc1 is fast. But selecting molecules with hassubstruct=COc1cccnc1C1=CCN(C(=O)OC(C)(C)C)CC1 gives a very slow query.

Maybe someone has had the same problem and can recommend how to improve performance.

My next question is which algorithm the RDKit cartridge uses for this hassubstruct (@>) operation. Maybe someone knows of articles about it, or can explain. I am asking because I think there might be ways to optimize search speed with data mining. For example, I do not use the exact lookup to search for a specific molecule; instead I store SMILES in a separate field of the same model and search on that field. Perhaps the substructure search can be simplified in a similar way.
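
To make the idea concrete, here is a minimal sketch of one possible precomputed filter. It assumes per-element heavy-atom counts are stored next to each molecule: a molecule can only contain the query as a substructure if it has at least as many atoms of every element as the query does. The names `element_counts` and `may_contain` are hypothetical, and the SMILES tokenizer is deliberately crude (plain SMILES only, no SMARTS). It is not part of django-rdkit and is not how the cartridge itself works. The check is a necessary condition only, so surviving candidates would still need the real hassubstruct test.

```python
import re
from collections import Counter

# Crude SMILES atom tokenizer: bracket atoms, Cl/Br, then one-letter
# organic-subset atoms (aliphatic upper case, aromatic lower case).
_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
_BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def element_counts(smiles):
    """Count heavy atoms per element in a plain SMILES string (hypothetical helper)."""
    counts = Counter()
    for token in _TOKEN.findall(smiles):
        if token.startswith("["):
            match = _BRACKET_SYMBOL.match(token)
            if match is None:  # e.g. a wildcard atom; ignore it
                continue
            symbol = match.group(1)
        else:
            symbol = token
        element = symbol.capitalize()  # aromatic 'c' and aliphatic 'C' both count as C
        if element != "H":  # implicit hydrogens are not written out, so skip H
            counts[element] += 1
    return counts


def may_contain(molecule_counts, query_counts):
    """Return False only when the molecule cannot contain the query."""
    return all(molecule_counts.get(e, 0) >= n for e, n in query_counts.items())


query = element_counts("COc1cccnc1C1=CCN(C(=O)OC(C)(C)C)CC1")
benzene = element_counts("c1ccccc1")
print(may_contain(benzene, query))  # False: benzene has too few C, N and O atoms
print(may_contain(query, benzene))  # True: still needs the real substructure test
```

If the counts lived in ordinary indexed integer columns, a complex query could discard many rows with cheap comparisons before the expensive match runs. Whether this helps in practice depends on the data and would need measuring.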
