MolField hassubstruct performance for complex molecules #28

Open
ivannnnnnnnnn opened this issue Aug 31, 2022 · 0 comments

Hello! Sorry if this question is off topic, and for my English. I am hoping the django-rdkit community can help me with a problem.

I am using django-rdkit to store mol objects. My database holds 10 million molecules. With this amount of data, I have trouble selecting molecules by substructure against a target molecule when the target molecule is complex.

For example, selecting molecules with hassubstruct=c1ccccc1 is fast. But selecting molecules with hassubstruct=COc1cccnc1C1=CCN(C(=O)OC(C)(C)C)CC1 gives a very slow query.
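
To make the two cases concrete, here is a minimal sketch of the lookups. It assumes a hypothetical model `Compound` with a `MolField` named `molecule`; both names are placeholders, and the helper `with_substructure` is only for illustration.

```python
# Sketch only: `queryset` is expected to be a django-rdkit queryset such as
# Compound.objects.all(). Compound and "molecule" are placeholder names.

def with_substructure(queryset, smiles, field="molecule"):
    """Filter to rows whose mol field contains `smiles` as a substructure.

    Builds the django-rdkit `hassubstruct` lookup (the `@>` operator
    of the cartridge) as a keyword argument for `filter`.
    """
    return queryset.filter(**{f"{field}__hassubstruct": smiles})


SIMPLE_QUERY = "c1ccccc1"  # the fast case
COMPLEX_QUERY = "COc1cccnc1C1=CCN(C(=O)OC(C)(C)C)CC1"  # the very slow case
```

With such a model, `with_substructure(Compound.objects.all(), COMPLEX_QUERY)` would be the slow query in question.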

Maybe someone has had the same problem and can recommend how to improve performance.

My next question is which algorithm the RDKit cartridge uses for this hassubstruct (@>) operation. Maybe someone knows of articles about this, or can explain it. I am asking because I think there might be ways to optimize search speed with data mining. For example, to find one specific molecule I do not use the exact lookup; instead I store SMILES in a separate field of the same model and search on that field. Perhaps the substructure search can be simplified in a similar way.
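
As a rough illustration of the kind of simplification meant here (not a description of what the cartridge does, which is exactly the open question), one could precompute a cheap per-molecule summary and use it to discard rows before the expensive substructure match. The sketch below assumes heavy-atom counts per element are stored alongside each molecule; the names `may_contain` and `library` are hypothetical, and the three molecules are toy data.

```python
from collections import Counter


def may_contain(candidate_counts, query_counts):
    """Necessary but not sufficient test for a substructure match.

    A molecule can only contain the query if it has at least as many
    heavy atoms of every element as the query does.
    """
    return all(candidate_counts.get(el, 0) >= n for el, n in query_counts.items())


# Toy stand-in for a precomputed column of heavy-atom counts.
library = {
    "benzene": Counter(C=6),
    "anisole": Counter(C=7, O=1),
    "3-methoxypyridine": Counter(C=6, N=1, O=1),
}

query = Counter(C=5, N=1)  # heavy atoms of pyridine

# Only the survivors would need the real substructure check.
candidates = [name for name, counts in library.items() if may_contain(counts, query)]
```

Rows that fail the count test can never match, so only the remaining candidates would go through the real hassubstruct check.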
