Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rdkit.Chem.GetSubstructMatch called in dm.scaffold.fuzzy_scaffolding takes rdkit mol as input, not str #119

Open
colinrsmall opened this issue Jul 22, 2022 · 3 comments
Assignees
Labels
bug Something isn't working documentation Improvements or additions to documentation

Comments

@colinrsmall
Copy link

colinrsmall commented Jul 22, 2022

According to the datamol documentation and code, the enforce_subs parameter of dm.scaffold.fuzzy_scaffolding should be a list of str (presumably SMILES or SMARTS). However, passing in a list of str gives the following rdkit argument error:

image

This is because the dm.scaffold.fuzyy_scaffolding function passes the enfocre_subs parameter to rdkit.Chem.GetSubstructMatch, which should take as input an rdkit molecule object, not a string.

Additionally, upon loading the pattern SMILES as molecule object and passing that into the enforce_subs parameter, the function runs without error but dm.scaffold.fuzyy_scaffolding will miss the loaded pattern:

Edit: Image removed. This was a bad example I falsely thought wasn't capturing an enforce_subs r-group. Will find a better example.

@hadim
Copy link
Contributor

hadim commented Jul 26, 2022

Thanks for reporting the bug @colinrsmall.

Indeed this is misleading, and we should probably only accept mol or SMARTS objects here (or propose to do the conversion within the function itself).

Would you like to contribute and propose a fix in a PR?

Note that we are also considering revisiting how this function works at #114.

@hadim hadim added bug Something isn't working documentation Improvements or additions to documentation labels Jul 26, 2022
@colinrsmall
Copy link
Author

colinrsmall commented Jul 29, 2022

Sure, the documentation would be an easy fix. Looking at the source code, I'm concerned that some specified fragments in enforce_subs may be missed. If I understand the code correctly, after doing r-group decomposition, you check each r-group to see if they contain any specific fragments:

rgroups = [
    rgp
    for rgp in rgroups
    if not any([len(rgp.GetSubstructMatch(frag)) > 0 for frag in enforce_subs])
]

If they don't, then they get trimmed from the final scaffold (leaving any r-groups that contains a specified fragment remaining):

scaff = trim_side_chain(mol, AdjustQueryProperties(core, core_query_param), rgroups)

This will necessarily miss substructures that are part of both the core and an r-group, such as the atoms highlighted below:

image

Maybe this is desired behavior? If so, you might also want to update the documentation to specify that the fragments passed in to enforce_subs are true r-groups that would otherwise get wholly stripped away by regular Murcko scaffolding.

@maclandrol maclandrol self-assigned this Jul 29, 2022
@maclandrol
Copy link
Member

Hello @colinrsmall,

You are right about this being the original desired behaviour. The enforce_subs should be interpreted as enforce substitutions (keep vectors attached to the core when they match any fragment in the list).

Since the core here is defined by the Murcko scaffold, then as you said, the behavior can not allow overlapping between the Rgroups and the Murcko core.

Was your expectation that the core itself would contain the queries passed as input by enforce_subs ? This would indeed be a valid use case for this function and in that case we would need to loosely defined the scaffold, based on the ring systems, while including atom matching the query of interest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

3 participants