-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multiple smarts in --cut-smarts #15
Comments
Andrew commented on this request on the RDKit-discuss mailing list as: The SMARTS handling code only touches <50 lines of code. It does not seem that hard to have it take multiple --cut-smarts, apply each of the cuts, find the unique union of those cuts, and work with them. Could you add that as a issue in the mmpdb tracker? It is in principle possible to merge two fragment files together and index the result. However, it would be difficult to use the indexed database for analysis purposes, because any input/query structure would use the single SMARTS pattern defined in the database. |
At the RDKit UGM Hackathon 2019, this question came up again. Participants wanted to use the RECAP rules for cutting. Creating a single SMARTS to match all 11 rules might theoretically be possible, but would results in an extremely complicated string which would then be hard to debug and modify. Extending RDKit such that a list of SMARTS is appears as the preferred long term solution. |
There are 12 rules, not 11:
How to people want to specify the cut with these? Is the cut match defined with the product side of the reaction, and the reactant side ignored? Some of those SMARTS use more than two atoms. The first makes a cut between :1 and :2 while the second makes a cut between :2 and :3. That means that if the reaction side is ignored (eg, if the cut is always made between :1 and :2) then there will be problems. It could do a more in-depth analysis of the transform to detect if there is a labeled pair on the product side which is not a labeled pair in the reactant side, and use that for the cut. But that's overkill if people really just want |
I'm thinking to support it as Looking at the RECAP rules, there are several places where I see problems.
Given NC(=O)N this removes the C(=O) to give N.N. Should the SMARTS be
The existing code only allows cuts on single bonds. The RECAP patterns use Note that
There are far more matches with It looks like in those few cases where the non-ring bond type is not specified, it's okay for me to say it's a single bond, without changing the intent of matching an amide. It also seems like that RECAP definition in RDKit is wrong, in that it is not supposed to match a double bond there.
The pattern mmpdb cannot handle this case. Should I drop it? |
Going back to acquaregia's request, can you give an example of of the SMIRKS you are interested in? I can see two steps that might be affected: 1) limit the fragment to just a few SMARTS patterns, and 2) limit the indexing to just a few SMIRKS patterns. I would like to see some of the SMIRKS to get a better feel for how to handle this. For example, if all of the SMIRKS were transforms of R-groups to R-groups, where the R-groups could be defined as SMILES fragments with a single attachment point denoted Otherwise, if multiple distinct SMARTS are needed, then the mmpdb file formats need to change someone in order to store them. There could be multiple entries, one per definition, or they could be space/tab separated. |
I have large database 500K compounds and I am interested in finding only few transforms.
Ideally I would like to give transform in the form of smirks.
I understand that it might be easier to ask for a different fragmentation pattern and perform indexing on it.
I can translate the smirks into smarts specifying specific bonds.
For the tool to be useful I would like to be able to provide more than one SMARTS to the --cut-smarts option.
It would be excellent if an option like --cache would allow using a fragmentation file and enhance it by specifying other cut patterns.
Thanks.
marco
The text was updated successfully, but these errors were encountered: