Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support writing CX extensions in reactions #7340

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rachelnwalker
Copy link
Contributor

We are hoping to get support for writing CX extensions when writing to reaction SMILES/SMARTS. This is just a draft, I wanted to see if we think this would work. At the very least it would be nice to be able to read/write atom labels and stereo information.

Currently, there are a few things that aren’t supported

  • Link nodes
  • Coordinate bonds
  • Writing to CX smiles (worried about atom/fragment reordering?)

I’m also a little confused about the substance group writing — it seems like the atoms are correct, however there are some extra commas and colons written compared to the input. RDKit seems to parse this correctly but I'm not sure if this is expected/a problem? For example:

Input SMARTS: [CH3:6][O:5][CH:3](-*)[O:2]-*>>[CH3:6][NH:5][CH:3](-*)[O:2]-* |$;;;star_e;;star_e;;;;star_e;;star_e$,SgD:1,0:foo:bar::::,SgD:7,6:foo:baz::::,Sg:n:4,2,1,0::ht,Sg:n:10,8,7,6::ht,SgH:2:0,3:1|

Roundtrip output SMARTS: [C&H3:6][O:5][C&H1:3](-*)[O:2]-*>>[C&H3:6][N&H1:5][C&H1:3](-*)[O:2]-* |$;;;star_e;;star_e;;;;star_e;;star_e$,SgD:1,0:foo:bar::::,,SgD:7,6:foo:baz::::,,,Sg:n:4,2,1,0::ht:::,,Sg:n:10,8,7,6::ht:::,SgH:3:1.1|

If this seems like a reasonable approach, I can try to extend this to work for SMILES as well.

Copy link
Contributor

@ricrogz ricrogz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach makes sense.

Comment on lines +99 to +100
std::vector<unsigned int> prevAtomOrdering;
std::vector<unsigned int> prevBondOrdering;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicated, it's declared again inside the loop.


// ignore atom properties -- really this is to avoid the atom map numbers, maybe we just want to remove
// those
auto flags = RDKit::SmilesWrite::CXSmilesFields::CX_ATOM_PROPS ^ RDKit::SmilesWrite::CXSmilesFields::CX_ALL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you doing this to remove the RGroup labels from the output?

@greglandrum
Copy link
Member

My two cents on the approach: it seems like it would be more robust and efficient (because you won't have to create the new object and copy all the atoms and bonds) to have a new getCXExtensions() which takes a vector of mol pointers (and to modify the current getCXExtensions() to call that one) and to modify each of the functions that calls to do the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants