Refactor smiles parsing code to use C++ lexer and parser (#7015) #7016
Open
whosayn wants to merge 8 commits into rdkit:master from whosayn:smilesParserRefactor
Commits on Jan 1, 2024
- Refactor smiles parsing code to use C++ lexer and parser (rdkit#7015)

This refactors the SMILES parsing procedure to separate parsing from ROMol construction. Given the intricate nature of the mol construction procedure, a 1D list of mol events made the most sense as the parser's output. This allowed me to ensure that the order in which atoms/bonds are created and properties are set follows that of the previous implementation.

I also added a new project to External/flex (source: https://github.com/westes/flex) because C++ scanners generated by flex require the FlexLexer.h header. We need this header to be present whether or not users have flex installed, and this was the easiest way to achieve that.

Summary of changes:

* Updated parsing-related error messages to point to the bad token and to include more informative messages. E.g.

      [16:26:38] SMILES Parse Error: check for mistakes around position 13
      COc(c1)cccc1C#
      -------------^ syntax error

      [16:27:32] SMILES Parse Error: check for mistakes around position 1
      [Bg]
      -^ unsupported atom symbol

* Removed manual memory management from the ROMol construction procedure by using RAII classes (this doesn't apply to ring closures)
* This also removes bad interactions between consecutive SMILES parses, which required the global state to be reset around the conversion of bad inputs. See the refactored SmilesParse::test.cpp::testFail
* Fixes bugs like:
  * preventing hydrogens with defined chiralities
  * allowing branch atoms of the form `C1(.C1)`
  * allowing ring bonds like `C1.C%01`
  * restricting formal charges to -15 <= N <= 15

An important concern for this change is performance, so I ran all of the tests in SmilesParse::test.cpp x1000 and noticed that this change increases the runtime by about 5%, i.e. from about 82ms to 86ms.
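The "1D list of mol events" design described in the commit message can be pictured with a small sketch. Everything here is hypothetical and written only for illustration: the names `AddAtom`, `AddBond`, `MolEvent`, `parseChain`, `ToyMol` and `build` are assumptions, not identifiers from this PR or from RDKit, and the toy parser handles only unbranched chains of single-character atoms with optional `=` or `#` bond symbols. It assumes a C++17 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical event types: each one records a single construction step.
struct AddAtom { std::string symbol; };
struct AddBond { std::size_t begin; std::size_t end; int order; };
using MolEvent = std::variant<AddAtom, AddBond>;

// Toy "parser": turns an unbranched chain such as "CC=O" into a flat,
// ordered list of events. It never touches the molecule itself.
std::vector<MolEvent> parseChain(const std::string& smiles) {
  std::vector<MolEvent> events;
  std::size_t atomCount = 0;
  int pendingOrder = 1;  // bond order to use for the next bond
  for (char c : smiles) {
    if (c == '=') {
      pendingOrder = 2;
    } else if (c == '#') {
      pendingOrder = 3;
    } else {
      // Any other character is treated as a one-letter atom symbol.
      events.push_back(AddAtom{std::string(1, c)});
      if (atomCount > 0) {
        events.push_back(AddBond{atomCount - 1, atomCount, pendingOrder});
      }
      ++atomCount;
      pendingOrder = 1;
    }
  }
  return events;
}

// Toy stand-in for the molecule being constructed.
struct ToyMol {
  std::vector<std::string> atoms;
  std::vector<AddBond> bonds;
};

// Builder: replays the events strictly in order, so the creation order of
// atoms and bonds is fixed by the event list alone.
ToyMol build(const std::vector<MolEvent>& events) {
  ToyMol mol;
  for (const auto& ev : events) {
    if (const auto* atom = std::get_if<AddAtom>(&ev)) {
      mol.atoms.push_back(atom->symbol);
    } else if (const auto* bond = std::get_if<AddBond>(&ev)) {
      assert(bond->begin < mol.atoms.size() && bond->end < mol.atoms.size());
      mol.bonds.push_back(*bond);
    }
  }
  return mol;
}
```

Calling `build(parseChain("CC=O"))` in this sketch yields three atoms and two bonds, the second of order 2. The point of the split is that the parser's output can be inspected or tested on its own before any molecule object exists.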
- 168148b
- 5c951a9
- f2b48b0
Commits on Jan 3, 2024
- 4d623e4
- efad133
- a92384c
- 4128316
- 1b0f143