Refactor smiles parsing code to use C++ lexer and parser (#7015) #7016

This refactors the SMILES parsing procedure to separate parsing from `ROMol` construction. The mol construction procedure is intricate, so a flat (1D) list of mol events made the most sense as the parser's output. With that design I could ensure that atoms and bonds are created, and their properties set, in the same order as in the previous implementation.

I also added a new project to `External/flex` (source: https://github.com/westes/flex), because C++ scanners generated by flex require the `FlexLexer.h` header. That header must be present whether or not users have flex installed, and vendoring it was the easiest way to achieve that.

Summary of changes:

* Updated parsing-related error messages to point to the bad token and to be more informative. E.g.:
  ```
  [16:26:38] SMILES Parse Error: check for mistakes around position 13
  COc(c1)cccc1C#
  -------------^
  syntax error
  ```
  ```
  [16:27:32] SMILES Parse Error: check for mistakes around position 1
  [Bg]
  -^
  unsupported atom symbol
  ```
* Removed manual memory management from the `ROMol` construction procedure by using RAII classes (this doesn't apply to ring closures).
* This also removes bad interactions between consecutive SMILES parses, which previously required bad inputs to be handled only after a reset of the global state. See the refactored `SmilesParse::test.cpp::testFail`.
* Fixes bugs, including:
  * preventing hydrogens with defined chiralities
  * allowing branch atoms of the form `C1(.C1)`
  * allowing ring bonds like `C1.C%01`
  * restricting formal charges to -15 <= N <= 15

Performance is an important concern for this change, so I ran all of the tests in `SmilesParse::test.cpp` 1000 times. This change increases the runtime by about 5%, i.e. from about 82 ms to 86 ms.
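The event-list design described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR. `MolEvent`, `AddAtom`, `AddBond`, `SetCharge`, `Atom`, `Mol` and `buildMol` are all hypothetical stand-ins for the real parser output and RDKit's molecule classes. The parser emits events into a flat vector, and a separate builder replays them in order, so creation order is fixed entirely by the event order. Atoms are owned through `std::unique_ptr`, which mirrors the RAII style mentioned in the summary.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <variant>
#include <vector>

// Hypothetical event types emitted by the parser, in source order.
struct AddAtom { std::string symbol; };
struct AddBond { std::size_t begin; std::size_t end; int order; };
struct SetCharge { std::size_t atom; int charge; };
using MolEvent = std::variant<AddAtom, AddBond, SetCharge>;

// Hypothetical stand-ins for the real atom/bond/molecule classes.
struct Atom { std::string symbol; int charge = 0; };
struct Bond { std::size_t begin; std::size_t end; int order; };
struct Mol {
  std::vector<std::unique_ptr<Atom>> atoms;  // owned via RAII, no manual delete
  std::vector<Bond> bonds;
};

// Replays the events strictly in order, so atoms and bonds are created
// (and properties set) in exactly the order the parser emitted them.
Mol buildMol(const std::vector<MolEvent>& events) {
  Mol mol;
  for (const MolEvent& ev : events) {
    if (const auto* a = std::get_if<AddAtom>(&ev)) {
      mol.atoms.push_back(std::make_unique<Atom>(Atom{a->symbol}));
    } else if (const auto* b = std::get_if<AddBond>(&ev)) {
      // A bond may only refer to atoms that earlier events already created.
      assert(b->begin < mol.atoms.size() && b->end < mol.atoms.size());
      mol.bonds.push_back(Bond{b->begin, b->end, b->order});
    } else if (const auto* c = std::get_if<SetCharge>(&ev)) {
      assert(c->atom < mol.atoms.size());
      mol.atoms[c->atom]->charge = c->charge;
    }
  }
  return mol;
}
```

For example, replaying the (hypothetical) events `AddAtom{"C"}`, `AddAtom{"O"}`, `AddBond{0, 1, 1}`, `SetCharge{1, -1}` yields a two-atom, one-bond molecule with a negative charge on the oxygen. In this sketch the builder never sees partial input, so nothing is allocated for a string that fails to parse.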
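The caret layout of the new error messages can be reproduced with a small helper. This is a sketch, not the PR's implementation. `formatParseError` is a hypothetical name, it leaves out the timestamped `SMILES Parse Error:` log prefix, and it assumes the reported position is 0-based, which matches both examples in the description (the caret sits under `#` at index 13 and under `B` at index 1).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds the message body with a caret placed under the
// 0-based position of the offending token, followed by the reason.
std::string formatParseError(const std::string& input, std::size_t pos,
                             const std::string& reason) {
  std::string msg = "check for mistakes around position " + std::to_string(pos) + "\n";
  msg += input + "\n";
  msg += std::string(pos, '-') + "^\n";  // pos dashes, then the caret
  msg += reason;
  return msg;
}
```

Pointing at the token rather than just reporting a position makes the mistake visible at a glance, especially in long SMILES strings.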
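The `C1.C%01` fix relies on how SMILES ring-bond labels work: a label is either a single digit or `%` followed by two digits, so `1` and `%01` both name ring bond 1. Below is a minimal sketch of reading such a label. It assumes only those two forms (any other label syntax is out of scope), and `parseRingLabel` is a hypothetical helper, not RDKit API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: reads a ring-bond label starting at `pos`.
// On success it returns the ring number and advances `pos` past the label;
// on failure it returns std::nullopt and leaves `pos` unchanged.
std::optional<int> parseRingLabel(const std::string& s, std::size_t& pos) {
  // Single-digit label, e.g. "1".
  if (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
    return s[pos++] - '0';
  }
  // Two-digit %-label, e.g. "%01" or "%12".
  if (pos + 2 < s.size() && s[pos] == '%' &&
      std::isdigit(static_cast<unsigned char>(s[pos + 1])) &&
      std::isdigit(static_cast<unsigned char>(s[pos + 2]))) {
    int n = (s[pos + 1] - '0') * 10 + (s[pos + 2] - '0');
    pos += 3;
    return n;
  }
  return std::nullopt;
}
```

Scanning `C1.C%01` with this helper yields ring number 1 at both labels, so the two carbons would be paired into a single ring bond.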
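The new formal-charge limit can be illustrated with a sketch that parses the charge part of a bracket atom and rejects anything outside -15 <= N <= 15. `parseCharge` is hypothetical. The accepted spellings (`+`, `--`, `+3`, `-12`, and so on) are an assumption of this sketch rather than a description of the new grammar.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: parses a charge such as "+", "--", "+3" or "-12" and
// returns std::nullopt if it is malformed or outside -15 <= N <= 15.
std::optional<int> parseCharge(const std::string& tok) {
  constexpr int kMaxAbsCharge = 15;
  if (tok.empty() || (tok[0] != '+' && tok[0] != '-')) return std::nullopt;
  const int sign = (tok[0] == '+') ? 1 : -1;
  int magnitude = 0;
  if (tok.find_first_not_of(tok[0]) == std::string::npos) {
    // Only repeated sign characters: "+" -> 1, "--" -> 2, "+++" -> 3, ...
    if (tok.size() > static_cast<std::size_t>(kMaxAbsCharge)) return std::nullopt;
    magnitude = static_cast<int>(tok.size());
  } else {
    // A single sign followed by decimal digits, e.g. "+3" or "-12".
    for (std::size_t i = 1; i < tok.size(); ++i) {
      if (!std::isdigit(static_cast<unsigned char>(tok[i]))) return std::nullopt;
      magnitude = magnitude * 10 + (tok[i] - '0');
      if (magnitude > kMaxAbsCharge) return std::nullopt;  // also avoids int overflow
    }
  }
  return sign * magnitude;
}
```

Checking the bound while digits are accumulated means an absurdly long digit run is rejected early instead of overflowing.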

Commits on Jan 1, 2024

Commits on Jan 3, 2024