Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RDKit Rooted Fingerprints Have Wrong Results in KNIME #143

Open
manuelschwarze opened this issue Dec 1, 2023 · 0 comments
Open

RDKit Rooted Fingerprints Have Wrong Results in KNIME #143

manuelschwarze opened this issue Dec 1, 2023 · 0 comments
Assignees
Labels

Comments

@manuelschwarze
Copy link
Contributor

We have discovered that rooted fingerprints in KNIME have are different from Python and pure Java. The reason we found is that there is a bug how the atom list for the rooting feature is created before it passed to the RDKit native code.

Paolo Tosco's Java code created it like this, as example with atom index 17:

UInt_Vect atomList = new UInt_Vect(1);
atomList.set(0, atom);

This is correct. It results in [ 17 ]

I had created it many years ago like this:

atomList = new UInt_Vect(1);
atomList.add(atom);

This is incorrect. It results in [ 0, 17 ], where "0" could also be something that is not well-defined.

In addition to this code, other buggy code also related to the same feature was discovered in InputDataObject class, which processes again such UInt_Vect objects and has the same bug.

Consequences of this bug:

InputDataInfo.getRDKitIntegerVector(DataRow row) – Always was double the size that it should be, and the first half was filled with 0 (or undefined).

RDKit Highlighting Node – Should have been leading to always coloring of atom at index 0, but for some reason this did not happen, hence no problem that a user would actually see
RDKit Highlighting Atoms Node (deprecated) – Influence on SVG creation, but no problem that a user would actually see
InputDataInfo.getRDKitUIntegerVector (DataRow row) – Always was double the size that it should be, and the first half was filled with 0 (or undefined).

RDKit Fingerprint Nodes – Rooted fingerprints (Morgan, FeatMorgan, AtomPair, Torsion, RDKit, Layered, always included the atom indexes 0 (or undefined)
AbstractFingerprintNodeModel.$AbstractRDKitCellFactory.process(…) – When a single atom was selected for rooting, it was always using atom at index 0 and the selected one, so it used 2 atoms instead of a single one, every fingerprint calculated with rooted atoms has in the moment wrong results: There are more bits set in the fingerprint than it would have without the additional badly added atom index.

Tracked at Novartis as KNIME-1794

@manuelschwarze manuelschwarze self-assigned this Dec 1, 2023
manuelschwarze added a commit that referenced this issue Dec 2, 2023
- KNIME-1792: RDKit Binaries updated to 2023_09_2 and adapted tests -
changes were detected related to:
   - Optimize geometry
   - RDKit rooted fingerprints
   - SVG generation
   - Substructure counter stereo-chemistry handling
- KNIME-1794: Fixed rooted fingerprints (in KNIME code)
- Update version from 4.9.0 => 4.9.1 for RDKit Binary Changes
- Added for KNIME 4.6 testing javax.annotation to wizard: Some plugins
that the RDKit Nodes Wizard requires have a dependency on
javax.annotation, which was fulfilled in lower Java versions
automatically, but not anymore when using Java 17. Therefore, it is now
referenced additionally.
greglandrum pushed a commit that referenced this issue Dec 3, 2023
… to master (#141)

* KNIME-1725: Version 4.7.1 => 4.9.0 for master (KNIME 4.7+)

This version will contain changes (with subsequent commits) for:
- Advanced port handling for RDKit nodes
- KNIME File Handling Framework integration
- RDKit Component Reaction Nodes improvements

Commits will be cherry-picked from pre-nightly4.7 branch

* KNIME-1309: Support all port objects, not only data tables (#133)

* KNIME-1309 Added: support of dynamic ports and arbitrary port types
* Dependency fix for RDKit Nodes Wizard
---------
Co-authored-by: Roman Balabanov <roman-1.balabanov_ext@novartis.com>

* KNIME-1705 FunctionalGroupFilter node: migration to KNIME NIO (#135)

Authored-by: Roman Balabanov <roman-1.balabanov_ext@novartis.com>

* KNIME-1706 StructureNormalizer node: migration to KNIME NIO (#136)

Authored-by: Roman Balabanov <roman-1.balabanov_ext@novartis.com>

* KNIME-1704 FingerprintReader FingerprintWriter nodes: migration to KNIME NIO (#137)

Authored-by: Roman Balabanov <roman-1.balabanov_ext@novartis.com>

* KNIME-1762 Fixed: StructureNormalizer node Options tab layout is bad on Mac (#139)

Co-authored-by: Roman Balabanov <roman-1.balabanov_ext@novartis.com>

* KNIME-132: Added: handling of additional columns in ComponentReaction nodes (#140)

Authored-by: Roman Balabanov <roman-1.balabanov_ext@novartis.com>

* KNIME-1704,KNIME-1705,KNIME-1706: MANIFEST fix for file handling dep

* KNIME-1792: RDKit 2023_09_2 Binaries Update and Fixes (issue #143)

- KNIME-1792: RDKit Binaries updated to 2023_09_2 and adapted tests -
changes were detected related to:
   - Optimize geometry
   - RDKit rooted fingerprints
   - SVG generation
   - Substructure counter stereo-chemistry handling
- KNIME-1794: Fixed rooted fingerprints (in KNIME code)
- Update version from 4.9.0 => 4.9.1 for RDKit Binary Changes
- Added for KNIME 4.6 testing javax.annotation to wizard: Some plugins
that the RDKit Nodes Wizard requires have a dependency on
javax.annotation, which was fulfilled in lower Java versions
automatically, but not anymore when using Java 17. Therefore, it is now
referenced additionally.

---------

Co-authored-by: Roman Balabanov <roman-1.balabanov_ext@novartis.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant