Why doesn't the rdkit MFP implementation set radius=2 bits for D1 atoms #7175
I was looking at the bit info map for the RDKit Morgan fingerprints and noticed that D1 atoms (in the SMARTS sense, atoms with one explicit connection) never set bits for radius = 2. Is this a feature of the original Morgan fingerprint algorithm? Maybe I'm misunderstanding how the algorithm works, but if this is intentional, it seems like it would throw away useful information. Relevant code:
gives the output:
If you look at the mol with atom indexes, you can see that the two D1 atoms (0 and 6) do not set bits for radius=2. Alternatively, you can see that two of the atoms only set two bits. Another example:
rdkit version: 2023.03.3
Replies: 1 comment 2 replies
Hi,
This is because redundant environments (the same environment at a higher radius, or at the same radius but with a higher invariant) are removed. The atom 0, radius 2 fragment is already covered by the atom 1, radius 1 Morgan environment. Likewise, the atom 6, radius 2 fragment is already covered by the atom 5, radius 1 fragment. This is also discussed in the Rogers and Hahn ECFP paper that the RDKit implementation is based on. There's also an "includeRedundantEnvironments" flag somewhere that you can use if you need this information for your use case.
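To make the "already covered" argument concrete, here is a minimal pure-Python sketch of the deduplication idea. It is not RDKit's code. It assumes a hypothetical unbranched 7-atom chain (atoms 0 and 6 terminal, which may not be the molecule from the question), identifies an environment only by the set of bonds it contains, and breaks ties by atom index rather than by invariant. The names `bonds_in_environment`, `surviving_environments` and `include_redundant` are made up for illustration.

```python
from collections import deque


def bonds_in_environment(adjacency, center, radius):
    """Return the set of bonds within `radius` bonds of `center`."""
    dist = {center: 0}
    queue = deque([center])
    bonds = set()
    while queue:
        atom = queue.popleft()
        if dist[atom] >= radius:
            continue  # bonds leaving this atom lie outside the environment
        for nbr in adjacency[atom]:
            bonds.add(frozenset((atom, nbr)))
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return frozenset(bonds)


def surviving_environments(adjacency, max_radius, include_redundant=False):
    """Map each atom to the radii whose environments are kept.

    Simplification: an environment is dropped when the same bond set was
    already seen at a lower radius, or at the same radius for a lower atom
    index. Radius 0 (the atom itself) is always kept.
    """
    seen = set()
    kept = {atom: [0] for atom in adjacency}
    for radius in range(1, max_radius + 1):
        for atom in sorted(adjacency):
            env = bonds_in_environment(adjacency, atom, radius)
            if env in seen and not include_redundant:
                continue  # redundant: same bonds as an earlier environment
            seen.add(env)
            kept[atom].append(radius)
    return kept


# Hypothetical linear chain 0-1-2-3-4-5-6 (assumed, for illustration only).
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}

for atom, radii in surviving_environments(chain, max_radius=2).items():
    print(atom, radii)
```

In this sketch the radius 2 environment of atom 0 contains exactly the bonds 0-1 and 1-2, which is the same bond set as the radius 1 environment of atom 1, so it is dropped. The same happens for atom 6 versus atom 5. Passing `include_redundant=True` keeps every radius for every atom, which is the behaviour the flag mentioned above is meant to give.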