Add additional fingerprints #66

Open
31 of 46 tasks
j-adamczyk opened this issue Dec 14, 2023 · 0 comments
Labels
feature New feature to implement

j-adamczyk commented Dec 14, 2023

We need to implement additional fingerprints from multiple sources, as we want scikit-fingerprints to be the only required Python library for computing molecular fingerprints.

RDKit hashed fingerprints:

  • Avalon - ref
  • E-state - ref
  • FPCP - ref
  • MHFP - potentially; benchmark against the already implemented one first, ref
  • Layered - ref
  • Pattern - ref
  • Pharmacophore - ref
  • Physicochemical property fingerprints - ref
  • RDKit (substructure) - ref
  • SECFP - ref

RDKit descriptor fingerprints:

Check other libraries and software for fingerprints and descriptors, and add them to lists below:

Other descriptor-based fingerprints:

  • Atom triplets - ref 1; very similar to atom pairs, but uses triplets of atoms. We need to implement this from scratch, based on the RDKit implementation (including atom invariants)
  • Electroshape descriptors - ref 1, ref 2
  • CATS descriptors - ref 1, ref 2, ref 3, ref 4; the last one includes an interesting way to use 3D distances
  • CheckMol (FP3) - ref 1, ref 2, ref 3, ref 4; rejected, since it's basically covered by Laggner's SMARTS patterns
  • Laggner - ref 1, ref 2, SMARTS Patterns for Functional Group Classification by Christian Laggner; also known as CDK Substructure Fingerprint in ref 3
  • Ghose-Crippen - ref 1, take just the SMARTS patterns
  • Klekota-Roth - ref 1, ref 2
  • Lingo - ref
  • Mordred - ref 1, ref 2
  • PubChem - ref 1, ref 2; note that we need to implement the actual calculation of this fingerprint, not just connect to the PUG REST API, since that API is extremely unreliable
  • SHED - ref 1, ref 2
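
As a rough illustration of how small some of these can be, here is a minimal sketch of a Lingo-style fingerprint: overlapping substrings of length q taken from a SMILES string. It is not taken from the linked reference or from any existing library. The names `lingo_counts` and `multiset_tanimoto` are hypothetical, replacing every digit with `0` is a simplification of ring-closure normalization (it also touches isotope and charge digits), and the multiset Tanimoto is an assumed similarity measure, not necessarily the one from the original publication.

```python
import re
from collections import Counter


def lingo_counts(smiles: str, q: int = 4) -> Counter:
    """Count overlapping length-q substrings of a SMILES string.

    Assumption: all digits are replaced with "0" so that ring-closure
    numbering does not change the substrings. This is a simplification.
    """
    normalized = re.sub(r"\d", "0", smiles)
    # Strings shorter than q produce an empty range, hence an empty Counter.
    return Counter(
        normalized[i : i + q] for i in range(len(normalized) - q + 1)
    )


def multiset_tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto similarity on count vectors: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    intersection = sum(min(a[k], b[k]) for k in keys)
    union = sum(max(a[k], b[k]) for k in keys)
    # Two empty fingerprints are treated as dissimilar (assumption).
    return intersection / union if union else 0.0


if __name__ == "__main__":
    butanol = lingo_counts("CCCCO")
    butylamine = lingo_counts("CCCCN")
    print(butanol)
    print(multiset_tanimoto(butanol, butylamine))
```

A real implementation would presumably canonicalize SMILES first and expose the counts as a hashed or vocabulary-indexed vector, like the other count-based fingerprints.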

Other fingerprints:

  • 4PT - ref 1
  • BCL2D - ref 1; rejected, since it's basically a worse ECFP4
  • FragFP - ref 1, ref 2; rejected, since it uses a custom search pattern engine, and translating its patterns to SMARTS is not really doable
  • Graph signature - ref 1, ref 2; originally proposed in ref 3 and ref 4
  • MDFP - ref
  • MolPrint2D - ref; rejected, since it's basically a worse ECFP4
  • NC-MFP - ref; RDKit code is provided, but it requires quite a bit of refactoring
  • Spectrophore - ref
  • Toxicophore - ref 1, ref 2; SMARTS queries are in the supporting materials. Note that this is not a unique definition, e.g. OChem ToxAlerts offers a (much larger) alternative
@my-alaska my-alaska added the feature New feature to implement label Feb 26, 2024
@scikit-fingerprints scikit-fingerprints deleted a comment from my-alaska Mar 5, 2024