
Performance Evaluation

Last edited by Greg Landrum on Apr 3, 2019

Background

The goal is to track how the RDKit's performance evolves from release to release, so that notable changes can be highlighted and serious regressions caught.

When I run the tests, I try not to do anything else computationally or memory intensive on the machine, but I don't repeat the runs and average them, so don't read too much into small differences in timing.
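The single-run caveat above could be addressed by repeating each measurement and summarizing the samples. The following is a minimal sketch, assuming plain Python `timeit`; it is not how the RDKit timing scripts work, and `is_meaningful` together with its 10% threshold is a hypothetical rule for illustration only.

```python
import statistics
import timeit


def time_step(func, repeats=5):
    """Call func() once per repeat and summarize the wall-clock samples.

    timeit disables garbage collection while timing by default. The minimum is
    the least noise-sensitive estimate, the median shows typical behaviour, and
    the spread indicates how far a single run can wander.
    """
    samples = timeit.repeat(func, number=1, repeat=repeats)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "spread": max(samples) - min(samples),
        "samples": samples,
    }


def is_meaningful(old, new, rel_threshold=0.10):
    """Hypothetical rule: call a change real only if the difference between the
    minima exceeds rel_threshold of the old minimum AND the combined spread."""
    diff = abs(new["min"] - old["min"])
    noise = old["spread"] + new["spread"]
    return diff > rel_threshold * old["min"] and diff > noise


if __name__ == "__main__":
    stats = time_step(lambda: sorted(range(100_000), reverse=True))
    print(f"min {stats['min']:.4f} s, median {stats['median']:.4f} s")
```

Comparing minima rather than single runs is one common way to damp interference from other processes; the cost is running every test several times.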

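The tests themselves are described at the bottom of this page; the column headers t1, t2, and so on in the tables below refer to the correspondingly numbered tests in each list.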

Results

03.04.2019

Test machine: Otter (Dell XPS workstation, i7-4790 @ 3.6 GHz, 16 GB RAM, Ubuntu 18.04, Python 3.6)

Short tests

Version        t1   t2   t3   t4   t5   t6   t7   t8   t9  t10  t11  t12  t13  t14  t15  t16  t17  t18
2018.03.4     0.4  0.1  0.0  0.7  0.7  0.0  2.0  2.0  0.1  0.2  0.1  3.2 32.6  0.6  1.1  0.8  0.5  0.0
2018.09.3     0.3  0.2  0.0  0.5  0.5  0.0  1.7  1.7  0.1  0.2  0.1  3.5 32.6  0.6  1.1  0.8  0.5  0.0
2019.03.1b1   0.3  0.2  0.0  0.5  0.6  0.0  1.8  1.8  0.1  0.2  0.1  3.5 26.8  0.6  1.2  0.8  0.5  0.0

Long tests

Version         t1    t2    t3    t4    t5    t6    t7    t8    t9   t10   t11   t12   t13   t14
2018.03.4      7.8   3.6   3.0   0.0  39.7  43.3   0.0 114.4 113.0  16.7  27.5  10.4  38.7   2.2
2018.09.3      7.9   3.7   3.1   0.0  31.6  34.7   0.0  98.4 100.6  17.2  26.3  10.2  41.2   2.4
2019.03.1b1    8.0   3.7   3.5   0.0  31.6  34.7   0.0 102.1 105.6  18.4  27.2  11.2  40.9   2.4
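To make release-to-release comparisons more systematic, the long-test numbers can be screened automatically. This is a minimal sketch, assuming a hypothetical rule that flags a test only if its time changed by at least 5% and by at least 0.5 (in the table's own units). Both thresholds are illustrative guesses, not project policy, and given the single-run caveat, flagged entries are candidates for re-measurement rather than confirmed regressions.

```python
# Long-test timings copied from the table above (same units as the table).
LONG_TESTS = {
    "2018.03.4":   [7.8, 3.6, 3.0, 0.0, 39.7, 43.3, 0.0, 114.4, 113.0, 16.7, 27.5, 10.4, 38.7, 2.2],
    "2018.09.3":   [7.9, 3.7, 3.1, 0.0, 31.6, 34.7, 0.0, 98.4, 100.6, 17.2, 26.3, 10.2, 41.2, 2.4],
    "2019.03.1b1": [8.0, 3.7, 3.5, 0.0, 31.6, 34.7, 0.0, 102.1, 105.6, 18.4, 27.2, 11.2, 40.9, 2.4],
}


def flag_changes(old, new, rel_threshold=0.05, min_abs=0.5):
    """Return {test_name: relative_change} for tests whose time changed by at
    least rel_threshold (as a fraction) AND by at least min_abs (table units).

    Tests with a zero baseline are skipped because a relative change is
    undefined there (the table rounds very fast tests to 0.0).
    """
    flagged = {}
    for i, (t_old, t_new) in enumerate(zip(old, new), start=1):
        if t_old == 0.0:
            continue
        diff = t_new - t_old
        rel = diff / t_old
        if abs(diff) >= min_abs and abs(rel) >= rel_threshold:
            flagged[f"t{i}"] = rel
    return flagged


if __name__ == "__main__":
    releases = list(LONG_TESTS)
    for older, newer in zip(releases, releases[1:]):
        for test, rel in flag_changes(LONG_TESTS[older], LONG_TESTS[newer]).items():
            kind = "slower" if rel > 0 else "faster"
            print(f"{older} to {newer}: {test} {rel:+.1%} ({kind})")
```

With these thresholds the script reports t5, t6, t8, t9 and t13 for 2018.03.4 to 2018.09.3, and t10 and t12 for 2018.09.3 to 2019.03.1b1. The absolute floor keeps small tests such as t3 (3.1 to 3.5) from being flagged on rounding noise alone.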

Description of the tests

The short tests

The tests I've put together cover a broad range of functionality and come in two versions: a short set and a longer-running set. The base script (https://github.com/rdkit/rdkit/blob/master/Regress/Scripts/timings.py) includes:

  1. Constructing 1000 molecules from SDF
  2. Constructing 1000 molecules from SMILES
  3. Constructing 1823 fragment queries from SMARTS
  4. Calling HasSubstructMatch() for the 823 queries across the 1000 molecules
  5. Calling GetSubstructMatches() for the 823 queries across the 1000 molecules
  6. Constructing 428 more complex queries from SMARTS
  7. Calling HasSubstructMatch() for the 428 queries across the 1000 molecules
  8. Calling GetSubstructMatches() for the 428 queries across the 1000 molecules
  9. Generating canonical SMILES for the 1000 molecules
  10. Generating 2D coords for the 1000 molecules
  11. Generating mol blocks for the 1000 molecules
  12. Doing a RECAP decomposition for the 1000 molecules
  13. Generating an ETKDG conformation for 50 of the molecules
  14. UFF optimizing the 50 molecules with ETKDG conformations
  15. MMFF optimizing the 50 molecules with ETKDG conformations
  16. Finding unique subgraphs of size 6 in the 1000 molecules
  17. Generating RDKit fingerprints for the 1000 molecules
  18. Generating MFP2 (Morgan, radius 2) fingerprints for the 1000 molecules
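For readers unfamiliar with the RDKit calls behind the short tests above, here is a minimal sketch of most of them on a handful of inline molecules. It is not timings.py: the SMILES and SMARTS, the `timed` and `embed_all` helpers, the fixed random seed, the use of `FindUniqueSubgraphsOfLengthN` (which measures subgraph length in bonds) for step 16, and the use of `rdFingerprintGenerator` for the MFP2 step are all assumptions made for illustration. Step 1 (reading SDF) is omitted because it needs an input file.

```python
import time

from rdkit import Chem
from rdkit.Chem import AllChem, Recap, rdFingerprintGenerator

# Hypothetical stand-in data; the real script reads 1000 molecules and the
# SMARTS query files instead.
SMILES = ["c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CCOC(=O)c1ccc(N)cc1"]
SMARTS = ["[OX2H]", "C(=O)O", "c1ccccc1", "[NX3;H2]"]


def timed(label, func, results):
    """Call func() once and store the elapsed wall-clock seconds under label."""
    start = time.perf_counter()
    value = func()
    results[label] = time.perf_counter() - start
    return value


def embed_all(mols):
    """Add Hs and generate one ETKDG conformer per molecule (fixed seed)."""
    params = AllChem.ETKDG()
    params.randomSeed = 42
    embedded = []
    for mol in mols:
        molh = Chem.AddHs(mol)
        if AllChem.EmbedMolecule(molh, params) == 0:  # -1 means embedding failed
            embedded.append(molh)
    return embedded


def run_short_tests():
    results = {}
    mols = timed("mols from SMILES", lambda: [Chem.MolFromSmiles(s) for s in SMILES], results)
    qs = timed("queries from SMARTS", lambda: [Chem.MolFromSmarts(s) for s in SMARTS], results)
    timed("HasSubstructMatch",
          lambda: [m.HasSubstructMatch(q) for q in qs for m in mols], results)
    timed("GetSubstructMatches",
          lambda: [m.GetSubstructMatches(q) for q in qs for m in mols], results)
    timed("canonical SMILES", lambda: [Chem.MolToSmiles(m) for m in mols], results)
    timed("2D coords", lambda: [AllChem.Compute2DCoords(m) for m in mols], results)
    timed("mol blocks", lambda: [Chem.MolToMolBlock(m) for m in mols], results)
    timed("RECAP", lambda: [Recap.RecapDecompose(m) for m in mols], results)
    mols3d = timed("ETKDG", lambda: embed_all(mols), results)
    # Optimize copies (Chem.Mol copies conformers too) so UFF and MMFF both
    # start from the same ETKDG geometries.
    timed("UFF", lambda: [AllChem.UFFOptimizeMolecule(Chem.Mol(m)) for m in mols3d], results)
    timed("MMFF", lambda: [AllChem.MMFFOptimizeMolecule(Chem.Mol(m)) for m in mols3d], results)
    timed("subgraphs", lambda: [Chem.FindUniqueSubgraphsOfLengthN(m, 6) for m in mols], results)
    timed("RDKit FP", lambda: [Chem.RDKFingerprint(m) for m in mols], results)
    mfp2 = rdFingerprintGenerator.GetMorganGenerator(radius=2)
    timed("MFP2", lambda: [mfp2.GetFingerprint(m) for m in mols], results)
    return results


if __name__ == "__main__":
    for label, seconds in run_short_tests().items():
        print(f"{label:20s} {seconds:.4f}")
```

Because each step runs once on three molecules, the printed times only illustrate the mechanics and say nothing about relative performance.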

The long tests

The longer-running script (https://github.com/rdkit/rdkit/blob/master/Regress/Scripts/new_timings.py) includes:

  1. Creating 50K molecules (taken from the old ZINC Natural Products set) from SMILES
  2. Creating canonical SMILES for the 50K molecules
  3. Creating 10K molecules from SDF
  4. Constructing 1823 fragment queries from SMARTS
  5. Calling HasSubstructMatch() for the 823 queries across the 50K molecules
  6. Calling GetSubstructMatches() for the 823 queries across the 50K molecules
  7. Constructing 428 more complex queries from SMARTS
  8. Calling HasSubstructMatch() for the 428 queries across the 50K molecules
  9. Calling GetSubstructMatches() for the 428 queries across the 50K molecules
  10. Generating mol blocks for the 50K molecules
  11. Doing a BRICS decomposition for the 50K molecules
  12. Generating 2D coords for the 50K molecules
  13. Generating RDKit fingerprints for the 50K molecules
  14. Generating MFP2 (Morgan, radius 2) fingerprints for the 50K molecules
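The long tests run many of the same operations at 50K-molecule scale, where inputs that fail to parse and per-query hit counts start to matter. Below is a minimal sketch of the pattern behind tests 1, 2, 5/8, 6/9 and 11. The inline data and the helper names (`load_mols`, `canonical_smiles`, `query_hit_counts`, `total_matches`, `brics_fragment_counts`) are hypothetical, and this is not new_timings.py.

```python
from collections import Counter

from rdkit import Chem, RDLogger
from rdkit.Chem import BRICS

RDLogger.DisableLog("rdApp.*")  # silence parse-error messages for bad inputs

# Hypothetical stand-in data; the last entry deliberately fails to parse.
SMILES = ["c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CCOC(=O)c1ccc(N)cc1", "not_a_smiles"]
SMARTS = ["[OX2H]", "C(=O)O", "c1ccccc1", "[NX3;H2]"]


def load_mols(smiles_list):
    """Parse SMILES, dropping entries RDKit cannot parse (MolFromSmiles -> None)."""
    mols = (Chem.MolFromSmiles(s) for s in smiles_list)
    return [m for m in mols if m is not None]


def canonical_smiles(mols):
    """Canonical SMILES for each molecule (long test 2)."""
    return [Chem.MolToSmiles(m) for m in mols]


def query_hit_counts(mols, smarts_list):
    """For each SMARTS, count molecules with at least one match (tests 5 and 8)."""
    queries = [(s, Chem.MolFromSmarts(s)) for s in smarts_list]
    return {s: sum(m.HasSubstructMatch(q) for m in mols) for s, q in queries}


def total_matches(mols, smarts_list):
    """Total number of (uniquified) match tuples over all pairs (tests 6 and 9)."""
    queries = [Chem.MolFromSmarts(s) for s in smarts_list]
    return sum(len(m.GetSubstructMatches(q)) for q in queries for m in mols)


def brics_fragment_counts(mols):
    """Count BRICS fragment SMILES across all molecules (long test 11)."""
    counts = Counter()
    for m in mols:
        counts.update(BRICS.BRICSDecompose(m))  # returns a set of SMILES strings
    return counts


if __name__ == "__main__":
    mols = load_mols(SMILES)
    print(len(mols), "molecules parsed")
    print(query_hit_counts(mols, SMARTS))
    print(brics_fragment_counts(mols).most_common(3))
```

`HasSubstructMatch` can stop at the first match, whereas `GetSubstructMatches` must enumerate every match, which is one plausible reason the GetSubstructMatches columns in the tables tend to be slower.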