Performance Evaluation
The idea here is to track the evolution of the RDKit's performance from release to release in order to be able to highlight notable changes and make sure there are no serious regressions.
When I run the tests, I try not to do anything else computationally/memory intensive, but I don't run the tests multiple times and average, so don't ascribe too much significance to small time differences.
The tests themselves are described at the bottom of this page.
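The numbers below come from single runs, as noted above. If less noisy numbers were ever wanted, one option would be to repeat each step and report the median. The following is only a sketch of that idea using the standard library; `time_repeated` and `workload` are hypothetical names and are not part of the RDKit timing scripts.

```python
import statistics
import time


def time_repeated(func, repeats=5):
    """Run func() `repeats` times; return (median, stdev) of wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    # stdev needs at least two samples
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.median(samples), spread


def workload():
    # Hypothetical stand-in for one benchmark step.
    return sum(i * i for i in range(100_000))


median_s, spread_s = time_repeated(workload, repeats=5)
print(f"median {median_s:.4f} s, stdev {spread_s:.4f} s")
```

A difference between two releases that is smaller than the reported spread is probably not meaningful.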
Test machine: Otter (Dell XPS workstation, i7-4790 @ 3.6 GHz, 16 GB RAM, Ubuntu 18.04, Python 3.6)
Base script results (times in seconds):

Version | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 | t11 | t12 | t13 | t14 | t15 | t16 | t17 | t18 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2018.03.4 | 0.4 | 0.1 | 0.0 | 0.7 | 0.7 | 0.0 | 2.0 | 2.0 | 0.1 | 0.2 | 0.1 | 3.2 | 32.6 | 0.6 | 1.1 | 0.8 | 0.5 | 0.0 |
2018.09.3 | 0.3 | 0.2 | 0.0 | 0.5 | 0.5 | 0.0 | 1.7 | 1.7 | 0.1 | 0.2 | 0.1 | 3.5 | 32.6 | 0.6 | 1.1 | 0.8 | 0.5 | 0.0 |
2019.03.1b1 | 0.3 | 0.2 | 0.0 | 0.5 | 0.6 | 0.0 | 1.8 | 1.8 | 0.1 | 0.2 | 0.1 | 3.5 | 26.8 | 0.6 | 1.2 | 0.8 | 0.5 | 0.0 |

Longer-running script results (times in seconds):

Version | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | t9 | t10 | t11 | t12 | t13 | t14 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2018.03.4 | 7.8 | 3.6 | 3.0 | 0.0 | 39.7 | 43.3 | 0.0 | 114.4 | 113.0 | 16.7 | 27.5 | 10.4 | 38.7 | 2.2 |
2018.09.3 | 7.9 | 3.7 | 3.1 | 0.0 | 31.6 | 34.7 | 0.0 | 98.4 | 100.6 | 17.2 | 26.3 | 10.2 | 41.2 | 2.4 |
2019.03.1b1 | 8.0 | 3.7 | 3.5 | 0.0 | 31.6 | 34.7 | 0.0 | 102.1 | 105.6 | 18.4 | 27.2 | 11.2 | 40.9 | 2.4 |
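
To spot notable changes between two releases, rows from the tables above can be compared programmatically. This is a minimal sketch, not something the timing scripts do themselves: the helper names, the 10% threshold, and the 1-second floor (used to skip tests whose one-decimal values are too coarse to compare) are all assumptions chosen for illustration. The two rows are copied from the first table.

```python
def parse_row(row):
    """Split a 'version | t1 | t2 | ... |' row into (version, [floats])."""
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    return cells[0], [float(c) for c in cells[1:]]


def notable_changes(old_row, new_row, rel_threshold=0.10, min_seconds=1.0):
    """Return [(test_name, old, new, relative_change)] for notable changes.

    Tests where both timings are below min_seconds are skipped.
    """
    _, old = parse_row(old_row)
    _, new = parse_row(new_row)
    changes = []
    for i, (a, b) in enumerate(zip(old, new), start=1):
        if max(a, b) < min_seconds:
            continue
        # Guard against division by zero when the old timing rounds to 0.0
        rel = (b - a) / a if a > 0 else float("inf")
        if abs(rel) >= rel_threshold:
            changes.append((f"t{i}", a, b, rel))
    return changes


old_row = "2018.09.3 | 0.3 | 0.2 | 0.0 | 0.5 | 0.5 | 0.0 | 1.7 | 1.7 | 0.1 | 0.2 | 0.1 | 3.5 | 32.6 | 0.6 | 1.1 | 0.8 | 0.5 | 0.0 |"
new_row = "2019.03.1b1 | 0.3 | 0.2 | 0.0 | 0.5 | 0.6 | 0.0 | 1.8 | 1.8 | 0.1 | 0.2 | 0.1 | 3.5 | 26.8 | 0.6 | 1.2 | 0.8 | 0.5 | 0.0 |"

changes = notable_changes(old_row, new_row)
for name, a, b, rel in changes:
    print(f"{name}: {a} -> {b} s ({rel:+.1%})")
# prints: t13: 32.6 -> 26.8 s (-17.8%)
```

With these settings only t13 is flagged between 2018.09.3 and 2019.03.1b1. Assuming the columns follow the order of the test list below, t13 is the ETKDG conformation step.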
The tests I've put together cover a broad range of functionality and come in two versions. The base script (https://github.com/rdkit/rdkit/blob/master/Regress/Scripts/timings.py) includes:
- Constructing 1000 molecules from SDF
- Constructing 1000 molecules from SMILES
- Constructing 823 fragment queries from SMARTS
- Calling `HasSubstructMatch()` for the 823 queries across the 1000 molecules
- Calling `GetSubstructMatches()` for the 823 queries across the 1000 molecules
- Reading 428 more complex SMARTS queries from SMARTS
- Calling `HasSubstructMatch()` for the 428 queries across the 1000 molecules
- Calling `GetSubstructMatches()` for the 428 queries across the 1000 molecules
- Generating canonical SMILES for the 1000 molecules
- Generating 2D coords for the 1000 molecules
- Generating mol blocks for the 1000 molecules
- Doing a RECAP decomposition for the 1000 molecules
- Generating an ETKDG conformation for 50 of the molecules
- UFF optimizing the 50 molecules with ETKDG conformations
- MMFF optimizing the 50 molecules with ETKDG conformations
- Finding unique subgraphs of size 6 in the 1000 molecules
- Generating RDKit fingerprints for the 1000 molecules
- Generating MFP2 fingerprints for the 1000 molecules
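
As a rough illustration of how steps like these can be timed one after another, here is a small sketch. It is not the code from `timings.py`: the helpers `time_steps` and `rdkit_steps` are hypothetical, the three molecules and three SMARTS patterns are toy inputs rather than the benchmark data, and only a few of the steps are covered. The RDKit import is kept inside `rdkit_steps` so the timing helper itself works without RDKit installed.

```python
import time


def time_steps(steps):
    """Run (label, callable) pairs in order; return [(label, seconds)]."""
    timings = []
    for label, func in steps:
        start = time.perf_counter()
        func()
        timings.append((label, time.perf_counter() - start))
    return timings


def rdkit_steps(smiles_list, smarts_list):
    """Build a few of the steps listed above; requires RDKit."""
    from rdkit import Chem
    from rdkit.Chem import rdDepictor

    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    queries = [Chem.MolFromSmarts(s) for s in smarts_list]

    def has_match():
        for m in mols:
            for q in queries:
                m.HasSubstructMatch(q)

    def get_matches():
        for m in mols:
            for q in queries:
                m.GetSubstructMatches(q)

    return [
        ("HasSubstructMatch", has_match),
        ("GetSubstructMatches", get_matches),
        ("canonical SMILES", lambda: [Chem.MolToSmiles(m) for m in mols]),
        ("2D coords", lambda: [rdDepictor.Compute2DCoords(m) for m in mols]),
        ("mol blocks", lambda: [Chem.MolToMolBlock(m) for m in mols]),
    ]


if __name__ == "__main__":
    try:
        steps = rdkit_steps(
            ["c1ccccc1O", "CCN(CC)CC", "CC(=O)Oc1ccccc1C(=O)O"],  # toy molecules
            ["[OX2H]", "c1ccccc1", "[NX3]"],  # toy queries
        )
    except ImportError:
        # Fall back to a no-op step so the harness can still be exercised.
        steps = [("no-op (RDKit not installed)", lambda: None)]
    for label, seconds in time_steps(steps):
        print(f"{label}: {seconds:.3f} s")
```

Molecule and query construction happen once, outside the timed callables, so each timing reflects only the operation named in its label.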
The longer-running script (https://github.com/rdkit/rdkit/blob/master/Regress/Scripts/new_timings.py) includes:
- Creating 50K molecules (taken from the old Zinc Natural Products set) from SMILES
- Creating canonical SMILES for the 50K molecules
- Creating 10K molecules from SDF
- Constructing 823 fragment queries from SMARTS
- Calling `HasSubstructMatch()` for the 823 queries across the 50K molecules
- Calling `GetSubstructMatches()` for the 823 queries across the 50K molecules
- Reading 428 more complex SMARTS queries from SMARTS
- Calling `HasSubstructMatch()` for the 428 queries across the 50K molecules
- Calling `GetSubstructMatches()` for the 428 queries across the 50K molecules
- Generating mol blocks for the 50K molecules
- Doing a BRICS decomposition for the 50K molecules
- Generating 2D coords for the 50K molecules
- Generating RDKit fingerprints for the 50K molecules
- Generating MFP2 fingerprints for the 50K molecules