Performance Evaluation

Background

The idea here is to track the evolution of the RDKit's performance from release to release in order to be able to highlight notable changes and make sure there are no serious regressions.

When I run the tests, I try not to do anything else computationally or memory intensive on the machine. Each test is run only once rather than repeated and averaged, so don't ascribe too much significance to small time differences.
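To get a feel for how much run-to-run noise to expect, a step can be repeated a few times and the spread of its timings inspected. The following is a minimal sketch of that idea, not what the timing scripts do: `time_repeated`, `summarize`, and the stand-in `workload` are hypothetical names used only for illustration.

```python
import statistics
import time


def time_repeated(func, repeats=5):
    """Run func() `repeats` times and return the wall-clock times in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Return (median, spread), where spread is the max minus the min."""
    return statistics.median(times), max(times) - min(times)


def workload():
    # Stand-in for a real test step (hypothetical; not an RDKit call).
    return sum(i * i for i in range(100_000))


median, spread = summarize(time_repeated(workload))
print(f"median {median:.4f}s, spread {spread:.4f}s")
```

If the spread is comparable to the difference between two releases, that difference is probably not meaningful.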

The tests themselves are described at the bottom of this page.

Results

03.04.2019

Test machine: Otter (Dell XPS workstation, i7-4790 @ 3.6GHz, 16GB RAM, Ubuntu 18.04, Python 3.6)

Short tests

Version	t1	t2	t4	t5	t7	t8	t9	t10	t11	t12	t13	t14	t15	t16	t17
2018.03.4	0.4	0.1	0.7	0.7	2.0	2.0	0.1	0.2	0.1	3.2	32.6	0.6	1.1	0.8	0.5
2018.09.3	0.3	0.2	0.5	0.5	1.7	1.7	0.1	0.2	0.1	3.5	32.6	0.6	1.1	0.8	0.5
2019.03.1b1	0.3	0.2	0.5	0.6	1.8	1.8	0.1	0.2	0.1	3.5	26.8	0.6	1.2	0.8	0.5

Long tests

Version	t1	t2	t3	t5	t6	t8	t9	t10	t11	t12	t13	t14
2018.03.4	7.8	3.6	3.0	39.7	43.3	114.4	113.0	16.7	27.5	10.4	38.7	2.2
2018.09.3	7.9	3.7	3.1	31.6	34.7	98.4	100.6	17.2	26.3	10.2	41.2	2.4
2019.03.1b1	8.0	3.7	3.5	31.6	34.7	102.1	105.6	18.4	27.2	11.2	40.9	2.4
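To pick out the larger shifts between releases, the relative change can be computed per test. Here is a small sketch using the long-test rows copied from the table above. The helper names `percent_change` and `notable_changes` and the 10% threshold are arbitrary choices for illustration and are not part of the timing scripts.

```python
# Long-test timings copied from the table above.
LABELS = ["t1", "t2", "t3", "t5", "t6", "t8", "t9", "t10", "t11", "t12", "t13", "t14"]
TIMINGS = {
    "2018.03.4": [7.8, 3.6, 3.0, 39.7, 43.3, 114.4, 113.0, 16.7, 27.5, 10.4, 38.7, 2.2],
    "2018.09.3": [7.9, 3.7, 3.1, 31.6, 34.7, 98.4, 100.6, 17.2, 26.3, 10.2, 41.2, 2.4],
    "2019.03.1b1": [8.0, 3.7, 3.5, 31.6, 34.7, 102.1, 105.6, 18.4, 27.2, 11.2, 40.9, 2.4],
}


def percent_change(old, new):
    """Relative change from old to new, in percent (negative means faster)."""
    return 100.0 * (new - old) / old


def notable_changes(old_version, new_version, threshold=10.0):
    """Return {label: percent change} for tests that moved by at least `threshold` percent."""
    changes = {}
    for label, old, new in zip(LABELS, TIMINGS[old_version], TIMINGS[new_version]):
        delta = percent_change(old, new)
        if abs(delta) >= threshold:
            changes[label] = round(delta, 1)
    return changes


for label, delta in notable_changes("2018.03.4", "2018.09.3").items():
    print(f"{label}: {delta:+.1f}%")
```

Because each number comes from a single run, changes that fall below the threshold, such as t14 going from 2.2 to 2.4, are best treated as possible noise.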

Description of the tests

The short tests

The tests I've put together cover a broad range of functionality and come in two versions. The base script (https://github.com/rdkit/rdkit/blob/master/Regress/Scripts/timings.py) includes:

Constructing 1000 molecules from SDF
Constructing 1000 molecules from SMILES
Constructing 823 fragment queries from SMARTS
Calling HasSubstructMatch() for the 823 queries across the 1000 molecules
Calling GetSubstructMatches() for the 823 queries across the 1000 molecules
Constructing 428 more complex queries from SMARTS
Calling HasSubstructMatch() for the 428 queries across the 1000 molecules
Calling GetSubstructMatches() for the 428 queries across the 1000 molecules
Generating canonical SMILES for the 1000 molecules
Generating 2D coords for the 1000 molecules
Generating mol blocks for the 1000 molecules
Doing a RECAP decomposition for the 1000 molecules
Generating an ETKDG conformation for 50 of the molecules
UFF optimizing the 50 molecules with ETKDG conformations
MMFF optimizing the 50 molecules with ETKDG conformations
Finding unique subgraphs of size 6 in the 1000 molecules
Generating RDKit fingerprints for the 1000 molecules
Generating MFP2 fingerprints for the 1000 molecules

The long tests

The longer-running script (https://github.com/rdkit/rdkit/blob/master/Regress/Scripts/new_timings.py) includes:

Creating 50K molecules (taken from the old Zinc Natural Products set) from SMILES
Creating canonical SMILES for the 50K molecules
Creating 10K molecules from SDF
Constructing 823 fragment queries from SMARTS
Calling HasSubstructMatch() for the 823 queries across the 50K molecules
Calling GetSubstructMatches() for the 823 queries across the 50K molecules
Constructing 428 more complex queries from SMARTS
Calling HasSubstructMatch() for the 428 queries across the 50K molecules
Calling GetSubstructMatches() for the 428 queries across the 50K molecules
Generating mol blocks for the 50K molecules
Doing a BRICS decomposition for the 50K molecules
Generating 2D coords for the 50K molecules
Generating RDKit fingerprints for the 50K molecules
Generating MFP2 fingerprints for the 50K molecules
