
[Docking Leaderboard DRD3] Reproducibility Issues #235

Open · Jonas-Verhellen opened this issue Mar 27, 2024 · 6 comments
Labels: bug (Something isn't working) · v2neurips

@Jonas-Verhellen commented Mar 27, 2024

Dear maintainers of the TDC project,

I'm trying to reproduce the results reported in the DRD3 docking group benchmark for the GB-GA model, but I am running into a few issues.

  1. I cannot seem to reproduce the docking values for some individual molecules.
  • Some examples, taken from the top docking scores reported in the User Group Meeting, for which I obtain different docking values using the same oracle (reported values shown first):
    O=c1c2c(Br)cccc2ncn1Cc1cc(F)c(-c2n~c3c(C4=NNN=N4)cccc3o2)cc1F: -12.2 vs -11.4,
    O=C1OC(=O)C23CCOCC12N=NN3CC12CC(C3=NC(c4cncc5ccccc45)=NN3)(CO1)C2: -12.2 vs -12.5,
    CC(=O)C(c1cccc(C(=O)c2cccc3ccccc23)c1)c1noc(-c2c[nH]nc2C2CCCCC2)n1: -12.0 vs -10.5

  • While the above discrepancies are relatively minor, they grow markedly larger in the data connected to the leaderboard itself. For instance, these SMILES from the smiles_lstm_2_5000.txt file have reported docking scores that differ markedly from the ones I obtain from the oracle:
    O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCCc1ccc(C(F)(F)F)cc1: -15 vs -9.2
    O=C(CCOc1ccccc1)Oc1ccccc1C(=O)CCCc1ccccc1F: -15 vs -9.2
    O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1C(F)(F)F: -15.0 vs -10.3
    O=C(Nc1ccccc1F)Oc1ccccc1C(=O)CCc1ccccc1C(F)(F)F: -14.6 vs -9.0
    O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1Cl: -14.5 vs -9.1
    O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1F: -14.4 vs -8.9
    O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOCc1ccccc1: -14.4 vs -9.0
    Are there settings that have changed, or that need to be specified somewhere in the code? Am I missing something? Has there been a change in the back-end? (A sketch of how I call the oracle follows this list.)

  2. Unfortunately, I also cannot locate all the pickle files behind the currently claimed performances in the benchmark. The GitHub repo linked to the benchmark is missing the majority of these files, although I have noticed that the website does provide a visualization of the molecules. Is it possible to find (or publicly release) all the molecules in SMILES format, with their docking scores, as they were submitted to the benchmark?

  3. It is not entirely clear to me which dataset is used to seed the algorithms. Is it ZINC 250k or guacamol_v1_all.smiles?
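
For reference, the sketch below shows how I am calling the oracle and how I would load a candidate seed set. This is a minimal sketch: the oracle name '3pbl_docking' (3PBL being the DRD3 structure) and the MolGen loader reflect my reading of the TDC docs, so please correct me if the benchmark expects a different invocation or additional docking settings.

```python
from tdc import Oracle
from tdc.generation import MolGen

# DRD3 docking oracle -- '3pbl_docking' is my reading of the TDC oracle
# naming (3PBL is the DRD3 crystal structure); adjust if the benchmark
# expects a different name or extra docking parameters.
oracle = Oracle(name="3pbl_docking")

# Two of the SMILES from smiles_lstm_2_5000.txt with their leaderboard scores.
reported = {
    "O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCCc1ccc(C(F)(F)F)cc1": -15.0,
    "O=C(CCOc1ccccc1)Oc1ccccc1C(=O)CCCc1ccccc1F": -15.0,
}

scores = oracle(list(reported))  # I obtain roughly -9.2 for both, not -15
for smi, score in zip(reported, scores):
    print(f"reported {reported[smi]:6.1f} | obtained {score:6.1f} | {smi}")

# Question 3: which of these (if either) is the intended seed set?
zinc = MolGen(name="ZINC").get_data()  # ZINC 250k via the TDC data loaders
```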

Kind regards,
Jonas

@amva13 (Member) commented Apr 3, 2024

Dear Marinka and Maintainers of the TDC Project,

I hope this email finds you well. I am reaching out to you regarding some issues I have encountered while attempting to reproduce the results obtained in the DRD3 docking group benchmark. As I hope to utilize your benchmark as the conclusion of an upcoming paper introducing a novel and significantly more effective generative model, I am keen to resolve these issues.

More specifically, looking at the best-performing model in the benchmark (GB-GA), I am having trouble locating the files behind its current performance. The GitHub repository linked to the benchmark appears to be missing the majority of these files. Would it be possible to obtain, or publicly release, all the molecules in SMILES format along with their corresponding docking scores as they were submitted to the benchmark?

In addition, I have encountered discrepancies between the docking values I compute for several individual molecules and the values reported. For instance, these SMILES from the smiles_lstm_2_5000.txt file have markedly different reported docking scores from the ones I currently obtain from the oracle (installed according to the instructions on the TDC website):

O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCCc1ccc(C(F)(F)F)cc1: -15 vs -9.2
O=C(CCOc1ccccc1)Oc1ccccc1C(=O)CCCc1ccccc1F: -15 vs -9.2
O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1C(F)(F)F: -15.0 vs -10.3
O=C(Nc1ccccc1F)Oc1ccccc1C(=O)CCc1ccccc1C(F)(F)F: -14.6 vs -9.0
O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1Cl: -14.5 vs -9.1
O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOc1ccccc1F: -14.4 vs -8.9
O=C(CCOc1ccccc1F)Oc1ccccc1C(=O)CCCOCc1ccccc1: -14.4 vs -9.0

I am uncertain whether these discrepancies stem from specific settings, something simple I have missed, or a change in the back-end. Would it be possible to provide some clarification on this matter?

Thank you in advance.

@amva13 (Member) commented Apr 3, 2024

@Jonas-Verhellen will have a look

@amva13 (Member) commented Apr 5, 2024

@kexinhuang12345

@amva13 (Member) commented Apr 5, 2024

@futianfan @wenhao-gao are you able to help with this?

amva13 self-assigned this Apr 10, 2024
@amva13 (Member) commented Apr 10, 2024

Hi @Jonas-Verhellen, what version of scikit-learn are you using? What version of TDC? I'm fairly sure the cause is the same as in the issue linked below; checking how to resolve it. A quick way to print the relevant versions is sketched after the link.

#244
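
A minimal sketch for reporting the versions in your environment (importlib.metadata requires Python 3.8+; "PyTDC" and "scikit-learn" are the PyPI distribution names):

```python
# Print the installed versions relevant to this issue (Python 3.8+).
import sys
from importlib.metadata import version

print("python      ", sys.version.split()[0])
for dist in ("PyTDC", "scikit-learn"):
    print(f"{dist:12}", version(dist))
```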

@Jonas-Verhellen (Author) commented

Hi @amva13, thanks for looking into this! I am using scikit-learn 1.3.0 and PyTDC 0.4.1 with Python 3.10.12. Let me know if you need any more information.

amva13 added the bug (Something isn't working) and v2neurips labels May 1, 2024