BBB permeability - New dataset #174

devanshamin · 2022-09-25T21:48:04Z

Describe the problem
TDC currently has the BBB_martins dataset for blood-brain barrier (BBB) permeability, which contains only 2030 compounds. The Blood-Brain Barrier Database (B3DB) is much larger, with 7807 compounds.

Describe the solution you'd like
Include the dataset in the Single-instance Prediction Problem (ADME) and the ADMET Benchmark Group, so that it can be loaded like this:

from tdc.single_pred import ADME
data = ADME(name="B3DB")

Additional context
B3DB - https://github.com/theochem/B3DB
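
For reference, here is a rough sketch of how B3DB-style rows might be mapped to the Drug_ID / Drug / Y layout that I assume TDC's single-instance loaders use. The column names (compound_name, SMILES, BBB+/BBB-) are my assumption about the B3DB classification file, the helper to_tdc_rows is hypothetical, and the sample rows are made up for illustration rather than taken from B3DB.

```python
import csv
import io

# Made-up rows for illustration only; they are NOT taken from B3DB.
# Column names are an assumption about the B3DB classification file.
SAMPLE_TSV = (
    "compound_name\tSMILES\tBBB+/BBB-\n"
    "example_a\tCCO\tBBB+\n"
    "example_b\tCC(=O)O\tBBB-\n"
)


def to_tdc_rows(tsv_text):
    """Map B3DB-style rows to Drug_ID / Drug / Y records (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for record in reader:
        rows.append(
            {
                "Drug_ID": record["compound_name"],
                "Drug": record["SMILES"],
                # Encode the class label as 1 (BBB+) or 0 (anything else).
                "Y": 1 if record["BBB+/BBB-"] == "BBB+" else 0,
            }
        )
    return rows


tdc_rows = to_tdc_rows(SAMPLE_TSV)
```

The regression version would presumably carry a numeric logBB value in Y instead of a binary label, but that is also an assumption on my part.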

kexinhuang12345 · 2022-09-28T07:00:00Z

Hi Devansh! Thanks for the pointer! This definitely sounds relevant! Would you like to contribute to TDC? Let us know, thanks!

marc-gav · 2023-05-28T09:09:24Z

I will work on this

inakineitor · 2023-12-07T03:44:08Z

@kexinhuang12345 Hi Kexin! I am interested in adding the BBB dataset to TDC. So far the steps I identified are:

~~Add a bbb.py file to the tdc/single_pred folder.~~ I realized that BBB belongs to ADME, so no file changes are needed in this folder.
~~Add the appropriate reexport to tdc/single_pred/__init__.py.~~ For the same reason, this step is not necessary.
Download the data and give it to you for storing in Dataverse.
Insert at line 119 of tdc/metadata.py the names for the classification and regression versions of the B3DB dataset:

adme_dataset_names = [
    # ...
    "clearance_microsome_az",
    "b3db_classification", # Added
    "b3db_regression", # Added
]

Add the same names to the name2type dictionary at line 627:

name2type = {
    # ...
    "bbb_adenot": "tab",
    "b3db_classification": "tab", # Added
    "b3db_regression": "tab", # Added
    "bbb_martins": "tab",
    # ...
}

I am unsure how to generate the ID to put in name2id at line 740. Is it obtained by adding the dataset to the data server?
I have the same question for name2stats at line 907.
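
To keep track of which registrations are still open, a small check along these lines might help. It is only a sketch: the lists and dictionaries are trimmed stand-ins for the structures in tdc/metadata.py, name2id and name2stats are left empty because I do not yet know their values, and missing_registrations is a hypothetical helper, not part of TDC.

```python
NEW_NAMES = ["b3db_classification", "b3db_regression"]

# Trimmed stand-ins for the structures in tdc/metadata.py (not the real contents).
adme_dataset_names = [
    "clearance_microsome_az",
    "b3db_classification",
    "b3db_regression",
]
name2type = {
    "bbb_adenot": "tab",
    "b3db_classification": "tab",
    "b3db_regression": "tab",
    "bbb_martins": "tab",
}
name2id = {}  # left empty: the IDs are still unknown
name2stats = {}  # left empty: the statistics are still unknown


def missing_registrations(names, registries):
    """Return {registry label: [names absent from that registry]}."""
    missing = {}
    for label, registry in registries.items():
        absent = [name for name in names if name not in registry]
        if absent:
            missing[label] = absent
    return missing


report = missing_registrations(
    NEW_NAMES,
    {
        "adme_dataset_names": adme_dataset_names,
        "name2type": name2type,
        "name2id": name2id,
        "name2stats": name2stats,
    },
)
print(report)
```

With the stand-in values above, the report lists both new names under name2id and name2stats, which are the two entries I still need help with.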

I am new to the package so any guidance or recommendations would be appreciated.

Looking forward to your response!

ayushnoori · 2023-12-12T16:16:15Z

Hi @kexinhuang12345, we had a conversation back in February 2022 about adding this dataset to TDC, so I am following up here. I'm working with @inakineitor and we would be happy to help get this dataset included (unless @marc-gav has made progress). We can also open a new issue if needed.

Iñaki – Kexin had previously pointed me to the contribution guide.

kexinhuang12345 · 2023-12-19T06:08:22Z

Sorry for the late reply - was traveling - this sounds awesome! I think the questions can be answered via the contribution guide pointed out by Ayush. Let me know if you still bump into any questions!

ayushnoori · 2023-12-20T11:01:44Z

Hi Kexin, no worries! All steps are now completed except for name2stats, described as a "mapping from dataset names to statistics." How should the statistics IDs be generated?

ayushnoori · 2023-12-20T11:06:28Z

Please see ayushnoori@ac35e01 at my fork, https://github.com/ayushnoori/TDC.

kexinhuang12345 added the good first issue, new-dataset, and help-wanted labels · Nov 9, 2022

ayushnoori mentioned this issue Dec 20, 2023

Add Blood-Brain Barrier Database (B3DB) to TDC #215

Merged
