New DrugComb data #191

Open
TangYiChing opened this issue Jan 6, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@TangYiChing

Describe the problem
The DrugComb database has released new drug combination and monotherapy screening datasets, which include cancer, malaria, and COVID-19 data.
Reference: https://doi.org/10.1093/nar/gkab438

Describe the solution you'd like
Replace the current TDC/data/drugcomb.pkl with the new file from https://drugcomb.org/download/, and add new columns ['Study name', 'Disease'] to distinguish the cancer, malaria, and COVID-19 data.
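
As a rough illustration of the two proposed columns, here is a minimal pandas sketch. The source column name `study_name`, the study names, and the `STUDY_TO_DISEASE` mapping are all placeholders assumed for this example; the real values would have to come from the DrugComb download itself.

```python
import pandas as pd

# Hypothetical mapping from study name to disease area.
# Real study names must be taken from the DrugComb download.
STUDY_TO_DISEASE = {
    "EXAMPLE_CANCER_STUDY": "cancer",
    "EXAMPLE_MALARIA_STUDY": "malaria",
    "EXAMPLE_COVID_STUDY": "COVID-19",
}


def add_study_and_disease(df: pd.DataFrame, study_col: str = "study_name") -> pd.DataFrame:
    """Return a copy of df with 'Study name' and 'Disease' columns added.

    study_col is an assumed name for the column holding the study identifier.
    """
    out = df.copy()
    out["Study name"] = out[study_col]
    # Studies missing from the mapping are labeled 'unknown' instead of being dropped.
    out["Disease"] = out[study_col].map(STUDY_TO_DISEASE).fillna("unknown")
    return out


# Toy rows standing in for the downloaded table.
toy = pd.DataFrame(
    {
        "study_name": ["EXAMPLE_CANCER_STUDY", "EXAMPLE_MALARIA_STUDY", "UNLISTED_STUDY"],
        "synergy": [1.2, -0.4, 0.0],
    }
)
annotated = add_study_and_disease(toy)
```

Labeling unmapped studies as 'unknown' keeps every row visible, so gaps in the mapping are easy to spot before the file is replaced.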

Additional context
N/A.

@kexinhuang12345
Collaborator

Thank you! It would be a great idea! Would you like to make a PR for it?

@kexinhuang12345 kexinhuang12345 added the enhancement New feature or request label Jan 9, 2023
@TangYiChing
Author

> Thank you! It would be a great idea! Would you like to make a PR for it?

DrugComb provides an API for quick access to both drug and cell line information, and it already includes SMILES strings and cell line IDs. To add a new drug-drug-cell line triplet to the current TDC dataset, the remaining piece is the gene expression values from the CellMiner database. What would you like me to do to facilitate the process?
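
To make the gap concrete, the sketch below separates triplets whose cell line already has an expression profile from those that still need one. It is only an illustration: the column names (`drug1_smiles`, `drug2_smiles`, `cell_line_id`) and the toy values are assumptions for this example, not the DrugComb or TDC schema.

```python
import pandas as pd


def split_by_expression_coverage(
    triplets: pd.DataFrame,
    expression: pd.DataFrame,
    cell_col: str = "cell_line_id",
):
    """Split triplets into (covered, missing) by expression availability.

    expression is assumed to be indexed by cell line ID, one row per cell line.
    """
    has_expr = triplets[cell_col].isin(expression.index)
    return triplets[has_expr], triplets[~has_expr]


# Toy drug-drug-cell line triplets (placeholder SMILES and IDs).
triplets = pd.DataFrame(
    {
        "drug1_smiles": ["CCO", "CCN"],
        "drug2_smiles": ["CCC", "CCCl"],
        "cell_line_id": ["CELL_A", "CELL_B"],
    }
)

# Toy expression matrix: rows are cell lines, columns are genes.
expression = pd.DataFrame({"GENE1": [5.0]}, index=["CELL_A"])

covered, missing = split_by_expression_coverage(triplets, expression)
```

Here `missing` lists the triplets that cannot be added until expression values for their cell lines are found.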

@kexinhuang12345
Collaborator

Thank you! Are the gene expression values available only in CellMiner? I saw in the paper that they can be retrieved from public databases such as DepMap, Cell Model Passports, etc. https://academic.oup.com/view-large/figure/267020980/gkab438fig1.jpg

@TangYiChing
Author

> Thank you! Are the gene expression values available only in CellMiner? I saw in the paper that they can be retrieved from public databases such as DepMap, Cell Model Passports, etc. https://academic.oup.com/view-large/figure/267020980/gkab438fig1.jpg

Yes, these are commonly used sources nowadays, and they all provide RNA-seq data (i.e., expression values are in TPM). We might need a new workflow for data processing.
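
One possible starting point for such a workflow, offered as a sketch and not as the agreed approach: align a TPM matrix to a fixed gene list and apply the commonly used log2(TPM + 1) transform. The function name, the matrix layout (cell lines as rows, gene symbols as columns), and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def tpm_to_features(tpm: pd.DataFrame, genes: list) -> pd.DataFrame:
    """Align a TPM matrix to a fixed gene list and log-transform it.

    tpm is assumed to have cell lines as rows and gene symbols as columns.
    Genes absent from tpm become NaN columns so that gaps stay visible.
    """
    aligned = tpm.reindex(columns=genes)
    # log2(TPM + 1) compresses the dynamic range and maps TPM = 0 to 0.
    return np.log2(aligned + 1.0)


# Toy TPM values for two cell lines.
tpm = pd.DataFrame(
    {"GENE1": [3.0, 0.0], "GENE2": [1.0, 7.0]},
    index=["CELL_A", "CELL_B"],
)

# GENE3 is requested but not measured in the toy matrix.
features = tpm_to_features(tpm, ["GENE1", "GENE3"])
```

Keeping unmeasured genes as NaN leaves the choice between imputing and dropping them to a later, explicit step.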
