hyperparameter tuning and cross validation #729
Replies: 1 comment
I've converted this from an issue to a discussion since it's a broader question about how to do a machine learning task rather than a specific question about using chemprop.

What you've described is one way to do it. Typically, one does cross validation to understand the effect of different dataset sampling on the quality of the model (with fixed model architecture and training parameters) and ensembling to understand the effects of different model initializations (or sometimes even different model architectures). However, you can also add hyperparameter optimization to CV if you want. There are many ways of doing CV, and the method that is most suitable for you depends on what you're trying to measure with it, how you're splitting the data, and what your computational budget is. In any case, you're correct that you should fully withhold the test data from hyperparameter optimization to avoid data leakage.

Note that if you have a small dataset, you will likely have better luck using a simpler model than chemprop rather than trying to optimize chemprop's performance.
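To make the "fully withhold the test data" point concrete, here is a minimal sketch of the hold-out-then-CV pattern using scikit-learn, with a random forest and a synthetic dataset standing in for chemprop and your real data (the model, grid, and split fractions are illustrative assumptions, not chemprop's API):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a featurized molecular dataset.
X, y = make_regression(n_samples=200, n_features=50, noise=0.1, random_state=0)

# 1) Withhold the test set BEFORE any hyperparameter optimization.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2) Run CV-based hyperparameter search on the train+validation pool only.
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=5)
search.fit(X_trainval, y_trainval)

# 3) Evaluate the refit best model exactly once on the held-out test set.
print("best params:", search.best_params_)
print("test R^2:", search.best_estimator_.score(X_test, y_test))
```

The key design point is the order of operations: the test split happens first and is touched only once at the end, so no choice made during the search can leak information from it.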
Hello,
I have a small dataset, and I would like to build a machine learning model to predict the accumulation of drugs in bacteria. I would like to try both scaffold splitting and cross validation. To improve the performance of the model, I would like to do hyperparameter tuning. From the chemprop tutorial, I learned that I need to provide the training and validation data to the hyperparameter optimization command, but not the test data. So I think my best bet is to split the data beforehand, run the hyperparameter optimization for each fold, and then average the resulting hyperparameters. Is that how it works?
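For reference, here is a minimal sketch of what a scaffold split does, using RDKit's Bemis-Murcko scaffolds and a greedy bin-filling scheme; chemprop has its own scaffold splitting built in, so this is only meant to illustrate the idea, and the split fractions and helper function are illustrative assumptions:

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_val=0.1):
    """Group molecules by Bemis-Murcko scaffold, then assign whole groups
    (largest first) to train/val/test so no scaffold spans two splits."""
    groups = defaultdict(list)
    for i, smi in enumerate(smiles_list):
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(smi)
        groups[scaffold].append(i)

    n = len(smiles_list)
    n_train, n_val = int(frac_train * n), int(frac_val * n)
    train, val, test = [], [], []
    # Greedy fill: a group that no longer fits train or val goes to test,
    # so the test set may run slightly over its nominal fraction.
    for group in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(group) <= n_train:
            train.extend(group)
        elif len(val) + len(group) <= n_val:
            val.extend(group)
        else:
            test.extend(group)
    return train, val, test

smiles = ["CCO", "c1ccccc1O", "c1ccccc1N", "CC(=O)O"]
train_idx, val_idx, test_idx = scaffold_split(smiles)
```

Because whole scaffold groups stay in one split, the test set contains scaffolds the model never saw in training, which makes it a harder and often more realistic benchmark than a random split.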
If you have any suggestions to improve the performance of the model, I will greatly appreciate it.
Thank you
Best Regards
Soodabeh