hyperparameter tuning and cross validation #729
Replies: 1 comment
I've converted this from an issue to a discussion since it's a broader question about how to do a machine learning task rather than a specific question about using chemprop.

What you've described is one way to do it. Typically, one does cross validation to understand the effect of different dataset sampling on the quality of the model (with fixed model architecture and training parameters) and ensembling to understand the effects of different model initializations (or sometimes even different model architectures). However, you can also add hyperparameter optimization to CV if you want. There are many ways of doing CV, and the method that is most suitable for you depends on what you're trying to measure with it, how you're splitting the data, and what your computational budget is. In any case, you're correct that you should fully withhold the test data from hyperparameter optimization to avoid data leakage.

Note that if you have a small dataset, you will likely have better luck using a simpler model than chemprop rather than trying to optimize chemprop's performance.
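To make the "fully withhold the test data" point concrete, here is a minimal sketch of the hold-out-then-CV pattern using scikit-learn, with a random forest and a synthetic dataset standing in for chemprop and your real data (the model, grid, and split fractions are illustrative assumptions, not chemprop's API):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a featurized molecular dataset.
X, y = make_regression(n_samples=200, n_features=50, noise=0.1, random_state=0)

# 1) Withhold the test set BEFORE any hyperparameter optimization.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2) Run CV-based hyperparameter search on the train+validation pool only.
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=5)
search.fit(X_trainval, y_trainval)

# 3) Evaluate the refit best model exactly once on the held-out test set.
print("best params:", search.best_params_)
print("test R^2:", search.best_estimator_.score(X_test, y_test))
```

The key design point is the order of operations: the test split happens first and is touched only once at the end, so no choice made during the search can leak information from it.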
Hello,
I have a small dataset, and I would like to build a machine learning model to predict the accumulation of drugs in bacteria. I would like to try both scaffold splitting and cross validation. To improve the performance of the model, I would like to do hyperparameter tuning. From the chemprop tutorial, I learned that I need to provide the training and validation data to the hyperparameter optimization command, but not the test data. So I think my best bet is to split the data beforehand, run the hyperparameter optimization for each fold, and then average the resulting hyperparameters. Is that how it works?
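For reference, here is a minimal sketch of what a scaffold split does, using RDKit's Bemis-Murcko scaffolds and a greedy bin-filling scheme; chemprop has its own scaffold splitting built in, so this is only meant to illustrate the idea, and the split fractions and helper function are illustrative assumptions:

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_val=0.1):
    """Group molecules by Bemis-Murcko scaffold, then assign whole groups
    (largest first) to train/val/test so no scaffold spans two splits."""
    groups = defaultdict(list)
    for i, smi in enumerate(smiles_list):
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(smi)
        groups[scaffold].append(i)

    n = len(smiles_list)
    n_train, n_val = int(frac_train * n), int(frac_val * n)
    train, val, test = [], [], []
    # Greedy fill: a group that no longer fits train or val goes to test,
    # so the test set may run slightly over its nominal fraction.
    for group in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(group) <= n_train:
            train.extend(group)
        elif len(val) + len(group) <= n_val:
            val.extend(group)
        else:
            test.extend(group)
    return train, val, test

smiles = ["CCO", "c1ccccc1O", "c1ccccc1N", "CC(=O)O"]
train_idx, val_idx, test_idx = scaffold_split(smiles)
```

Because whole scaffold groups stay in one split, the test set contains scaffolds the model never saw in training, which makes it a harder and often more realistic benchmark than a random split.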
If you have any suggestions to improve the performance of the model, I will greatly appreciate it.
Thank you
Best Regards
Soodabeh