Organic compound aqueous solubility prediction using the Delaney (ESOL) dataset

This project models the Delaney aqueous solubility dataset (published in 2004) with contemporary ML approaches, using the PyCaret package and a stacked ensemble of the three best-performing models.
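The workflow described above can be sketched roughly as follows. This is a minimal illustration built on PyCaret's regression module, not the code from the notebooks: the function name `train_stacked_model`, its parameters and the default seed are hypothetical, and the target column name has to be supplied by the caller because it is not stated here.

```python
def train_stacked_model(csv_path, target, model_name, seed=42):
    """Train a stacked ensemble of the three best PyCaret regressors.

    Hypothetical helper: csv_path points at a descriptor table, target is
    the name of the solubility column, model_name is the file stem used
    when saving (PyCaret appends ".pkl").
    """
    # Imported lazily so that defining the function does not require the
    # heavy dependencies to be installed.
    import pandas as pd
    from pycaret.regression import (
        compare_models,
        finalize_model,
        save_model,
        setup,
        stack_models,
    )

    data = pd.read_csv(csv_path)

    # Initialise the experiment; PyCaret holds out part of the data itself.
    # Note: some PyCaret 2.x releases ask for interactive confirmation of
    # the inferred column types unless silent=True is passed.
    setup(data=data, target=target, session_id=seed)

    # Cross-validate the available regressors and keep the best three.
    top3 = compare_models(n_select=3)

    # Stack the three models under PyCaret's default meta-learner.
    stacked = stack_models(estimator_list=top3)

    # Refit on the full dataset and write <model_name>.pkl to disk.
    final = finalize_model(stacked)
    save_model(final, model_name)
    return final
```

A call such as `train_stacked_model("delaney_descriptors.csv", target="<solubility column>", model_name="my_model")` would then run the whole pipeline; the notebooks remain the authoritative record of the settings actually used.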

Two approaches are taken to model the dataset:

1. Using all the descriptors originally proposed in Delaney's publication
2. Using VSA and other descriptors proposed in other publications
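For context on the first approach, the 2004 ESOL publication uses four descriptors: calculated logP, molecular weight, number of rotatable bonds, and aromatic proportion (the fraction of heavy atoms that are aromatic). The sketch below implements the linear ESOL equation with the coefficients as reported in that paper; treat them as quoted from memory of the publication and check them against it before relying on them. The function name is hypothetical, and the descriptor values are expected to be computed beforehand with a cheminformatics toolkit.

```python
def esol_log_solubility(clogp, mol_weight, rotatable_bonds, aromatic_proportion):
    """Estimate log10 aqueous solubility (mol/L) with the linear ESOL equation.

    Coefficients follow Delaney (2004):
        logS = 0.16 - 0.63*cLogP - 0.0062*MW + 0.066*RB - 0.74*AP
    """
    return (
        0.16
        - 0.63 * clogp                 # lipophilicity lowers solubility
        - 0.0062 * mol_weight          # so does molecular size
        + 0.066 * rotatable_bonds      # flexibility raises it slightly
        - 0.74 * aromatic_proportion   # aromatic character lowers it
    )
```

This equation is the baseline that the models in this repository are compared against; the ML models use the same kind of inputs but learn a non-linear mapping instead of fixed coefficients.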

Installation

git clone https://github.com/aretasg/SolPred.git
cd SolPred
conda env create -f environment.yml
conda activate solpred

Validation (Unseen data) Set Metrics

Model	MAE	RMSE	R2
1	0.4572	0.4537	0.8886
2	0.4558	0.4587	0.8874

Both approaches performed clearly better than the original ESOL model and comparably to each other, with the second approach (2) showing a slightly better residual plot with fewer outliers.
For the full set of metrics and other details, refer to the end of each Jupyter notebook.
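For readers who want to recompute the three reported metrics on their own predictions, a minimal dependency-free sketch is given below. The function name is hypothetical and the numbers in the usage note are made-up toy values, not results from this project.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(r) for r in residuals) / n

    # Root mean squared error: penalises large residuals more strongly.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return mae, rmse, r2
```

For example, `regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])` gives an MAE of 0.25 and an R2 of 0.9 on these toy values.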

Availability

Both models are distributed as .pkl files (sol_pred_model_delaney.pkl and sol_pred_model_vsa.pkl), together with an example script for running them (run_model.py).
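As a rough illustration of how such a file might be used, the sketch below assumes the .pkl files were written with PyCaret's `save_model`, which is an assumption rather than something stated here. The helper names are hypothetical, and the descriptor columns passed in must match those used in training. The example script in the repository remains the authoritative reference.

```python
from pathlib import Path


def model_name_from_file(pkl_path):
    """Strip the .pkl suffix, since PyCaret's load_model expects the bare name."""
    return str(Path(pkl_path).with_suffix(""))


def predict_solubility(pkl_path, descriptors):
    """Load a saved PyCaret pipeline and predict on a descriptor DataFrame.

    Hypothetical helper: descriptors is a pandas DataFrame whose columns
    match the features the model was trained on.
    """
    # Imported lazily so the path helper works without PyCaret installed.
    from pycaret.regression import load_model, predict_model

    model = load_model(model_name_from_file(pkl_path))

    # predict_model returns the input frame with a prediction column
    # appended; the name of that column depends on the PyCaret version.
    return predict_model(model, data=descriptors)
```

Keeping the suffix handling in its own function makes it easy to pass either file name from the repository without editing the call site.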

References

Delaney, 2004
Avdeef, 2020

License

MIT license
