Training scoring functions and updated version of PDBBind #167

Tonylac77 · 2023-02-09T08:41:21Z

I am currently trying to train the NNScore and PLECScore models for ligand scoring. So far I have not found a way to train the models "purposefully" and have resorted to running scorer.load() without any arguments, which starts the training of the scoring function. However, I don't know which version of PDBBind this uses as a result (I assume v2016?).

I have tried the following, for example:

scorer = NNScore.nnscore()
scorer.gen_training_data(pdbbind_dir=$PATH$, pdbbind_versions=[2016])

where $PATH$ is just a directory on my machine, and I get the following error:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[10], line 23
     21 rescorers = {'nnscore':NNScore.nnscore()}
     22 scorer = rescorers['nnscore']
---> 23 scorer.gen_training_data(pdbbind_dir='/home/tony/CADD22/software/pdbbind', pdbbind_versions=[2016], use_proteins=False)

File ~/.conda/envs/wocondock/lib/python3.8/site-packages/oddt/scoring/functions/NNScore.py:63, in nnscore.gen_training_data(self, pdbbind_dir, pdbbind_versions, home_dir, use_proteins)
     60     home_dir = dirname(__file__) + '/NNScore'
     61 filename = path_join(home_dir, 'nnscore_descs.csv')
---> 63 super(nnscore, self)._gen_pdbbind_desc(
     64     pdbbind_dir=pdbbind_dir,
     65     pdbbind_versions=pdbbind_versions,
     66     desc_path=filename,
     67     use_proteins=use_proteins
     68 )

File ~/.conda/envs/wocondock/lib/python3.8/site-packages/oddt/scoring/__init__.py:94, in scorer._gen_pdbbind_desc(self, pdbbind_dir, pdbbind_versions, desc_path, include_general_set, use_proteins, **kwargs)
     92 df = None
     93 for pdbbind_version in pdbbind_versions:
---> 94     p = pdbbind('%s/v%i/' % (pdbbind_dir, pdbbind_version),
     95                 version=pdbbind_version,
     96                 opt=opt)
     97     # Core set
     99     for set_name in p.pdbind_sets:

File ~/.conda/envs/wocondock/lib/python3.8/site-packages/oddt/datasets.py:85, in pdbbind.__init__(self, home, version, default_set, opt)
     82         self.sets[pdbind_set] = dict(zip(self._set_ids[pdbind_set],
     83                                          self._set_act[pdbind_set]))
     84 if len(self.sets) == 0:
---> 85     raise Exception('There is no PDBbind set availabe')

Exception: There is no PDBbind set availabe
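Judging only from the traceback, line 94 of oddt/scoring/__init__.py builds the dataset location as '%s/v%i/' % (pdbbind_dir, pdbbind_version), so for version 2016 the data would be looked up under a v2016 subdirectory of pdbbind_dir. A minimal sketch of a pre-flight check for that layout is below. The helper names are hypothetical and not part of ODDT, and the check covers only the existence of the version directory; which index files ODDT expects inside it is not visible in the traceback, so an existing directory is necessary but may not be sufficient.

```python
from pathlib import Path


def expected_pdbbind_home(pdbbind_dir, version):
    """Mirror the '%s/v%i/' % (pdbbind_dir, version) path seen in the traceback."""
    return Path('%s/v%i/' % (pdbbind_dir, version))


def missing_version_dirs(pdbbind_dir, versions):
    """Return the versions whose v<version> subdirectory does not exist.

    Hypothetical helper: it only tests for the directory itself,
    not for the index files inside it.
    """
    return [
        version
        for version in versions
        if not expected_pdbbind_home(pdbbind_dir, version).is_dir()
    ]
```

For example, missing_version_dirs('/home/tony/CADD22/software/pdbbind', [2016]) would return [2016] if there is no v2016 folder under that directory.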

Additionally, when I then score ligands, the performance of these models is very poor (enrichment factor at 1% of around 0-2%) compared to other scoring functions (as implemented in GNINA, for example), which achieve ~20% enrichment.
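For reference, one common way to compute an enrichment factor at a given fraction is sketched below. This is an assumption about the metric, not a statement of how the numbers above were obtained: it takes EF as the hit rate among the top-ranked fraction divided by the hit rate in the whole library, and it assumes that a higher score means a better-ranked compound. The function name and arguments are illustrative and do not come from ODDT or GNINA.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at the given fraction of the ranked library.

    EF = (actives in top fraction / compounds in top fraction)
         / (total actives / total compounds)

    Assumes higher scores rank first; negate the scores beforehand
    if lower values are better (e.g. docking energies).
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Number of compounds in the top fraction, at least one.
    n_top = max(1, int(round(n_total * fraction)))

    # Rank by score, best first, and count actives in the top slice.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, flag in ranked[:n_top] if flag)

    return (hits / n_top) / (n_actives / n_total)
```

Under this definition a random ranking gives an EF of about 1, and the maximum possible value at 1% is 100 when the library contains enough actives to fill the top slice.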

Therefore I am wondering if there is a tutorial/notebook that explains how to train these models using the gen_training_data() or fit() methods.

I was also wondering whether it is possible to use a more recent version of the PDBBind data, such as version 2020, and how hard that would be to implement.

I am happy to provide the dataset I am using for comparison of the performance of these scoring functions (aldr dataset from DUD-E).


mwojcikowski · 2023-02-22T19:11:38Z

Each built-in scoring function has a .load() method, which loads pre-generated descriptors to train the models. Have you checked those bundled in ODDT?

Tonylac77 · 2023-02-23T15:32:06Z

Thanks for your answer. When I use the load method without arguments, it starts training the model. However, I believe this is what I was using previously and was getting low enrichment with. I will retrain now and update you. Where would I find the bundled models? I can only find .csv files in oddt/scoring/functions/NNScore/. Should I be using the load() method with those?

Update: I've managed to load the pretrained linear PLECScore model bundled with ODDT. However, I would still like to understand how to train the models myself in order to use the MLP or RF versions, perhaps on PDBbind v2020, and how to load the model for NNScore.
