The default PretrainAtomFeaturizer does not work for the ClinTox dataset. #169

Open
shuix007 opened this issue Feb 1, 2022 · 1 comment

shuix007 commented Feb 1, 2022

Hi,

I was trying the script in dgl-lifesci/examples/property_prediction/moleculenet for molecular property prediction and got the following error when running the command `python classification.py -d ClinTox -mo gin_supervised_masking`:

```
Using backend: pytorch
Directory classification_results already exists.
Processing dgl graphs from scratch...
Traceback (most recent call last):
  File "classification.py", line 186, in <module>
    n_jobs=1 if args['num_workers'] == 0 else args['num_workers'])
  File "/export/scratch/Zeren/conda/lib/python3.7/site-packages/dgllife/data/clintox.py", line 109, in __init__
    n_jobs=n_jobs)
  File "/export/scratch/Zeren/conda/lib/python3.7/site-packages/dgllife/data/csv_dataset.py", line 78, in __init__
    load, log_every, init_mask, n_jobs, error_log)
  File "/export/scratch/Zeren/conda/lib/python3.7/site-packages/dgllife/data/csv_dataset.py", line 139, in _pre_process
    edge_featurizer=edge_featurizer))
  File "/export/scratch/Zeren/conda/lib/python3.7/site-packages/dgllife/utils/mol_to_graph.py", line 375, in smiles_to_bigraph
    canonical_atom_order, explicit_hydrogens, num_virtual_nodes)
  File "/export/scratch/Zeren/conda/lib/python3.7/site-packages/dgllife/utils/mol_to_graph.py", line 276, in mol_to_bigraph
    canonical_atom_order, explicit_hydrogens, num_virtual_nodes)
  File "/export/scratch/Zeren/conda/lib/python3.7/site-packages/dgllife/utils/mol_to_graph.py", line 90, in mol_to_graph
    g.ndata.update(node_featurizer(mol))
  File "/export/scratch/Zeren/conda/lib/python3.7/site-packages/dgllife/utils/featurizers.py", line 1293, in __call__
    self._atomic_number_types.index(atom.GetAtomicNum()),
ValueError: 0 is not in list
```

It seems that some atoms in the ClinTox dataset return 0 from `GetAtomicNum()`, and 0 is not in the default `atomic_number_types` of `PretrainAtomFeaturizer()`. The problem can be worked around by passing `node_featurizer=PretrainAtomFeaturizer(atomic_number_types=list(range(119)))` when constructing the ClinTox dataset. However, I am not sure what an atomic number of 0 means.
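The failing lookup can be modelled in plain Python, which makes it easier to compare the default list with the workaround. This is a sketch under stated assumptions, not dgllife code: it assumes the default `atomic_number_types` is `list(range(1, 119))` (consistent with `ValueError: 0 is not in list` above), and `atom_index`, `safe_atom_index`, `DEFAULT_TYPES`, `WIDENED_TYPES` and `APPENDED_TYPES` are hypothetical names. It also shows a side effect worth checking: `list(range(119))` shifts every atomic number's index up by one (carbon goes from 5 to 6), which may matter if the index is used to look up a pretrained embedding, whereas appending `0` to the end of the default list leaves the existing indices unchanged.

```python
# Pure-Python model of the lookup done in PretrainAtomFeaturizer.__call__:
#     self._atomic_number_types.index(atom.GetAtomicNum())
# ASSUMPTION: the default atomic_number_types is list(range(1, 119)).

DEFAULT_TYPES = list(range(1, 119))         # assumed default: atomic numbers 1..118
WIDENED_TYPES = list(range(119))            # workaround from this issue: 0..118
APPENDED_TYPES = list(range(1, 119)) + [0]  # hypothetical variant: 0 added at the end


def atom_index(atomic_number_types, atomic_num):
    """Return the feature index exactly as list.index would (may raise ValueError)."""
    return atomic_number_types.index(atomic_num)


def safe_atom_index(atomic_number_types, atomic_num):
    """Return the feature index, or None if the atomic number is not covered."""
    try:
        return atomic_number_types.index(atomic_num)
    except ValueError:
        return None


if __name__ == "__main__":
    # Compare carbon (6) and a dummy/wildcard atom (0) under each list.
    for name, types in [("default", DEFAULT_TYPES),
                        ("range(119)", WIDENED_TYPES),
                        ("appended", APPENDED_TYPES)]:
        print(name, safe_atom_index(types, 6), safe_atom_index(types, 0))
```

Calling `atom_index(DEFAULT_TYPES, 0)` reproduces the same `ValueError: 0 is not in list` seen in the traceback.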

mufeili (Contributor) commented Feb 2, 2022

I remember that a few SMILES strings contain `*`, which stands for an arbitrary atom and might be assigned atomic number 0. See:
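To find which rows are affected before building graphs, a text scan of the SMILES column may be enough. In SMILES, `*` denotes a wildcard atom (bare, or bracketed as in `[*]` or `[1*]`), and RDKit parses it as a dummy atom whose `GetAtomicNum()` is 0. The sketch below assumes the inputs are plain SMILES without extensions such as CXSMILES; `has_wildcard_atom` and `find_wildcard_smiles` are hypothetical helpers, not part of dgllife.

```python
# Hypothetical helpers: flag SMILES strings that contain a wildcard atom "*".
# ASSUMPTION: inputs are plain SMILES, where "*" only ever denotes a wildcard
# atom (bare "*" or bracketed "[*]", "[1*]"). RDKit turns such atoms into
# dummy atoms with atomic number 0, which is what trips the featurizer.


def has_wildcard_atom(smiles: str) -> bool:
    """Return True if the SMILES string contains a wildcard atom."""
    return "*" in smiles


def find_wildcard_smiles(smiles_list):
    """Return (row index, SMILES) pairs for entries containing a wildcard atom."""
    return [(i, s) for i, s in enumerate(smiles_list) if has_wildcard_atom(s)]


if __name__ == "__main__":
    examples = ["CCO", "*C(=O)O", "c1ccccc1[*]", "[1*]CC"]
    print(find_wildcard_smiles(examples))
```

A plain substring check is used because, under the plain-SMILES assumption, `*` has no other meaning; parsing each string with RDKit and checking for atoms with atomic number 0 would be the more robust alternative.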
