Issue with index Embedding layer #10

Open
lorenzoFabbri opened this issue Aug 7, 2019 · 1 comment


lorenzoFabbri commented Aug 7, 2019

I'm trying to use OpenChem for a classification task. My dataset is basically Tox21 with one label. I'm using a machine with a single GPU.

I simply adapted the provided script for Tox21 but I keep getting the following error:

/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [67,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed.

File "/.../OpenChem/openchem/modules/encoders/rnn_encoder.py", line 90, in init_hidden
    requires_grad=True).cuda()
RuntimeError: CUDA error: device-side assert triggered

Some SMILES were longer than 1024 characters, so I removed them; the longest one is now under 300 characters. Still, I keep getting the very same error.
I read online that these bugs are easier to find when running on a CPU, so I tried setting use_cuda=False in the configuration file. Nonetheless, it still tries to copy everything to the GPU, since the error points to the same line (line 90). I then tried passing --use_cuda="False" from the command line, but I get the following error:

ValueError: use_cuda has to be of type <class 'bool'>.
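A note on this error: a value typed on the command line always arrives as a string, and in Python bool("False") is True because any non-empty string is truthy. The sketch below shows one common way to convert such a string into a real bool with argparse. The str2bool helper and build_parser function are hypothetical names for illustration; how OpenChem actually parses its arguments is not shown in this thread and may differ.

```python
import argparse


def str2bool(value):
    """Convert a command-line string such as "False" into a real bool."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # Hypothetical parser for illustration only; not OpenChem's own code.
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_cuda", type=str2bool, default=True)
    return parser


# The shell strips the quotes in --use_cuda="False", so this is what arrives.
args = build_parser().parse_args(["--use_cuda=False"])
print(type(args.use_cuda), args.use_cuda)  # <class 'bool'> False
```

With a converter like this, the parsed value passes an isinstance(..., bool) check, which is presumably the kind of check that raised the ValueError above.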

I then set use_cuda to False directly in openchem_encoder.py, but now I get RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED. I guess CUDA is still being used somewhere else.
Am I missing something? The Tox21 scripts seem to be working correctly (except for lots of warnings). Thanks.
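On the debugging approach described above: CUDA kernels run asynchronously, so a device-side assert is often reported at a later, unrelated CUDA call. That would explain why the traceback points at the .cuda() call on line 90 rather than at the indexing operation that failed. A minimal sketch of two standard environment variables that help narrow this down, assuming they are set before PyTorch initialises CUDA:

```python
import os

# Both variables must be set before CUDA is initialised, so in practice
# before `import torch` runs anywhere in the process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # make kernel launches synchronous
os.environ["CUDA_VISIBLE_DEVICES"] = ""   # hide every GPU from this process

# import torch  # torch.cuda.is_available() would now return False
```

With CUDA_LAUNCH_BLOCKING=1 alone, the error should surface at the operation that actually triggered the assert. With the GPUs hidden, any code path that calls .cuda() unconditionally fails immediately, which helps locate the places that ignore use_cuda. Whether OpenChem can run fully on CPU is not established in this thread.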

@lorenzoFabbri
Author

It was suggested that I increase num_embeddings from train_dataset.num_tokens to train_dataset.num_tokens+2, since the maximum value in the input tensor was larger than the embedding size. I still do not know why that happened.
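The assertion srcIndex < srcSelectDimSize in the original error is consistent with this: an embedding table with num_embeddings rows can only look up indices from 0 to num_embeddings - 1. Below is a minimal sketch of a check that can be run on the encoded sequences before they reach the embedding layer. The helper name find_out_of_range and the toy data are hypothetical; why indices above num_tokens appear in the real dataset is not established in this thread.

```python
def find_out_of_range(batch, num_embeddings):
    """Return (row, col, index) for every index that an embedding table
    with `num_embeddings` rows could not look up."""
    bad = []
    for row, sequence in enumerate(batch):
        for col, index in enumerate(sequence):
            if not 0 <= index < num_embeddings:
                bad.append((row, col, index))
    return bad


# Hypothetical toy data: 4 tokens, so the valid indices are 0..3.
num_tokens = 4
batch = [[0, 1, 2, 3], [2, 5, 1, 4]]

print(find_out_of_range(batch, num_tokens))      # [(1, 1, 5), (1, 3, 4)]
print(find_out_of_range(batch, num_tokens + 2))  # []
```

With PyTorch tensors, comparing batch.max().item() against num_embeddings gives the same information in one line. Knowing which positions hold the offending indices may help trace them back to the tokens that produce them.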

I then ran into some more issues, and I was wondering whether you tested the library on a simple classification task: I first had to reshape my labels with reshape(-1, 1), and then I had to modify cast_inputs in Smiles2Label by replacing batch_labels = batch_labels.long() with batch_labels = torch.flatten(batch_labels.long()).
