Training on Wikidata (huge dataset) using OpenKE #406

Open
unmeshvrije opened this issue Jan 24, 2024 · 0 comments

Comments

unmeshvrije commented Jan 24, 2024

As part of my research project, I am trying to use OpenKE to load the Wikidata truthy NT file.

However, when execution reaches the following step in openke/config/Trainer.py:

        if self.use_gpu:
            self.model.cuda()

I get a CUDA out-of-memory error:

  File "/home/myname/OpenKE/openke/config/Trainer.py", line 58, in run
    self.model.cuda()
  File "/var/scratch/anaconda3/envs/py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 749, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/var/scratch/anaconda3/envs/py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/var/scratch/anaconda3/envs/py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/var/scratch/anaconda3/envs/py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/var/scratch/anaconda3/envs/py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 749, in <lambda>
    return self._apply(lambda t: t.cuda(device))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1614.71 GiB (GPU 0; 47.54 GiB total capacity; 0 bytes already allocated; 47.17 GiB free; 0 bytes reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My question is: has anyone tried training embeddings on a huge dataset like Wikidata?
Any pointers would be appreciated.

For the above dataset, there were:
7,794,277,662 triples in total (~7.8 billion)
6,235,422,129 training triples (80%, ~6.2 billion)
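For a rough sense of scale (a back-of-the-envelope sketch; the entity count and embedding dimension below are assumptions, not numbers from my setup): a dense float32 entity-embedding table needs num_entities × dim × 4 bytes, so even ~100 million entities at dim 200 would not fit in the 47.54 GiB of GPU memory, let alone the 1614.71 GiB allocation the error reports.

    # Back-of-the-envelope estimate of a dense entity-embedding table.
    # Assumed values (hypothetical): 100M entities, embedding dim 200, float32.
    def embedding_gib(num_entities, dim, bytes_per_float=4):
        """Size of a dense num_entities x dim float table, in GiB."""
        return num_entities * dim * bytes_per_float / 2**30

    print(embedding_gib(100_000_000, 200))  # ~74.5 GiB, already > 47.54 GiB of GPU memory

    # Working backwards from the reported allocation (1614.71 GiB) with an
    # assumed dim of 200, the loader would be treating roughly
    # 1614.71 * 2**30 / (200 * 4) ~= 2.2 billion nodes as entities.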
