
PyTorch: RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED #126344

Open
tjasmin111 opened this issue May 15, 2024 · 2 comments
Labels
module: cuda (Related to torch.cuda, and CUDA support in general); triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


tjasmin111 commented May 15, 2024

When I try to run YOLO on PyTorch with yolo detect ... device=0, I get this error:

RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=, num_gpus=

The PyTorch and GPU-availability checks shown below all look fine, though.

What is the problem? How to fix it?

Python 3.9.7 | packaged by conda-forge | (default, Sep  2 2021, 17:58:34) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.zeros(2).cuda(0)
tensor([0., 0.], device='cuda:0')
>>> print(torch.__version__)
2.3.0+cu118
>>> print(f"Is CUDA available?: {torch.cuda.is_available()}")
Is CUDA available?: True
>>> print(f"Number of CUDA devices: {torch.cuda.device_count()}")
Number of CUDA devices: 3
>>> device = torch.device('cuda')
>>> print(f"A torch tensor: {torch.rand(5).to(device)}")
A torch tensor: tensor([0.6085, 0.7618, 0.6855, 0.5276, 0.1606], device='cuda:0')
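For reference, the assert named in the error message amounts to a simple bounds test on the requested device index: it fires whenever the index falls outside [0, num_gpus). A pure-Python sketch of that check (hypothetical function name, not PyTorch's actual code):

```python
def check_device(device: int, num_gpus: int) -> None:
    """Mimic the bounds assert from ATen's CUDAContext.cpp.

    Raises if the requested device index is negative or not smaller
    than the number of GPUs the CUDA runtime can see.
    """
    if not (0 <= device < num_gpus):
        raise RuntimeError(
            "device >= 0 && device < num_gpus INTERNAL ASSERT FAILED: "
            f"device={device}, num_gpus={num_gpus}"
        )

check_device(0, 3)  # fine: index 0 of 3 visible GPUs
```

Notably, the assert also fails when num_gpus is 0, i.e. when the CUDA runtime sees no devices at all even though an earlier process (like the interactive session above) could see three.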

Full stack trace:

Traceback (most recent call last):
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 306, in _lazy_init
    queued_call()
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 174, in _check_capability
    capability = get_device_capability(d)
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
    prop = get_device_properties(device)
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 448, in get_device_properties
    return _get_device_properties(device)  # type: ignore[name-defined]
RuntimeError: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=, num_gpus=

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/conda/bin/yolo", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/conda/lib/python3.9/site-packages/ultralytics/cfg/__init__.py", line 583, in entrypoint
    getattr(model, mode)(**overrides)  # default args from model
  File "/home/conda/lib/python3.9/site-packages/ultralytics/engine/model.py", line 528, in val
    validator(model=self.model)
  File "/home/conda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/conda/lib/python3.9/site-packages/ultralytics/engine/validator.py", line 126, in __call__
    device=select_device(self.args.device, self.args.batch),
  File "/home/conda/lib/python3.9/site-packages/ultralytics/utils/torch_utils.py", line 156, in select_device
    p = torch.cuda.get_device_properties(i)
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 312, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at "../aten/src/ATen/cuda/CUDAContext.cpp":50, please report a bug to PyTorch. device=, num_gpus=

CUDA call was originally invoked at:

  File "/home/conda/bin/yolo", line 5, in <module>
    from ultralytics.cfg import entrypoint
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/conda/lib/python3.9/site-packages/ultralytics/__init__.py", line 5, in <module>
    from ultralytics.data.explorer.explorer import Explorer
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/conda/lib/python3.9/site-packages/ultralytics/data/__init__.py", line 3, in <module>
    from .base import BaseDataset
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/conda/lib/python3.9/site-packages/ultralytics/data/base.py", line 15, in <module>
    from torch.utils.data import Dataset
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/conda/lib/python3.9/site-packages/torch/__init__.py", line 1478, in <module>
    _C._initExtension(manager_path())
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 238, in <module>
    _lazy_call(_check_capability)
  File "/home/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 235, in _lazy_call
    _queued_calls.append((callable, traceback.format_stack()))

cc @ptrblck @msaroufim

@cpuhrsch added the module: cuda and triaged labels on May 16, 2024
@cpuhrsch (Contributor)

Hello @tjasmin111 - could you please also add more information about the environment and PyTorch version by running the collect_env.py script. See details in the issue creation flow for bug reports: https://github.com/pytorch/pytorch/issues/new?assignees=&labels=&projects=&template=bug-report.yml

jabuarab commented May 17, 2024

Hi @tjasmin111, check the "pretzel" comment: https://discuss.pytorch.org/t/runtimeerror-device-0-device-num-gpus-internal-assert-failed/178118/4

Most likely CUDA doesn't see your devices, and you should set their visibility before importing torch. It isn't a solution to the real problem, though:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # must be set before torch is imported
os.environ["WORLD_SIZE"] = "1"
import torch
This thread also helped to see the problem: https://discuss.pytorch.org/t/torch-cuda-device-count-is-0/168545
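One way the assert can fire is an index that is out of range relative to the filtered device list. A hypothetical helper that roughly mimics how the CUDA runtime applies CUDA_VISIBLE_DEVICES (pure Python, no GPU needed, and a simplification of the real parsing rules):

```python
import os

def visible_device_count(total_gpus: int) -> int:
    """Count devices left visible after applying CUDA_VISIBLE_DEVICES.

    Rough sketch: the variable is a comma-separated list of device
    indices, and the runtime stops parsing at the first invalid or
    out-of-range entry, hiding everything after it.
    """
    value = os.environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:  # unset: all physical devices are visible
        return total_gpus
    count = 0
    for token in value.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= total_gpus:
            break  # invalid entry hides this and all later entries
        count += 1
    return count
```

So with three physical GPUs, CUDA_VISIBLE_DEVICES="5" (or an empty string) would leave zero visible devices, and any device index would then trip the bounds assert.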
