Hello everyone,
Firstly, I apologize for my English.
I trained a custom object detector, and inference speed in frames per second (FPS) is crucial for my use case. To measure it, I ran the following command:
python -m torch.distributed.launch --nproc_per_node=1 --master_port=29500 tools/analysis_tools/benchmark.py work_dirs/ca-o/ca-o.py work_dirs/ca-o/best_bbox_mAP_epoch_183.pth --launcher pytorch
Unfortunately, the command failed with the error below, and I have not been able to fix it. Does anyone know where the error comes from?
/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects --local_rank argument to be set, please
change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for
further instructions
warnings.warn(
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
load checkpoint from local path: work_dirs/ca-o/best_bbox_mAP_epoch_183.pth
Traceback (most recent call last):
File "tools/analysis_tools/benchmark.py", line 188, in <module>
main()
File "tools/analysis_tools/benchmark.py", line 182, in main
repeat_measure_inference_speed(cfg, args.checkpoint, args.max_iter,
File "tools/analysis_tools/benchmark.py", line 151, in repeat_measure_inference_speed
measure_inference_speed(cp_cfg, checkpoint, max_iter, log_interval,
File "tools/analysis_tools/benchmark.py", line 111, in measure_inference_speed
model(return_loss=False, rescale=True, **data)
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0])
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 139, in new_func
output = old_func(*new_args, **new_kwargs)
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/mmdet-2.22.0-py3.8.egg/mmdet/models/detectors/base.py", line 174, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/mmdet-2.22.0-py3.8.egg/mmdet/models/detectors/base.py", line 137, in forward_test
img_meta[img_id]['batch_input_shape'] = tuple(img.size()[-2:])
TypeError: 'DataContainer' object is not subscriptable
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 920690) of binary: /home/cuicui/anaconda3/envs/mmdet3/bin/python
Traceback (most recent call last):
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launch.py", line 195, in <module>
main()
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launch.py", line 191, in main
launch(args)
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launch.py", line 176, in launch
run(args)
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/cuicui/anaconda3/envs/mmdet3/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
tools/analysis_tools/benchmark.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2024-05-15_18:41:58
host : cuicui
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 920690)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
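A note on the TypeError itself: the traceback shows forward_test indexing img_meta while it is still a DataContainer, which suggests the inputs reached the model without being unwrapped first. The sketch below only illustrates that failure mode. FakeDataContainer and unwrap are hypothetical stand-ins written for this example, not mmcv's real class or API, and the shapes are made-up values. Why the unwrapping did not happen in this run cannot be read from the log alone; a mismatch between the installed mmcv and PyTorch versions is one possibility, but that is an assumption.

```python
class FakeDataContainer:
    """Hypothetical stand-in for a wrapper that holds data but has no __getitem__."""

    def __init__(self, data):
        self._data = data

    @property
    def data(self):
        # The wrapped payload is only reachable through this attribute.
        return self._data


def unwrap(obj):
    """Return the wrapped payload if obj is a container, otherwise obj unchanged."""
    return obj.data if isinstance(obj, FakeDataContainer) else obj


# One image's metadata, still wrapped (illustrative values only).
img_meta = FakeDataContainer([{"ori_shape": (480, 640, 3)}])

# Indexing the wrapper directly fails in the same way as in the traceback.
try:
    img_meta[0]["batch_input_shape"] = (480, 640)
    subscript_error = None
except TypeError as exc:
    subscript_error = str(exc)

# Indexing works once the payload has been unwrapped.
metas = unwrap(img_meta)
metas[0]["batch_input_shape"] = (480, 640)
```

If the same pattern applies here, the thing to check is which step is supposed to unwrap the batch before the model's forward call, and whether that step runs when the script is started through the distributed launcher.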