[BUG] Version >0.14.0 leads to RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
#5538
Comments
Hi @pacman100 - thanks for opening this issue here so we can track it better. Does this also happen with the latest changes in the master branch?
I confirm that the test passes when using the master branch. It would be great to have a patch release if possible.
The problem will exist in
@bug-fixed, are you able to share a repro for
@tjruwase please try this example (https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune.sh) with zero3-offload. Thanks.
@bug-fixed that repro does not work. Please provide a more precise, single-script reproduction.
@jomayeri, thanks for the response. The file needed by the script can be downloaded here: https://huggingface.co/liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-13b-v1.5/tree/main. Unfortunately, it's difficult for me to prepare a more concise script; I apologize for that. When I checked with only the Llama-3 model, ZeRO-3 offload works fine. But when I tested with the script above, i.e., with a vision transformer and an additional simple linear module, the problem occurred. I guess many factors may lead to the problem. Please note that the conclusion in my previous comment might be wrong because of my very limited knowledge of DeepSpeed. I have the following partial error information for your reference:
It shows that some parameter file was not saved to storage. I guess one possible reason is that it failed to build the correct parameter mapping. My ZeRO-3 offload config is:
@tjruwase I have updated my comment, please kindly check it. Thanks.
@bug-fixed Does the same thing happen when you offload to CPU?
@jomayeri Just encountered this problem. I use CPU offloading, and here is my deepspeed config:
```json
"zero_optimization": {
    "stage": 3,
    "offload_param": {
        "device": "cpu",
        "pin_memory": true
    },
    "offload_optimizer": {
        "device": "cpu",
        "pin_memory": true
    },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
}
```
The specific traceback is:
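For context, a "zero_optimization" block like the one above is normally embedded in a full DeepSpeed config and handed to deepspeed.initialize. A minimal sketch of that wiring, assuming a toy torch.nn.Linear model and a hypothetical ds_config.json that contains the block above plus the usual top-level keys (e.g. "train_batch_size") - neither of which comes from this thread:
```python
import torch
import deepspeed

# Toy stand-in for the real training model (assumption, not from the thread).
model = torch.nn.Linear(1024, 1024)

# ds_config.json is assumed to hold the "zero_optimization" block shown above
# plus the usual top-level fields such as "train_batch_size".
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)
```
With ZeRO-3 CPU offload enabled like this, parameters and optimizer state live on the CPU and are gathered to the GPU on demand, which is the code path where the cuda:0 / cpu mismatch reported in this issue shows up.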
I found a workaround. Just manually patching your
@loadams I directly modified the source in my deepspeed 0.14.2 installation, and ZeRO stage 3 is working smoothly now. Whether the latest code is affected should depend on PR 5493, since it re-introduced the buggy optimization.
@jomayeri The machine I'm working on has very limited memory and is shared with others, so it is difficult for me to test the
Both this issue and #5422 refer to this line in ZeRO-3, which appears to have been reverted to the correct state in master. If any user is having similar issues (@bug-fixed), please open a separate thread.
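For anyone stuck on a released version in the meantime, one hedged stopgap (a sketch only; 0.14.0 is taken from the issue title as the last release before the regression, and the exact patched release is not stated in this thread) is to fail fast on affected versions before launching a ZeRO-3 offload run:
```python
import deepspeed
from packaging import version

# Versions after 0.14.0 (per the issue title) can hit the cuda:0 / cpu device
# mismatch with ZeRO-3 offload; fail early rather than mid-training.
if version.parse(deepspeed.__version__) > version.parse("0.14.0"):
    raise RuntimeError(
        "deepspeed > 0.14.0 may raise the ZeRO-3 offload device-mismatch error; "
        "pin deepspeed==0.14.0 or install from master until a patched release."
    )
```
Pinning deepspeed==0.14.0 or installing from master (which the comments above report as working) achieves the same thing without a runtime check.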
Describe the bug
Version >0.14.0 leads to: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
To Reproduce
Steps to reproduce the behavior:
output error trace: