[BUG] ZeRO-3: Gather the params for inference (huggingface_language_model.generate) at the end of one epoch and re-partition them for the next training epoch #5539
Comments
@Coobiw, you can use the GatheredParameters context manager, which will automatically gather the parameters within the context and release them on exit. You can see a simple example usage of computing a moving average of parameters here.
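For reference, a minimal sketch of that approach, assuming `model_engine` is the engine returned by `deepspeed.initialize()` around a Hugging Face model and `inputs` is an already-tokenized batch on the right device (both names are placeholders, not from this thread):

```python
import torch
import deepspeed

model_engine.module.eval()
with torch.no_grad():
    # Gather the full (unpartitioned) parameters on every rank for the
    # duration of the context; on exit they are released back into their
    # ZeRO-3 partitions, so the next training epoch proceeds unchanged.
    with deepspeed.zero.GatheredParameters(list(model_engine.module.parameters())):
        outputs = model_engine.module.generate(**inputs, max_new_tokens=128)
model_engine.module.train()
```

Note that gathering materializes the full weights on every GPU, so for a 30B+ model this only fits if a single GPU can hold the entire model in the inference dtype.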
Hi, I've tried this before, but the program gets stuck. How can I debug this? I'd also like to know whether the hang is because I use a 30B+ LLM and ZeRO-3 inference is very slow.
@Coobiw, can you share your full script to help us reproduce on our side? Is this a dense or MoE model? In terms of debugging, can you use prints to pinpoint where it hangs? Also, can you try to reproduce on a single GPU so that you can use pdb for debugging? You can try two options for this:
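As a rough illustration of the print-based debugging suggested above (a hedged sketch; `trace` and the probe locations are illustrative, not from this discussion):

```python
import torch.distributed as dist

def trace(msg: str) -> None:
    rank = dist.get_rank() if dist.is_initialized() else 0
    # flush=True so the last message each rank prints marks where it hangs
    print(f"[rank {rank}] {msg}", flush=True)

trace("before parameter gather")
# ... the gather / generate code under investigation ...
trace("after gather, before generate")
```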
Sorry, it is inconvenient to share the whole code, but I will try my best to provide more information. It is a dense model. I've tried the script with a ~9B model on A100 80GB, and a similar hang appeared. I think it may be a multi-GPU communication problem? There is no explicit error, only a warning, which I guess may be related. Additionally, my environment is as follows:
The output of `ds_report`:
After double-checking, I found another error message on one worker, as follows (probably a time-out error):
Hi, I also tested this on one node (8 x A100) with a 9B model, and the hang appeared again. TAT
Another cause of hanging like this is when the prompt length or generation length differs across the GPUs. This is because ZeRO inference is a data-parallel algorithm.
Oh, thanks, I get it. Do you have any suggestions about this? I think I've already done left-padding. How can I ensure the output length is the same?
@Coobiw, I think we need to first confirm that different prompt/generation lengths are responsible. Can you force all the ranks to process the exact same prompt? |
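A minimal sketch of that experiment, assuming a standard torch.distributed setup; `model`, `input_ids`, `attention_mask`, and the token counts are placeholders, not from this thread:

```python
import torch.distributed as dist

# Make every rank decode the exact same (already left-padded) prompt as rank 0.
dist.broadcast(input_ids, src=0)
dist.broadcast(attention_mask, src=0)

# Pin the generation length: with min_new_tokens == max_new_tokens every rank
# runs the same number of decoding steps, so no rank exits the loop early.
outputs = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    min_new_tokens=64,
    max_new_tokens=64,
)
```

If the hang disappears under these constraints, differing prompt/generation lengths across ranks are the likely culprit.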
Describe the bug
Hi, I use ZeRO-3 for MLLM training. After one epoch of training, I want to evaluate the model (using model.generate()). However, the model's parameters are partitioned across multiple GPUs and have not been gathered.
If the params are not gathered, the forward pass during evaluation (generation) raises an error like:
RuntimeError: The size of tensor a (1152) must match the size of tensor b (0) at non-singleton dimension 2
How can I gather the params on every GPU for parallelized evaluation (inference/generation), for example using deepspeed.zero.GatheredParameters? And after evaluation, how can I shard the model parameters again for the next training epoch? Thanks for your reply!
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
ds_report output
Please run ds_report to give us details about your setup.
Screenshots
If applicable, add screenshots to help explain your problem.
System info (please complete the following information):
Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else?
Docker context
Are you using a specific docker image that you can share?
Additional context
Add any other context about the problem here.