
OPT Finetune CreateStateParallel KeyError with PipeshardParallel #822

Closed
zw123han opened this issue Dec 20, 2022 · 1 comment

Comments

zw123han (Contributor) commented Dec 20, 2022

Hello,

I'm fine-tuning OPT using https://github.com/alpa-projects/alpa/tree/main/examples/opt_finetune with FollowParallel and CreateStateParallel, so that the peak memory load moves from the CPU onto my devices. This should also resolve issue #811.

However, when the method includes PipeshardParallel, the following error is raised. The error does not occur when PipeshardParallel degenerates into ShardParallel (i.e., pipeline_parallel=1), nor when using only data_parallel + operator_parallel.

[image: KeyError traceback]

To Reproduce

I've attached my modified run_clm_flax.py (relevant lines: 725 and 829). You can drop it in place of the version in https://github.com/alpa-projects/alpa/tree/main/examples/opt_finetune and reproduce the error with the existing run_2.7b_pipe.sh launch script.

https://github.com/zw123han/alpa/blob/main/examples/opt_finetune/run_clm_flax.py


As an aside, do you think it would be helpful to merge this as the default OPT fine-tuning script once it is stable, to avoid the CPU bottleneck for large models?

zw123han changed the title [OPT Finetune] CreateStateParallel KeyError with PipeshardParallel to [FEATURE/BUG] OPT Finetune CreateStateParallel KeyError with PipeshardParallel Dec 20, 2022
zw123han changed the title [FEATURE/BUG] OPT Finetune CreateStateParallel KeyError with PipeshardParallel to OPT Finetune CreateStateParallel KeyError with PipeshardParallel Dec 20, 2022
merrymercy (Member) commented:

Solved by #873.
See also #858.
