zw123han changed the title from "[OPT Finetune] CreateStateParallel KeyError with PipeshardParallel" to "[FEATURE/BUG] OPT Finetune CreateStateParallel KeyError with PipeshardParallel" on Dec 20, 2022
zw123han changed the title from "[FEATURE/BUG] OPT Finetune CreateStateParallel KeyError with PipeshardParallel" to "OPT Finetune CreateStateParallel KeyError with PipeshardParallel" on Dec 20, 2022
Hello,

I'm tuning OPT using https://github.com/alpa-projects/alpa/tree/main/examples/opt_finetune with `FollowParallel` and `CreateStateParallel` to offload the peak CPU memory to my devices. This should also resolve issue #811.

However, when `method` contains PipeshardParallel, the following error is displayed. The same error is not observed when PipeshardParallel degenerates into ShardParallel (i.e. `pipeline_parallel=1`), nor with data_parallel + operator_parallel only.

To Reproduce

I attach my modified `run_clm_flax.py` (relevant lines: 725, 829). You can directly replace the version from https://github.com/alpa-projects/alpa/tree/main/examples/opt_finetune and reproduce the error with the existing `run_2.7b_pipe.sh` launch script.

https://github.com/zw123han/alpa/blob/main/examples/opt_finetune/run_clm_flax.py
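
For readers who do not want to open the full script, the wiring described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked `run_clm_flax.py`: the helper `build_parallel_functions` and all of its parameter names are hypothetical, and the argument order of `alpa.CreateStateParallel` (the parallelized train step, then a tuple of its remaining arguments) and `alpa.FollowParallel` (the function to follow) is an assumption that should be checked against the installed Alpa version.

```python
def build_parallel_functions(train_step, create_state, eval_step,
                             example_batch, method):
    """Hypothetical helper: make state creation and evaluation follow train_step.

    `method` is the parallel method under test, e.g. a PipeshardParallel or
    ShardParallel instance built by the caller.
    """
    # Imported lazily so the module can be loaded on a machine without Alpa.
    import alpa

    # Parallelize the training step with the chosen method.
    p_train_step = alpa.parallelize(train_step, method=method)

    # Create the train state directly on the devices, using the placement
    # that p_train_step expects, instead of building it on the host CPU.
    # Assumption: the second argument holds train_step's non-state inputs.
    p_create_state = alpa.parallelize(
        create_state,
        method=alpa.CreateStateParallel(p_train_step, (example_batch,)),
    )

    # Let the evaluation step reuse the placement chosen for p_train_step.
    p_eval_step = alpa.parallelize(
        eval_step,
        method=alpa.FollowParallel(p_train_step),
    )

    return p_train_step, p_create_state, p_eval_step
```

With this structure, the only thing that changes between the working and failing runs is the `method` object passed in, which matches the observation that the error appears only when `method` is a true PipeshardParallel.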
As an aside, do you think it would be helpful to merge this as the default OPT finetune script once it is stabilized, to avoid the CPU bottleneck for large models?