Break the dependency between torch.nn and torch.distributed #126347
Labels
oncall: distributed
triaged
🚀 The feature, motivation and pitch
We are seeing several import issues when compiling distributed modules. The root cause is that there are circular dependencies between `torch.nn` and `torch.distributed`. Some examples:

- `torch.nn` imports DDP, which relies on many `torch.distributed` modules.
- `torch.nn` imports `pytorch/torch/_jit_internal.py`, which relies on `torch.distributed.rpc`.

Right now we work around these circular dependency issues, but they keep recurring when we compile the distributed modules. We need to ensure that `torch.nn` depends on `torch.distributed` only lazily.
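
To make "lazy dependency" concrete, here is a minimal sketch of one common pattern: a proxy object that defers the real import until an attribute is first read. `LazyModule` is a hypothetical helper written for illustration, not an existing PyTorch API, and the standard-library `json` module stands in for `torch.distributed.rpc` so the snippet runs without PyTorch installed.

```python
import importlib


class LazyModule:
    """Hypothetical proxy that imports `name` only on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing is imported at construction time

    @property
    def loaded(self):
        # True once the underlying module has actually been imported.
        return self._module is not None

    def __getattr__(self, attr):
        # Called only for attributes that normal lookup does not find,
        # i.e. anything that should come from the wrapped module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# Stand-in for something like LazyModule("torch.distributed.rpc") (assumption).
lazy_json = LazyModule("json")

before = lazy_json.loaded          # import has not happened yet
encoded = lazy_json.dumps([1, 2])  # first use triggers the import
after = lazy_json.loaded
```

With a pattern like this, importing the module that defines `lazy_json` no longer pulls in the wrapped module, so an import cycle between the two is only entered at call time, when both sides are already fully initialized. A plain function-local `import` achieves the same effect with less machinery and may be the simpler choice where only one or two call sites are involved.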
Alternatives
No response
Additional context
No response
cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k