[core][experimental] Support broadcast NCCL ops in accelerated DAG #45308

stephanie-wang · 2024-05-13T22:36:12Z

Description

When the same GPU tensor is sent to multiple readers, we should use ncclBroadcast under the hood to reduce transfer time.
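Why can a broadcast reduce transfer time? A toy step-count model helps. It assumes the tensor is split into equal chunks and that each rank can send one chunk and receive one chunk per step. It compares two approaches. In the first, the writer sends the whole tensor to each reader in turn, so its single outgoing link carries every copy. In the second, a pipelined chain broadcast has each reader forward chunks to the next. The helpers `p2p_send_steps` and `chain_broadcast` are hypothetical illustrations. They do not call NCCL or Ray, and a chain is only one broadcast strategy, not necessarily the algorithm NCCL picks for `ncclBroadcast`.

```python
from typing import List, Tuple


def p2p_send_steps(num_readers: int, num_chunks: int) -> int:
    """Steps for the writer to send the full tensor to each reader separately.

    The writer's one outgoing link must carry every chunk once per reader.
    """
    return num_readers * num_chunks


def chain_broadcast(
    tensor: List[float], num_readers: int, num_chunks: int
) -> Tuple[List[List[float]], int]:
    """Simulate a pipelined chain broadcast: writer -> reader 1 -> reader 2 -> ...

    Each step, every rank forwards at most one chunk to its successor.
    Returns each reader's reassembled tensor and the number of steps taken.
    """
    size = len(tensor)
    # Split the tensor into num_chunks contiguous slices.
    bounds = [(i * size // num_chunks, (i + 1) * size // num_chunks)
              for i in range(num_chunks)]
    chunks = [tensor[lo:hi] for lo, hi in bounds]

    num_ranks = num_readers + 1
    # received[0] is the writer, which already holds every chunk.
    received: List[List[List[float]]] = [list(chunks)] + [[] for _ in range(num_readers)]
    steps = 0
    while len(received[-1]) < num_chunks:
        # Decide all sends from the state at the start of the step,
        # so a chunk advances at most one hop per step.
        sends = []
        for r in range(num_ranks - 1):
            next_idx = len(received[r + 1])
            if next_idx < len(received[r]):
                sends.append((r + 1, received[r][next_idx]))
        for dst, chunk in sends:
            received[dst].append(chunk)
        steps += 1

    # Reassemble each reader's chunks into a flat tensor.
    outputs = [[x for chunk in received[r] for x in chunk] for r in range(1, num_ranks)]
    return outputs, steps
```

For example, with 3 readers and 4 chunks the chain finishes in 4 + 3 - 1 = 6 steps. Separate sends take 3 * 4 = 12 steps. As the chunk count grows, the chain's cost approaches one tensor's worth of transfer regardless of the number of readers, while separate sends scale linearly with it:

```python
tensor = [float(i) for i in range(8)]
outputs, steps = chain_broadcast(tensor, num_readers=3, num_chunks=4)
assert all(out == tensor for out in outputs)
print(steps, p2p_send_steps(3, 4))  # 6 12
```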

Use case

No response

stephanie-wang mentioned this issue May 13, 2024

[core][experimental] Meta-issue: Support transferring GPU tensors in accelerated DAG #43830 (Open, 5 tasks)

stephanie-wang self-assigned this May 13, 2024

anyscalesam added the performance and usability labels May 23, 2024

anyscalesam added this to the ADAG Developer Preview milestone May 23, 2024

stephanie-wang assigned stephanie-wang and unassigned stephanie-wang May 28, 2024
