Add PrivateUse1 logic to sdp::SDPBackend::flash_attention #124271
Labels
module: multi-headed-attention
module: PrivateUse1 — private use
oncall: transformer/mha — Issues related to Transformers and MultiheadAttention
triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🚀 The feature, motivation and pitch
The backend-selection logic at https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/transformers/attention.cpp#L675 only covers CUDA and CPU devices. Logic for third-party CUDA-like devices (i.e. backends registered under the PrivateUse1 dispatch key) should be added so that those devices can adapt scaled dot-product attention more easily; a sketch of the idea follows.
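For illustration only, a minimal sketch of the kind of change being requested, assuming the selection path keys off the query tensor's device type; the helper name below is hypothetical and does not reflect the actual attention.cpp code:

```cpp
// Illustrative sketch only -- not the actual attention.cpp implementation.
#include <c10/core/DeviceType.h>

namespace sdp_sketch {

// Hypothetical helper: decide whether the fused SDPA backend-selection path
// should be consulted for a given device type. Today the in-tree logic only
// recognizes CUDA and CPU; the proposal is to also accept PrivateUse1 so that
// out-of-tree CUDA-like backends can plug in their own backend choice.
inline bool supports_fused_sdpa_selection(c10::DeviceType device_type) {
  return device_type == c10::DeviceType::CUDA ||
         device_type == c10::DeviceType::CPU ||
         device_type == c10::DeviceType::PrivateUse1;  // proposed addition
}

} // namespace sdp_sketch
```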
cc @jbschlosser @bhosmer @cpuhrsch @erichan1 @drisspg @mikaylagawarecki
My code snippet
Alternatives
Additional context
No response