FI_HMEM support in libfabric is a boolean on/off switch that stands in for a wide variety of HMEM capabilities (GDR, memory synchronization status, asynchronous copies, etc.). We should use the opportunity of Libfabric 2.0 to define an interface that clearly describes our HMEM capabilities and what we expect users to do for correct libfabric behavior (e.g. set CU_POINTER_ATTRIBUTE_SYNC_MEMOPS on every CUDA pointer they pass into our interface).
What are the possible capabilities?
- P2P RDMA to/from device memory
- Device memory mapping for host access (e.g. gdrcopy; the application may not need to know about this)
- P2P copies between devices (IPC)
- An option to disable device-copy methods

Open questions:
- How many of these need to be exposed to the application?
- Do we need some internal, provider-only options (e.g. hmem_ops)?
- Should we expose hints about whether an offload is hardware-supported rather than software-emulated?
- Should the HMEM interface name be accepted as additional input, e.g. "cuda", "rocr", ...? Today we only have the FI_HMEM environment variable; we want a programmatic version, which would need an extra parameter in fi_getinfo(). Nice to have.