FI_HMEM support in libfabric is a boolean on/off switch that stands in for a wide variety of HMEM capabilities (GDR, memory synchronization status, asynchronous copies, etc.). We should use the opportunity of Libfabric 2.0 to define an interface that clearly describes our HMEM capabilities and what we expect users to do for correct libfabric behavior (e.g. set CU_POINTER_ATTRIBUTE_SYNC_MEMOPS on every CUDA pointer they pass into our interface).
What are the possible capabilities?
- P2P RDMA to/from device memory
- Device memory mapping for host access (e.g. gdrcopy; the application may not need to know about this)
- P2P copies between devices (IPC)
- An option to disable device-copy methods

Open questions:
- How many of these need to be exposed to the application?
- Do we need some internal, provider-only options (e.g. hmem_ops)?
- Should we expose hints about whether an offload is hardware-supported rather than software-emulated?
- Should the HMEM interface name be accepted as additional input, e.g. "cuda", "rocr", ...? Today we only have the FI_HMEM environment variable; we want a programmatic version, which would need an extra parameter in fi_getinfo(). Nice to have.