libfabric version2: Initial set of proposed changes #9384

shefty · 2023-09-29T01:16:13Z

For discussion and review only. CANNOT BE MERGED!!!

shefty · 2023-10-04T23:35:13Z

Update: adjusting patches to start working through CI issues (many of which are related to building on other platforms and missing updates to the CI scripts)

shefty · 2023-10-05T18:55:46Z

@zachdworkin - earlier Intel CI failed on socket provider testing, which makes sense, since I removed that provider. Is there some way I can disable sockets from being run in the contrib/intel scripts, or would this require updates to the CI system? (I'm NOT asking to disable anything in the CI system now -- that wouldn't be until we're closer to accepting these sort of changes).

zachdworkin · 2023-10-05T19:02:43Z

@zachdworkin - earlier Intel CI failed on socket provider testing, which makes sense, since I removed that provider. Is there some way I can disable sockets from being run in the contrib/intel scripts, or would this require updates to the CI system? (I'm NOT asking to disable anything in the CI system now -- that wouldn't be until we're closer to accepting these sort of changes).

The only way I know that you can do this is with the replay function and either comment out that stage or just delete it from your replay. Otherwise I will need to remove it from the pipeline with a different PR

shefty · 2023-10-05T21:16:38Z

mpich references FI_NAME_MAX, so removing that from the public header breaks that build. We either need to add it back in and document what it actually means, leave it purely for compatibility, or update mpich to use an internal definition.

Provider only supported by 1.x series Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Provider only supported by v1.x series Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

GNI is only supported by the v1.x series Signed-off-by: Sean Hefty <sean.hefty@intel.com>

NetworkDirect support is supported by verbs provider. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Flag is only used between rxm and verbs. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

The value is constrained to 32-bit int flags, not u64 flags. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Remove support for asynchronous insertions and AV_MAP. The format of the fi_addr_t value will either be indexed based in the standard case or provider defined in more advanced use cases, based on the AV configuration (such as using auth_keys). Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Remove FI_BUFFERED_RECV as an exported API option. Since it's currently used internally between mrail and rxm, make it an internal only option. It has a limited use case for multirail over rxm over connected endpoints where shared receive queues are not available. With shared receive queues, the feature wouldn't be needed, as mrail could own the buffers outright. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Recommend that applications and providers use FI_THREAD_COMPLETION as the preferred threading model for lockless operation when using scalable endpoints. This helps align application design with the provider implementation. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Remove overly complicated threading models and focus on specific models to allow better alignment between application designs and provider implementation. Use FI_THREAD_DOMAIN as the preferred lockless threading model for standard endpoints. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Combine data and control progress into one progress option. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Completions are always unordered. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Field was deprecated and only serves as a placeholder for compatility. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Support for wait sets adds significant complexity to the provider implementation and is basically an abstraction around the OS constructs for poll/epoll (on Linux). Remove the feature from the API, but keep the internal implementation for now. This allows providers to move away from wait set support. Note that blocking support and support for native wait objects (e.g. epoll fd's) are still supported by the API. Only the wait set abstraction is removed, which allows providers control over creating wait objects. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Poll sets are a simple iterator around progressing multiple objects. Remove from the API to reduce provider complexity. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

There's never been an implementation. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Remove FI_MR_BASIC/SCALABLE/UNSPEC. These were deprecated in version 1.5. Remove FI_LOCAL_MR, which was an earlier version of FI_MR_LOCAL and also deprecated. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Feature is not implemented natively by providers. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Remove FI_ORDER_NONE (flag that's 0) and FI_ORDER_STRICT (which isn't a flag, and covers only a portion of the valid flags). Remove FI_ORDER_DATA, which will not be supported by version 2 in order to allow for greater provider optimization. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Remove the option of directing transmit and receive completions to separate CQs for the same endpoint. The option adds complexity at the provider and application levels. This is largely the result of needing SW based protocols for certain operations, such as tag matching. This either makes it necessary for the app to drive progress across multiple CQs, or the provider emulates the application's CQs in SW. This change updates the man page only. Provider developers are left to update their code bases separately. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Disallow users hand-crafting or hand-copying their own fi_info structs. This allows the library to allocate hidden fields for internal use. Plus, there's no need to apps to do this, given that the API call is way easier to use. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Add a new call that takes fi_info as input, which provides consistency with the domain and endpoint open calls. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Subsystem filtering isn't useful (or likely ever used). Remove the enum fi_log_subsys. Instead convert the API to accept int flags. This maintains API compatibility. Update all current FI_LOG_xxx subsys values to 0. This avoids needing to update the providers to the new API, forcing them to pass in 0 for the flags. No actual flag values are defined. Those become a placeholder for future options. The logging checks are simplified by this change. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Introduce the concept of peer groups. A peer group is a set of peers that are communicating together for some specific set of tasks. Peer groups provide a lower-level mapping of HPC and AI communicators. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Allow specifying precise tag formatting options. The mem_tag_format takes as input a set of bit fields. In practice, this ends up being unusable to implement, resulting in the entire tag simply being masked with ignore bits. When the mem_tag_format value only has the lower bits set (< 256), interpret the format as specific options. Two new options are defined, one aligned with MPI and the other with CCLs. This information can be used by providers to optimize for the separate use cases. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

mblocksome · 2024-02-07T17:03:51Z

man/fi_msg.3.md

@@ -222,7 +222,7 @@ fi_sendmsg.

 *FI_CLAIM*
 : Applies to posted receive operations for endpoints configured
-  for FI_BUFFERED_RECV or FI_VARIABLE_MSG.  This flag is used to
+  for FI_BUFFERED_RECV.  This flag is used to


In addition to the FI_VARIABLE_MSG flag, the FI_BUFFERED_RECV flag was also removed in this pull request. Therefore, shouldn't the FI_CLAIM and FI_DISCARD flags also be removed?

FI_CLAIM and FI_DISCARD can also be used with FI_PEEK.

@mblocksome and @j-xiong are both correct. However, I agree with Mike. This change is to the fi_msg man page, and FI_PEEK does not apply there. FI_CLAIM and FI_DISCARD can be removed as flags for the fi_msg calls. They remain for fi_tagged.

mblocksome · 2024-02-07T17:04:14Z

man/fi_msg.3.md

@@ -234,7 +234,7 @@ fi_sendmsg.

 *FI_DISCARD*
 : Applies to posted receive operations for endpoints configured
-  for FI_BUFFERED_RECV or FI_VARIABLE_MSG.  This flag is used to
+  for FI_BUFFERED_RECV.  This flag is used to


In addition to the FI_VARIABLE_MSG flag, the FI_BUFFERED_RECV flag was also removed in this pull request. Therefore, shouldn't the FI_CLAIM and FI_DISCARD flags also be removed?

mblocksome · 2024-02-07T17:12:52Z

man/fi_tagged.3.md

@@ -319,11 +319,6 @@ The following flags may be used with fi_trecvmsg.
  fi_context structure used for an FI_PEEK + FI_CLAIM operation must be used


remove text about fi_recv_context since buffered receives have been removed?

shefty · 2024-02-12T23:04:09Z

include/rdma/fabric.h

+enum {
+	FI_TAG_BITS,
+	FI_TAG_HPC,
+	FI_TAG_AI,


The above should revert back to FI_TAG_MPI and FI_TAG_NCCL, respectively, as the tag formats specifically target the tag definitions for that middleware.

FI_TAG_CCL to be more implementation neutral.

shefty added ⚠️ Do not merge evaluating ofi-2.0 labels Sep 29, 2023

shefty force-pushed the version2 branch 4 times, most recently from 3be6baf to df4d8bd Compare October 3, 2023 01:05

This was referenced Oct 4, 2023

core: Optimize tag APIs for CCL and MPI #9387

Closed

core: Support multiple auth keys per EP #9319

Closed

shefty force-pushed the version2 branch from df4d8bd to 85e336a Compare October 4, 2023 23:30

shefty force-pushed the version2 branch from 85e336a to 5fcad5d Compare October 5, 2023 18:27

shefty force-pushed the version2 branch 2 times, most recently from 8a2af9d to 9c4caa2 Compare October 5, 2023 20:53

shefty force-pushed the version2 branch 2 times, most recently from 5dabf05 to 412a65a Compare October 5, 2023 22:48

shefty added 5 commits October 9, 2023 15:15

prov/bgq: Remove provider

76e2999

Provider only supported by 1.x series Signed-off-by: Sean Hefty <sean.hefty@intel.com>

prov/usnic: Remove provider

e231758

Provider only supported by v1.x series Signed-off-by: Sean Hefty <sean.hefty@intel.com>

prov/rstream: Remove unfinished provider

bbcf4ac

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

prov/gni: Remove provider

c1ae77e

GNI is only supported by the v1.x series Signed-off-by: Sean Hefty <sean.hefty@intel.com>

prov/netdir: Remove provider

ac4400a

NetworkDirect support is supported by verbs provider. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

shefty force-pushed the version2 branch 6 times, most recently from ffd3a52 to dc7faf6 Compare October 11, 2023 21:58

shefty added 26 commits October 16, 2023 16:39

core: Move FI_PRIORITY to internal flag

57984a2

Flag is only used between rxm and verbs. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Remove FI_PROVIDER_SPECIFIC

486d48d

The value is constrained to 32-bit int flags, not u64 flags. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Remove unimplemented EP types

224195d

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Remove unimplemented FI_VARIABLE_MSG

d40be4a

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Remove unimplemented FI_XPU_TRIGGER

50717fc

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Remove unused FI_RESTRICTED_COMP and FI_NOTIFY_FLAGS_ONLY

d3ba9f8

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Simplify progress definition

1413486

Combine data and control progress into one progress option. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Remove comp_order attribute

1466a80

Completions are always unordered. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Remove total_buffered_recv

133d965

Field was deprecated and only serves as a placeholder for compatility. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Remove fid_poll from the public API

32adc2c

Poll sets are a simple iterator around progressing multiple objects. Remove from the API to reduce provider complexity. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Remove FI_WAIT_MUTEX_COND support from API

ed2f32e

There's never been an implementation. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Remove deprecated MR mode options

4e40418

Remove FI_MR_BASIC/SCALABLE/UNSPEC. These were deprecated in version 1.5. Remove FI_LOCAL_MR, which was an earlier version of FI_MR_LOCAL and also deprecated. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Remove support for async memory registration

e0652c1

Feature is not implemented natively by providers. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Add fi_fabric2() API

5fb413c

Add a new call that takes fi_info as input, which provides consistency with the domain and endpoint open calls. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

docs: Add information on porting applications between v1 and v2

ac80575

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

core: Add new peer group feature

5f3c619

Introduce the concept of peer groups. A peer group is a set of peers that are communicating together for some specific set of tasks. Peer groups provide a lower-level mapping of HPC and AI communicators. Signed-off-by: Sean Hefty <sean.hefty@intel.com>

mblocksome suggested changes Feb 7, 2024

View reviewed changes

shefty commented Feb 12, 2024

View reviewed changes

j-xiong mentioned this pull request Mar 2, 2024

Remove unimplemented features #9852

Merged

j-xiong mentioned this pull request Apr 29, 2024

Ofi 2.0 changes #10009

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

libfabric version2: Initial set of proposed changes #9384

libfabric version2: Initial set of proposed changes #9384

shefty commented Sep 29, 2023

shefty commented Oct 4, 2023

shefty commented Oct 5, 2023

zachdworkin commented Oct 5, 2023

shefty commented Oct 5, 2023

mblocksome Feb 7, 2024

j-xiong Feb 19, 2024

shefty Feb 19, 2024

mblocksome Feb 7, 2024

mblocksome Feb 7, 2024

shefty Feb 12, 2024

j-xiong Feb 19, 2024

		@@ -319,11 +319,6 @@ The following flags may be used with fi_trecvmsg.
		fi_context structure used for an FI_PEEK + FI_CLAIM operation must be used

libfabric version2: Initial set of proposed changes #9384

Are you sure you want to change the base?

libfabric version2: Initial set of proposed changes #9384

Conversation

shefty commented Sep 29, 2023

shefty commented Oct 4, 2023

shefty commented Oct 5, 2023

zachdworkin commented Oct 5, 2023

shefty commented Oct 5, 2023

mblocksome Feb 7, 2024

Choose a reason for hiding this comment

j-xiong Feb 19, 2024

Choose a reason for hiding this comment

shefty Feb 19, 2024

Choose a reason for hiding this comment

mblocksome Feb 7, 2024

Choose a reason for hiding this comment

mblocksome Feb 7, 2024

Choose a reason for hiding this comment

shefty Feb 12, 2024

Choose a reason for hiding this comment

j-xiong Feb 19, 2024

Choose a reason for hiding this comment