09 of 10 LNX series - Add the peer infrastructure to the CXI provider #10033
Open: amirshehataornl wants to merge 12 commits into ofiwg:main from amirshehataornl:08_lnx_cxi_updates (+658 −179)
amirshehataornl force-pushed the 08_lnx_cxi_updates branch 2 times, most recently from 239fb62 to 6d97ee5 (May 10, 2024 00:33)
When checking fabric attributes with ofi_check_fabric_attr(), take provider exclusion into account. When checking whether a provider name is given, only consider names that are not excluded with the '^' character.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>
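
The exclusion rule above can be sketched as follows. This is a minimal illustration, not the code in ofi_check_fabric_attr(): the helper name `demo_has_included_name` is hypothetical, and the ';' delimiter between names is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the ';'-separated list holds at
 * least one provider name that is not excluded with a leading '^'.
 */
static bool demo_has_included_name(const char *names)
{
	const char *p = names;

	if (!names)
		return false;

	while (*p) {
		size_t len = strcspn(p, ";");	/* length of this entry */

		if (len > 0 && p[0] != '^')
			return true;		/* a name that is not excluded */

		p += len;
		if (*p == ';')
			p++;			/* step over the delimiter */
	}
	return false;
}
```

With this rule, a list that contains only excluded names is treated the same as no provider name being given.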
It is not efficient to do a reverse lookup on the AV table when a message is received. Some providers do not store the fi_addr_t associated with the peer in the header passed on the wire, and it is not practical to require providers to add it to the wire header, as that would break backwards compatibility.

To handle this case, an address matching callback is added to the peer_srx.peer_ops structure. The provider receiving the message registers the callback, and the owner provider calls it to match an fi_addr_t against the provider-specific address in the received message. The callback allows the receiving provider to do an O(1) index into the AV table to look up the address of the peer, and then compare that with the source address in the received message.

As part of this change, provider-specific address information needs to be passed to the owner provider, which the owner then gives back to the receiving provider when it attempts address matching.

Update the SHM and LINKx providers to conform with the API changes.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>
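
The matching flow described above might look roughly like the sketch below. Every name here (`demo_addr_t`, `demo_wire_addr`, `demo_av`, `demo_addr_match`, `demo_owner_match`) is a hypothetical stand-in so the example compiles on its own; none of them is the signature added to peer_srx.peer_ops. Treating an unspecified address as a wildcard is also an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for fi_addr_t and FI_ADDR_UNSPEC so the sketch compiles alone. */
typedef uint64_t demo_addr_t;
#define DEMO_ADDR_UNSPEC UINT64_MAX

/* Hypothetical provider-specific address carried in a received message. */
struct demo_wire_addr {
	uint32_t nic;
	uint32_t pid;
};

/* Hypothetical AV: a demo_addr_t is a direct index into the table. */
struct demo_av {
	const struct demo_wire_addr *table;
	size_t count;
};

/* Callback type the receiving provider registers with the owner. */
typedef bool (*demo_addr_match_fn)(const struct demo_av *av,
				   demo_addr_t addr,
				   const struct demo_wire_addr *src);

/* Receiving provider: O(1) index into the AV, then compare addresses. */
static bool demo_addr_match(const struct demo_av *av, demo_addr_t addr,
			    const struct demo_wire_addr *src)
{
	if (addr == DEMO_ADDR_UNSPEC)
		return true;	/* assumption: unspecified matches any source */
	if (addr >= av->count)
		return false;
	return av->table[addr].nic == src->nic &&
	       av->table[addr].pid == src->pid;
}

/* Owner provider: walk the posted receives and ask the callback to match. */
static int demo_owner_match(const demo_addr_t *posted, size_t n,
			    demo_addr_match_fn match,
			    const struct demo_av *av,
			    const struct demo_wire_addr *src)
{
	for (size_t i = 0; i < n; i++) {
		if (match(av, posted[i], src))
			return (int)i;	/* index of the matching receive */
	}
	return -1;
}
```

The owner never interprets the wire address itself; it only hands it back to the callback, which is why the provider-specific address has to travel with the message.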
Add a new structure, fi_peer_match, to collect the parameters which need to be passed to the get_msg and get_tag functions.

Update the util_get_tag() and util_get_msg() function callbacks to match. With the old signatures the build only emitted a warning instead of failing, and calling the mismatched callbacks caused memory corruption.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>
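
As a rough illustration of why bundling the parameters helps, the sketch below passes one struct pointer to a tag-matching callback. The field names in `demo_peer_match`, the `demo_posted_recv` type, and the `demo_get_tag` function are all assumptions made for the example and are not the layout of fi_peer_match.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle of match parameters (field names are assumptions). */
struct demo_peer_match {
	uint64_t addr;		/* stand-in for an fi_addr_t */
	uint64_t tag;
	size_t size;
};

/* Hypothetical tagged receive posted on the owner's queue. */
struct demo_posted_recv {
	uint64_t tag;
	uint64_t ignore;	/* bits set here are not compared */
};

/* One struct pointer replaces a growing positional argument list. */
typedef int (*demo_get_tag_fn)(const struct demo_posted_recv *posted,
			       size_t n,
			       const struct demo_peer_match *match);

/* Return the index of the first posted receive whose tag matches, or -1. */
static int demo_get_tag(const struct demo_posted_recv *posted, size_t n,
			const struct demo_peer_match *match)
{
	for (size_t i = 0; i < n; i++) {
		if (((posted[i].tag ^ match->tag) & ~posted[i].ignore) == 0)
			return (int)i;
	}
	return -1;
}
```

Assigning every implementation through a single typedef such as `demo_get_tag_fn` means a new match parameter changes only the struct, not each callback signature.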
amirshehataornl force-pushed the 08_lnx_cxi_updates branch from 6d97ee5 to 279d9ab (May 16, 2024 13:23)
Add a memory registration callback to fi_ops_srx_peer. This allows core providers to expose a memory registration callback which the parent or peer provider can use to register memory on the receive path.

For example, the CXI provider registers memory with the NIC on the receive path. When using the peer infrastructure this cannot happen, because we do not know which provider will perform the receive operation. If the source NID is specified, however, the provider is known, so the receive buffer can be registered at the top of the receive path.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>
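
A minimal sketch of that decision, under stated assumptions: `demo_peer_ops`, `demo_peer`, `demo_nic_mem_reg`, and `demo_post_recv` are hypothetical names, and the callback signature is invented for the example rather than taken from fi_ops_srx_peer.

```c
#include <stddef.h>

/* Hypothetical peer ops table; mem_reg may be NULL if unsupported. */
struct demo_peer_ops {
	int (*mem_reg)(void *buf, size_t len, void **mr);
};

struct demo_peer {
	struct demo_peer_ops ops;
};

static int demo_reg_calls;	/* counts registrations, illustration only */

/* Stand-in for a core provider registering a buffer with its NIC. */
static int demo_nic_mem_reg(void *buf, size_t len, void **mr)
{
	(void)len;
	demo_reg_calls++;
	*mr = buf;	/* pretend the buffer address is the handle */
	return 0;
}

/*
 * Parent provider receive path. src_peer is the index of the peer that
 * will receive the message, or -1 when the source is not specified.
 */
static int demo_post_recv(struct demo_peer *peers, size_t n_peers,
			  int src_peer, void *buf, size_t len, void **mr)
{
	*mr = NULL;
	if (src_peer < 0 || (size_t)src_peer >= n_peers)
		return 0;	/* unknown source: defer registration */
	if (!peers[src_peer].ops.mem_reg)
		return 0;	/* peer exposes no registration callback */
	return peers[src_peer].ops.mem_reg(buf, len, mr);
}
```

When the source is unknown the sketch simply leaves the handle NULL, which models deferring registration until the receiving provider is known.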
Add the FI_PEER capability bit.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>

The parent provider should be able to access the peer provider callbacks. Store the srx block in fid.context so it can be retrieved later.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>

Add the FI_PEER capability bit to the SHM fi_infos.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>

Add the FI_PEER capability bit to the CXI provider fi_info.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>
On cq_open, check for the FI_PEER_IMPORT flag. If it is set, set all internal CQ operations to return ENOSYS, with the exception of the read callback. The read callback is overloaded to operate as a progress function: invoking it progresses the endpoints linked to this CQ.

Keep track of the fid_peer_cq structure passed in. If the FI_PEER_IMPORT flag is set, set the callbacks in the cxip_cq structure to the ones which write to the peer_cq; otherwise set them to the ones which write to the util_cq.

A provider needs to call a different set of functions to insert completion events into an imported CQ than into an internal CQ. This set of callback definitions standardizes a way to assign a different function to a CQ object, which can then be called to insert into the CQ. For example:

    struct prov_cq {
        struct util_cq *util_cq;
        struct fid_peer_cq *peer_cq;
        ofi_peer_cq_cb cq_cb;
    };

When a provider opens a CQ it can:

    if (attr->flags & FI_PEER_IMPORT) {
        prov_cq->cq_cb.cq_comp = prov_peer_cq_comp;
    } else {
        prov_cq->cq_cb.cq_comp = prov_cq_comp;
    }

Collect the peer CQ callbacks in one structure for use in CXI.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>
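
The callback selection above can be expanded into a compilable sketch. The types below are simplified stand-ins: `DEMO_PEER_IMPORT` replaces FI_PEER_IMPORT with an arbitrary bit value, the two CQ structs only count completions, and `demo_cq_open` is a hypothetical helper rather than the CXI cq_open path.

```c
#include <stdint.h>

#define DEMO_PEER_IMPORT (1ULL << 0)	/* stand-in for FI_PEER_IMPORT */

/* Simplified stand-ins for util_cq and fid_peer_cq: they only count. */
struct demo_util_cq { int count; };
struct demo_peer_cq { int count; };

struct demo_prov_cq;
typedef int (*demo_cq_comp_fn)(struct demo_prov_cq *cq, uint64_t data);

/* Callbacks collected in one structure, as the commit describes. */
struct demo_cq_cb {
	demo_cq_comp_fn cq_comp;
};

struct demo_prov_cq {
	struct demo_util_cq *util_cq;
	struct demo_peer_cq *peer_cq;
	struct demo_cq_cb cq_cb;
};

/* Insert a completion into the imported (peer) CQ. */
static int demo_peer_cq_comp(struct demo_prov_cq *cq, uint64_t data)
{
	(void)data;
	cq->peer_cq->count++;
	return 0;
}

/* Insert a completion into the provider's internal CQ. */
static int demo_util_cq_comp(struct demo_prov_cq *cq, uint64_t data)
{
	(void)data;
	cq->util_cq->count++;
	return 0;
}

/* Pick the completion callback once, at open time, from the flags. */
static void demo_cq_open(struct demo_prov_cq *cq, uint64_t flags,
			 struct demo_util_cq *util_cq,
			 struct demo_peer_cq *peer_cq)
{
	cq->util_cq = util_cq;
	cq->peer_cq = peer_cq;
	cq->cq_cb.cq_comp = (flags & DEMO_PEER_IMPORT) ?
			    demo_peer_cq_comp : demo_util_cq_comp;
}
```

Choosing the callback at open time keeps the completion path free of per-event flag checks; callers just invoke `cq->cq_cb.cq_comp(cq, data)`.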
Restructure the code to allow posting on the owner provider's shared receive queues.

Do not do a reverse lookup on the AV table to get the fi_addr_t; instead, register an address matching callback with the owner. The owner can then call the callback to match an fi_addr_t against the source address in the received message. This is more efficient because the peer lookup is an O(1) operation, AV[fi_addr_t], after which the peer's CXI address is compared with the CXI address in the received message.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Upstream has a different method of registering the SRX. Its limitation is that the SRX is only returned in the get_tag/get_msg upcall, which prevents the parent provider from doing anything else with the peer callbacks. This is a problem because we added a callback to register memory on the receive path. This patch updates the CXI provider accordingly.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Add a memory registration callback so that the parent provider, if one exists, can register receive buffers up front instead of waiting until the data arrives.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>
amirshehataornl force-pushed the 08_lnx_cxi_updates branch from 279d9ab to b383885 (May 16, 2024 13:40)
amirshehataornl changed the title from "08 lnx cxi updates" to "08 of 09 LNX series - Add the peer infrastructure to the CXI provider" (May 16, 2024)
amirshehataornl changed the title from "08 of 09 LNX series - Add the peer infrastructure to the CXI provider" to "09 of 10 LNX series - Add the peer infrastructure to the CXI provider" (May 16, 2024)
Add CXI updates to support the peer infrastructure in preparation for using it with the new LINKx provider.
Signed-off-by: Amir Shehata <shehataa@ornl.gov>