[Storage_ipc] Option II: Provides IPC extensions for 3rd devices. #126373

mengpenghui · 2024-05-16T03:30:51Z

I am working on IPC-related interfaces to support third-party devices.
This PR is a second option for adding IPC support for third-party devices.

The share_cuda_ interface is renamed to share_device_.
The backend uses getAccelerator to determine the current device and dispatch to the matching device implementation.
The share_device implementation for third-party devices is added to PrivateUse1HooksInterface.

In this way, third-party devices and CUDA will use the same interface, and the device distinction is hidden in the backend. Users also need to manually set the device type (although these are internal interfaces).
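
A minimal sketch of the dispatch idea described above, written as standalone C++ rather than against the real PyTorch headers. Every name in it (`DeviceType`, `IpcHandle`, `DeviceIpcHooks`, `HookRegistry`, `shareDevice`) is a hypothetical stand-in, and the accelerator is passed in explicitly where the PR would obtain it from getAccelerator. It only illustrates how one entry point can route to per-device hooks; it is not the code in this PR.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; these are not the PyTorch types.
enum class DeviceType { CPU, CUDA, PrivateUse1 };

// Opaque description of a shared storage, as a backend might export it.
struct IpcHandle {
  DeviceType device;
  std::string payload;  // backend-defined bytes, here just a tagged string
};

// Per-backend hook. A third-party device would provide its own override.
struct DeviceIpcHooks {
  virtual ~DeviceIpcHooks() = default;
  virtual IpcHandle shareDevice(const std::string& storage_name) const = 0;
};

struct CudaIpcHooks : DeviceIpcHooks {
  IpcHandle shareDevice(const std::string& storage_name) const override {
    return {DeviceType::CUDA, "cuda:" + storage_name};
  }
};

struct PrivateUse1IpcHooks : DeviceIpcHooks {
  IpcHandle shareDevice(const std::string& storage_name) const override {
    return {DeviceType::PrivateUse1, "privateuse1:" + storage_name};
  }
};

// Maps a device type to the hooks registered for it.
class HookRegistry {
 public:
  void registerHooks(DeviceType type, std::unique_ptr<DeviceIpcHooks> hooks) {
    hooks_[type] = std::move(hooks);
  }

  const DeviceIpcHooks& get(DeviceType type) const {
    const auto it = hooks_.find(type);
    if (it == hooks_.end()) {
      throw std::runtime_error("no IPC hooks registered for this device");
    }
    return *it->second;
  }

 private:
  std::map<DeviceType, std::unique_ptr<DeviceIpcHooks>> hooks_;
};

// Single entry point: the caller never branches on the device type.
// `accelerator` stands in for whatever the backend reports as current.
IpcHandle shareDevice(const HookRegistry& registry,
                      DeviceType accelerator,
                      const std::string& storage_name) {
  return registry.get(accelerator).shareDevice(storage_name);
}
```

With both hook sets registered, the same `shareDevice` call serves CUDA and PrivateUse1, and an unregistered device fails with an exception instead of silently taking the CUDA path.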

Alternative plan: #125122

Which option do you think is more feasible? I look forward to your suggestions.
Fixes #124902

cc @albanD

pytorch-bot · 2024-05-16T03:30:54Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126373

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b7ead53 with merge base b24ad7e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

torch/csrc/StorageSharing.cpp

albanD

Thanks for the PR. That sounds good in principle, but I think the implementation needs a bit of work in a few areas:

The ATen-level hooks should not take PyObject* as input or return it. This is the C++ layer and it should not deal with arg parsing. I think in this case you want to refactor the arg parsing out of the CUDA code so that it can be shared, and then call into a C++-only hook.
The hook should be generic. I think AcceleratorHooksInterface is the right place for it, so that it can be shared by CUDA, PrivateUse1, and others.
The Python layer should be generic and go through the accelerator hooks, so that you don't need an if/else for each device.
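
A minimal sketch of the layering suggested above, assuming a shared parsing step that turns untyped arguments into a typed request before a C++-only hook is called. It is standalone C++: `RawArg` stands in for the untyped Python arguments that the real binding would receive as PyObject*, and `ShareRequest`, `ShareResult`, `AcceleratorIpcHooks`, `parseShareArgs` and `shareDeviceBinding` are hypothetical names, not PyTorch APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Stand-in for an untyped Python argument (illustration only).
using RawArg = std::variant<std::int64_t, std::string>;

// Typed request and result used by the C++ layer.
struct ShareRequest {
  std::int64_t device_index;
  std::int64_t size_bytes;
};

struct ShareResult {
  std::string handle;
  std::int64_t offset_bytes;
};

// C++-only hook: typed in, typed out, no knowledge of Python objects.
struct AcceleratorIpcHooks {
  virtual ~AcceleratorIpcHooks() = default;
  virtual ShareResult shareDevice(const ShareRequest& request) const = 0;
};

// Toy backend used to exercise the layering.
struct DemoHooks : AcceleratorIpcHooks {
  ShareResult shareDevice(const ShareRequest& request) const override {
    return {"dev" + std::to_string(request.device_index) + "/" +
                std::to_string(request.size_bytes),
            0};
  }
};

// Shared arg parsing: written once, reused by every device.
ShareRequest parseShareArgs(const std::vector<RawArg>& args) {
  if (args.size() != 2 ||
      !std::holds_alternative<std::int64_t>(args[0]) ||
      !std::holds_alternative<std::int64_t>(args[1])) {
    throw std::invalid_argument(
        "expected (device_index: int, size_bytes: int)");
  }
  return {std::get<std::int64_t>(args[0]), std::get<std::int64_t>(args[1])};
}

// Binding-layer entry point: parse, call the typed hook, repack the result.
std::vector<RawArg> shareDeviceBinding(const AcceleratorIpcHooks& hooks,
                                       const std::vector<RawArg>& args) {
  const ShareRequest request = parseShareArgs(args);
  const ShareResult result = hooks.shareDevice(request);
  return {RawArg{result.handle}, RawArg{result.offset_bytes}};
}
```

Because the hook only sees `ShareRequest` and `ShareResult`, a new device implements one virtual function, and the binding layer needs no per-device branch.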

albanD · 2024-05-24T18:47:52Z

c10/core/StorageImpl.h

@@ -294,7 +294,7 @@ struct C10_API StorageImpl : public c10::intrusive_ptr_target {
  bool resizable_;
  // Identifies that Storage was received from another process and doesn't have
  // local to process cuda memory allocation
-  bool received_cuda_;
+  bool received_device_;


The comment on the line above needs updating as well.

mengpenghui requested a review from mikaylagawarecki as a code owner May 16, 2024 03:30

pytorchbot added the open source label May 16, 2024

legionGIT reviewed May 20, 2024

torch/csrc/StorageSharing.cpp (outdated, resolved)

[Storage_ipc] Option II: Provides IPC extensions for 3rd devices. pyt…

b7ead53

…orch#125122

mengpenghui force-pushed the ipc_without_device branch from 33bdf3c to b7ead53 May 20, 2024 04:25

drisspg added the triaged label May 20, 2024

mikaylagawarecki requested a review from albanD May 24, 2024 15:40

albanD reviewed May 24, 2024
