Add SSCP MUSA backend #1095

Open: wants to merge 37 commits into base: develop

fxzjshm (Contributor) commented Jul 31, 2023

This adds a MUSA backend for Moore Threads GPUs (from @MooreThreads), using the SSCP code path.

MUSA is another set of APIs similar to the existing CUDA and HIP/ROCm ones; it also uses Clang/LLVM for code generation, but the toolchain is still incomplete and unfortunately closed-source.
It may be possible to add support without SSCP, but since the code of their compiler isn't available, I'm afraid it would be a nightmare to debug the plugin in that case...

The current SSCP MUSA implementation is something of a mixture of SSCP PTX and SSCP AMDGPU, with many details still unclear; the MUSA toolkit itself is also evolving quickly, so this backend is still a work in progress; currently it "just works" for some workloads.

Disclaimer:

  • I am not affiliated with that company; brand names are used for identification only and do not imply affiliation.
  • I'm porting to this backend just for an undergraduate project, so many features I don't use are untested.

Tested MUSA versions: 1.3.1, 1.4 (1.3.0 does not work)
Tested device: Moore Threads MTT S3000

Known limitations:

  • atomics may not work, and cause a segmentation fault in the compiler backend (1.3.1) or endless error output (1.4)
  • half type not tested
  • ... many many others

illuhad (Collaborator) commented Aug 1, 2023

This is awesome to see!

Given that I don't know anybody here who has these GPUs available for testing and development, I have a couple of organizational questions:

  • Do you intend to finish the backend and implement the currently missing functionality?
  • Would you be willing to maintain this backend in the future?
  • In the long run, could you provide a GPU to hook into CI?

fxzjshm (Contributor, Author) commented Aug 1, 2023

Thanks!

> Do you intend to finish the backend and implement the currently missing functionality?

I will try to; however, since my major isn't computer science, I'm not sure I can do this...
I'm also waiting for the vendor's compiler team; their own MUSA hasn't implemented all functionality yet.

This work is actually far from production level; this PR is just a signal that it is practical to extend SYCL to this vendor and maybe to some other CUDA/HIP-like APIs. Interested passersby can try this backend and/or write a backend for their own device, so maybe SYCL will unify them.

> Would you be willing to maintain this backend in the future?

Yes, at least in the near future; but since I don't think this backend will be production-ready soon, I'd prefer to keep it in this draft PR for now.

> In the long run, could you provide a GPU to hook into CI?

Hard to say... The cards currently used are borrowed from another professor; I'm going to set up our own machine soon, and will ask my supervisors about this.

biergaizi commented

> And I'm also waiting for vendor's compiler team, their own MUSA haven't implemented all functionalities.

Do you know if Moore Threads has any plan to implement SPIR-V for OpenCL? If SPIR-V is supported, these GPUs would enjoy automatic support of SYCL 2020. If not, unfortunately it means the proliferation of yet another GPU programming framework (albeit CUDA-like).

In one of Moore Threads's early promotional posters, I saw "SYCL" mentioned as one of the supported frameworks along with OpenCL and Vulkan. But I'm not sure whether SYCL meant the original SYCL 1.2 or SYCL 2020, which are completely different.

fxzjshm (Contributor, Author) commented Nov 3, 2023

It seems not; the last time I met them, they said they would focus on MUSA and would not put much effort into other APIs. I asked about SYCL but got no direct response (so I said "never mind, I've done that.").

They don't even support device-only compilation, and void* on device is incorrectly set to 32-bit (thus #1074); their device intrinsic names also changed between 1.3 and 1.4 (from __mtml_* (like AMD's __ocml_*) to __mt_* (like NVIDIA's __nv_*)) (sorry, it seems I forgot to push updates to this PR), and they don't guarantee whether the names will change again in the future. The only "supported" way now is to use their mcc compiler (which is similar to nvcc & hipcc).
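
To make the intrinsic renaming concrete, here is a minimal C++ sketch of how a backend could pick the prefix per toolkit version. It is only an illustration of the 1.3 to 1.4 change described above, not the code in this PR: the names `MusaToolkit`, `intrinsic_prefix`, and `intrinsic_name` are hypothetical, and the base name `sqrt_f32` used in the note below is made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag for the two toolkit generations mentioned above.
enum class MusaToolkit { v1_3, v1_4 };

// Prefix of device intrinsics per toolkit version, as described in the comment:
// "__mtml_" in 1.3 (like AMD's "__ocml_"), "__mt_" in 1.4 (like NVIDIA's "__nv_").
inline std::string intrinsic_prefix(MusaToolkit version) {
  return version == MusaToolkit::v1_3 ? "__mtml_" : "__mt_";
}

// Builds the full intrinsic name from a version-independent base name.
// The base names are an assumption; the real ones are defined by the vendor.
inline std::string intrinsic_name(MusaToolkit version, const std::string& base) {
  assert(!base.empty());
  return intrinsic_prefix(version) + base;
}
```

With this sketch, `intrinsic_name(MusaToolkit::v1_4, "sqrt_f32")` yields `__mt_sqrt_f32`, while the same base name under `v1_3` yields `__mtml_sqrt_f32`. Keeping the prefix in one place limits the damage if the names change again.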

Moreover, some unnamed sources say they are focusing on AI for MUSA 1.5, so I don't think they have enough human resources for HPC...

PS: About the inactivity of this PR: our group is now focusing on the construction of a telescope and is short of servers, so I'm still using cards from another professor and am unable to set up CI; and this semester is so tiring that I have little time for this PR... I hope next semester will be easier.

Related commits:
326b57a ("[OpenCL] Handle synchronization between queues from different platforms"),
5f22d60 ("Add OpenCL prefetch support and make stdpar prefetch bypass all SYCL layers")

They are not responding to the request to provide libLLVM.so;
build without it for now.

Related commit: a54d87b ("add clz builtin")
This fixes subgroup-related tests on mp_21

Target triple, annotation & intrinsic names have (again) changed.
JIT commands now compile for the available device.
Debug info is now removed as their compiler still cannot handle it.

Intrinsics like `llvm.musa.atomic.exch.gen.i.sys` still crash the compiler, but it is fine as long as they are unused in the kernel.

fxzjshm (Contributor, Author) commented Jan 27, 2024

Current status:

  • Tests not passing:
    • atomic related (compiler crash)
    • custom_pfwi_synchronization_extension
    • scoped_parallelism_api
    • group_functions_*
    • hierarchical_dispatch
    • marray_tests/marray_ops (numerical error: 0.428571463 != 0.428571433; also seen in their FFT library)
    • usm_tests/prefetch (ERROR_NOT_SUPPORTED)
  • cannot rely on libLLVM.so as they don't provide it
  • device info may change in future generations, like subgroup max size
  • still no suitable machine for our card, so still no CI now...
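
For reference, the two values in the marray_ops failure above appear to be adjacent single-precision floats near 3/7, i.e. they differ by one unit in the last place (ULP). A minimal C++ sketch for checking this, assuming IEEE 754 binary32 `float`; the helper names `ordered_bits`, `ulp_distance`, and `almost_equal` are made up for illustration and are not part of AdaptiveCpp or its test suite:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");

// Maps a float to an integer whose ordering matches the ordering of the floats.
// IEEE 754 floats are sign-magnitude, so negative values are mirrored.
inline std::int64_t ordered_bits(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);  // well-defined way to read the bit pattern
  const std::int64_t magnitude = static_cast<std::int64_t>(u & 0x7fffffffu);
  return (u & 0x80000000u) ? -magnitude : magnitude;
}

// Number of representable floats between a and b (0 means identical values).
inline std::int64_t ulp_distance(float a, float b) {
  assert(std::isfinite(a) && std::isfinite(b));
  const std::int64_t d = ordered_bits(a) - ordered_bits(b);
  return d < 0 ? -d : d;
}

// Tolerant comparison: accepts results that are at most max_ulps apart.
inline bool almost_equal(float a, float b, std::int64_t max_ulps) {
  return ulp_distance(a, b) <= max_ulps;
}
```

With these helpers, `ulp_distance(0.428571463f, 0.428571433f)` evaluates to 1, so a comparison with a 1-ULP tolerance would accept the result while an exact comparison rejects it. Whether a 1-ULP deviation is acceptable for this test is a separate question that the sketch does not answer.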

This backend now works quite well for my project, so it may be worth a try.

Signed-off-by: fxzjshm <fxzjshm@163.com>
fxzjshm marked this pull request as ready for review on March 12, 2024, 16:51
fxzjshm changed the title from "[draft] add SSCP MUSA backend" to "Add SSCP MUSA backend" on Mar 29, 2024

fxzjshm (Contributor, Author) commented Mar 29, 2024

@illuhad I think this backend is now ready for review, could you please take a look? Thanks.

illuhad (Collaborator) commented Apr 6, 2024

Just a quick update, I have not forgotten about you, but I am travelling and don't have the bandwidth to review such a large PR at the moment.

So your intent is to have this merged, and then support it upstream? We would need some form of CI as a prerequisite, otherwise the code is probably just going to break more and more over time. I understand that providing some actual GPU CI can be difficult, but at least testing whether the MUSA runtime backend compiles should be easily possible in the github runners, right? Or is the SDK not publicly available?

fxzjshm (Contributor, Author) commented Apr 7, 2024

> So your intent is to have this merged, and then support it upstream?

I think it can make AdaptiveCpp more "Adaptive", doesn't it? Or if you consider maintaining this backend downstream better, I will just do that.

> is the SDK not publicly available?

Their SDK is available at https://developer.mthreads.com/sdk/download/musa (currently in Chinese only; I think they are not targeting global users right now). In fact, only after they released their first public SDK did I dare to file this pull request.

> at least testing whether the MUSA runtime backend compiles should be easily possible in the github runners

If you mean that compiling the test code is enough for now, I will try that. I've used CI before but not GitHub runners; I hope this won't take too much time...

3 participants