Add SSCP MUSA backend #1095
base: develop
Conversation
Can now run simple tests, but code with external functions is still WIP.
* builtin `__mtml_` -> `__mt_`
* `__MTGPU__` required for `llvm::CallingConv::MTGPU_KERNEL`
* data layout changed
* `__nvvm_bar_warp_sync` no longer available
* use their arch `mp_10`
related to caee556 ("Use fixed width int types in SSCP builtin interface")
This is awesome to see! Given that I don't know anybody here who has these GPUs available for testing and development, I have a couple of organizational questions:
Thanks!
I will try to; however, since my major isn't computer science, I don't really think I can do this... This work is actually far from production level; this PR is mainly a signal that it is practical to extend SYCL to this vendor and perhaps other CUDA/HIP-like APIs, so that interested passersby can try this backend and/or build a backend for their own device, and maybe SYCL will unify them all.
Yes, at least in the near future; but since I don't think this backend will be production-ready soon, I would prefer to keep it in this draft PR for now.
Hard to say... The cards currently used are borrowed from another professor; I'm going to set up our own machine soon, and will ask my supervisors about this.
Do you know if Moore Threads has any plan to implement SPIR-V for OpenCL? If SPIR-V is supported, these GPUs would enjoy automatic support of SYCL 2020. If not, it unfortunately means the proliferation of yet another GPU programming framework (albeit a CUDA-like one). In one of Moore Threads's early promotional posters, I saw "SYCL" mentioned as one of the supported frameworks along with OpenCL and Vulkan. But I'm not sure whether SYCL meant the original SYCL 1.2 or SYCL 2020, which are completely different.
Seems not; last time I met them, they said they will focus on MUSA and will not put much effort into other APIs. I asked about SYCL but got no direct response (so I said, "never mind, I've done that"). They don't even support device-only compilation. Moreover, some unnamed sources say they are focusing on AI for MUSA 1.5, so I don't think they have enough human resources for HPC...

PS: About the inactivity of this PR: our group is now focused on the construction of a telescope and is short of servers, so I'm still using cards from another professor and unable to set up CI; also, this semester is so tiring that I have little time for this PR... I hope next semester will be easier.
They are not responding to the request to provide libLLVM.so; build without it for now.
Related commit: a54d87b ("add clz builtin")
This fixes subgroup-related tests on mp_21
Target triple, annotation & intrinsic names have (again) changed. JIT commands now compile for the available device. Debug info is now removed, as their compiler still cannot handle it. Intrinsics like `llvm.musa.atomic.exch.gen.i.sys` still crash the compiler, but this is fine as long as they are unused in the kernel.
Current status:
This backend now works quite well for my project, so it may be worth a try.
Signed-off-by: fxzjshm <fxzjshm@163.com>
@illuhad I think this backend is now ready for review, could you please take a look? Thanks.
Just a quick update: I have not forgotten about you, but I am travelling and don't have the bandwidth to review such a large PR at the moment.

So your intent is to have this merged, and then support it upstream? We would need some form of CI as a prerequisite; otherwise the code will probably just break more and more over time. I understand that providing actual GPU CI can be difficult, but at least testing whether the MUSA runtime backend compiles should be easily possible in the GitHub runners, right? Or is the SDK not publicly available?
I think it can make AdaptiveCpp more "Adaptive", doesn't it? Or, if you consider maintaining this backend downstream to be better, I will just do that.
Their SDK is available at https://developer.mthreads.com/sdk/download/musa (currently in Chinese only; I think they are not targeting global users right now). In fact, only after they released their first public SDK did I dare to file this pull request.
If you mean that compile-testing the code is enough for now, I will try that. I've used CI before but not GitHub runners; I hope this won't take too much time...
This adds a MUSA backend for Moore Threads GPUs from @MooreThreads, using the SSCP code path.
MUSA is another set of APIs similar to the existing CUDA and HIP/ROCm ones; it also uses Clang/LLVM for code generation, but it is still incomplete and, unfortunately, closed-source.
It may be possible to add this support without SSCP, but since the source of their compiler isn't available, I'm afraid debugging the plugin would be a nightmare in that case...
The current SSCP MUSA implementation is essentially a mixture of the SSCP PTX and SSCP AMDGPU backends, with many details still unclear; the MUSA toolkit itself is also evolving quickly, so this backend remains work-in-progress. Currently it "just works" for some workloads.
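For anyone who wants to try it, a possible invocation might look like the following. The `generic` SSCP target and the `ACPP_VISIBILITY_MASK` environment variable exist in AdaptiveCpp; the `musa` mask value here is an assumption for this backend, not something confirmed by this PR:

```shell
# Compile a SYCL program via the SSCP ("generic") code path;
# device code is then JIT-compiled for whatever device is available at run time.
acpp --acpp-targets=generic -O2 vector_add.cpp -o vector_add

# Restrict runtime backend visibility ("musa" value is hypothetical).
ACPP_VISIBILITY_MASK=musa ./vector_add
```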
Disclaimer:
Tested MUSA versions: 1.3.1, 1.4 (1.3.0 does not work)
Tested device: Moore Threads MTT S3000
Known limitations:
* `atomic` things may not work, and will cause a segmentation fault of the compiler backend (1.3.1) or infinite error output (1.4)
* `half` type not tested