Add SSCP MUSA backend #1095

fxzjshm · 2023-07-31T08:28:54Z

This adds a MUSA backend for Moore Threads GPUs (from @MooreThreads), using the SSCP code path.

MUSA is another set of APIs similar to the existing CUDA and HIP/ROCm ones; it also uses Clang/LLVM for code generation, but it is still incomplete and unfortunately closed-source.
It may be possible to add support without SSCP, but since the code of their compiler isn't available, I'm afraid debugging the plugin would be a nightmare in that case...

The current SSCP MUSA implementation is something of a mixture of SSCP PTX and SSCP AMDGPU, with many details still unclear; the MUSA toolkit itself is also evolving quickly, so this backend is still a work in progress. Currently it "just works" for some workloads.

Disclaimer:

I am not affiliated with that company; brand names are used for identification only and do not imply affiliation.
I'm porting to this backend just for an undergraduate project, so many features I don't use are untested.

Tested MUSA version: 1.3.1, 1.4 (1.3.0 not working)
Tested device: Moore Threads MTT S3000

Known limitations:

- Atomics may not work, and will cause a segmentation fault in the compiler backend (1.3.1) or infinite error output (1.4)
- half type not tested
- ... and many others


illuhad · 2023-08-01T10:56:13Z

This is awesome to see!

Given that I don't know anybody here who has these GPUs available for testing and development, I have a couple of organizational questions:

Do you intend to finish the backend and implement the currently missing functionality?
Would you be willing to maintain this backend in the future?
In the long run, could you provide a GPU to hook into CI?

fxzjshm · 2023-08-01T17:04:17Z

Thanks!

> Do you intend to finish the backend and implement the currently missing functionality?

I will try to; however, since my major isn't computer science, I don't really think I can do this...
I'm also waiting for the vendor's compiler team; their own MUSA hasn't implemented all functionality yet.

This work is actually far from production level; this PR is just a signal that it is practical to extend SYCL to this vendor and perhaps to other CUDA/HIP-like APIs. Interested passersby can try this backend and/or write a backend for their own device, so maybe SYCL will unify them.

> Would you be willing to maintain this backend in the future?

Yes, at least in the near future; but since I don't think this backend will be production-ready soon, I intend to keep it in this draft PR for now.

> In the long run, could you provide a GPU to hook into CI?

Hard to say... The cards I'm currently using are borrowed from another professor; I'm going to set up our own machine soon and will ask my supervisors about this.

biergaizi · 2023-11-03T12:29:04Z

> I'm also waiting for the vendor's compiler team; their own MUSA hasn't implemented all functionality yet.

Do you know if Moore Threads has any plan to implement SPIR-V for OpenCL? If SPIR-V is supported, these GPUs would enjoy automatic support of SYCL 2020. If not, unfortunately it means the proliferation of yet another GPU programming framework (albeit CUDA-like).

In one of Moore Threads's early promotional posters, I saw "SYCL" mentioned as one of the supported frameworks, along with OpenCL and Vulkan. But I'm not sure whether that meant the original SYCL 1.2 or SYCL 2020, which are completely different.

fxzjshm · 2023-11-03T13:55:26Z

Seems not; last time I met them, they said they will focus on MUSA and will not put much effort into other APIs. I asked about SYCL but got no direct response (so I said "never mind, I've done that.").

They don't even support device-only compilation, and `void*` on device is incorrectly set to 32-bit (thus #1074). Their device intrinsic names also changed during 1.3 and 1.4 (from `__mtml_*` (like AMD's `__ocml_*`) to `__mt_*` (like NVIDIA's `__nv_*`)) (sorry, it seems I forgot to push updates to this PR), and they don't guarantee whether the names will change again in the future. The only "supported" way now is to use their mcc compiler (which is similar to nvcc and hipcc).
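Since the intrinsic prefix has already changed once, one way to limit the cost of another rename is to keep the prefix in a single place. The following is only a sketch of that idea, not code from this PR: `IntrinsicPrefix`, `prefix_of`, `device_intrinsic_name`, and the base name `sqrt_f32` are hypothetical; only the two prefixes `__mtml_` and `__mt_` come from the comment above.

```cpp
#include <cassert>
#include <string>

// Hypothetical: the two prefix styles mentioned in this thread.
enum class IntrinsicPrefix {
    Mtml, // older style, "__mtml_*"
    Mt    // newer style, "__mt_*"
};

// Single place where the prefix string is defined.
inline const char* prefix_of(IntrinsicPrefix style) {
    return style == IntrinsicPrefix::Mtml ? "__mtml_" : "__mt_";
}

// Builds a full intrinsic name from a base name such as "sqrt_f32".
// The base names here are placeholders, not real MUSA intrinsics.
inline std::string device_intrinsic_name(IntrinsicPrefix style,
                                         const std::string& base) {
    assert(!base.empty() && "base name must not be empty");
    return std::string(prefix_of(style)) + base;
}
```

With this layout, `device_intrinsic_name(IntrinsicPrefix::Mt, "sqrt_f32")` yields `__mt_sqrt_f32`, and a future rename would only touch `prefix_of`.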

Moreover, some unnamed sources say they are focusing on AI for MUSA 1.5, so I don't think they have enough people for HPC...

PS: About the inactivity of this PR: our group is now focused on building a telescope and is short of servers, so I'm still using cards from another professor and am unable to set up CI. This semester is also so tiring that I have little time for this PR... I hope next semester will be easier.


Target triple, annotation & intrinsic names have (again) changed. JIT commands now compile for the available device. Debug info is now removed, as their compiler still cannot handle it. Intrinsics like `llvm.musa.atomic.exch.gen.i.sys` still crash the compiler, but this is OK as long as they are unused in the kernel.

fxzjshm · 2024-01-27T16:21:16Z

Current status:

- Tests not passed:
  - atomic related (compiler crash)
  - custom_pfwi_synchronization_extension
  - scoped_parallelism_api
  - group_functions_*
  - hierarchical_dispatch
  - marray_tests/marray_ops (numerical error: 0.428571463 != 0.428571433; also seen in their FFT library)
  - usm_tests/prefetch (ERROR_NOT_SUPPORTED)
- Cannot rely on libLLVM.so, as they don't provide it
- Device info, like subgroup max size, may change in future generations
- Still no suitable machine for our card, so still no CI for now...
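To put the marray_ops mismatch in perspective: if those values are 32-bit floats (an assumption; the thread does not say how the test compares them), 0.428571463 and 0.428571433 are neighbouring representable numbers, i.e. they differ by one unit in the last place (ULP). A minimal sketch of a ULP-based comparison follows; `ordered_bits`, `ulp_distance`, and `almost_equal` are hypothetical helper names, not part of the test suite, and inputs are assumed to be finite.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes 32-bit IEEE-754 float");

// Maps a finite float to an integer whose ordering matches the float ordering,
// so that adjacent floats map to adjacent integers.
inline std::int64_t ordered_bits(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u); // well-defined way to read the bit pattern
    if (u & 0x80000000u) {
        // Negative values are stored as sign + magnitude; mirror them below zero.
        return -static_cast<std::int64_t>(u & 0x7fffffffu);
    }
    return static_cast<std::int64_t>(u);
}

// Number of representable floats between a and b (0 means bit-identical,
// or +0.0 vs -0.0).
inline std::int64_t ulp_distance(float a, float b) {
    const std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

// True if a and b are at most max_ulps representable values apart.
inline bool almost_equal(float a, float b, std::int64_t max_ulps) {
    assert(max_ulps >= 0);
    return ulp_distance(a, b) <= max_ulps;
}
```

Here `ulp_distance(0.428571463f, 0.428571433f)` is 1, so an exact-equality check fails while `almost_equal(..., 1)` passes. Whether a 1-ULP tolerance is acceptable for that test is a separate decision.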

This backend now works reasonably well for my project, so it may be worth a try.


fxzjshm · 2024-03-29T05:38:47Z

@illuhad I think this backend is now ready for review, could you please take a look? Thanks.

illuhad · 2024-04-06T22:30:32Z

Just a quick update, I have not forgotten about you, but I am travelling and don't have the bandwidth to review such a large PR at the moment.

So your intent is to have this merged, and then support it upstream? We would need some form of CI as a prerequisite, otherwise the code is probably just going to break more and more over time. I understand that providing some actual GPU CI can be difficult, but at least testing whether the MUSA runtime backend compiles should be easily possible in the github runners, right? Or is the SDK not publicly available?

fxzjshm · 2024-04-07T15:41:32Z

> So your intent is to have this merged, and then support it upstream?

I think it can make AdaptiveCpp more "Adaptive", can't it? Or, if you think maintaining this backend downstream is better, I will just do that.

> is the SDK not publicly available?

Their SDK is available at https://developer.mthreads.com/sdk/download/musa (currently in Chinese only; I think they are not targeting global users right now). In fact, only after they released their first public SDK did I dare to file this pull request.

> at least testing whether the MUSA runtime backend compiles should be easily possible in the github runners

If you mean that compiling the test code is enough for now, I will try that. I've used CI before but not GitHub runners; I hope this won't take too much time...

fxzjshm added 22 commits July 31, 2023 10:57

[sscp] add initial support for musa

fc7bd23

[sscp] runtime for musa, cannot run now

940f17a

[sscp] musa: migrate more builtins

c5a3faf

[sscp] musa: correct module load

a384b7a

[sscp] musa: set calling conv & correct addrspace map

96b3295

now can run simple tests, but code with external function still WIP

[sscp] musa: fix header macro

d0740c5

[sscp] musa: fix target name

a24cb18

[sscp] musa: fix intrinsic name

989f69d

[sscp] musa: disable mtgpu internalize symbols

aa92851

[sscp] musa: correct intrinsic name

e345c02

[sscp] musa: fix vendor id

03f140d

[sscp] musa: enable musa backend

001c873

[sscp] musa: handle device_uint_property::needs_dimension_flip

01509f3

[sscp] musa: add missing math intrinsic

7afe0f5

[sscp] musa: throw error in CMake if MUSA enabled but not found

bba9a66

[sscp] musa: sync changes for MUSA 1.3.1

1599516

* builtin `__mtml_` -> `__mt_`
* `__MTGPU__` required for `llvm::CallingConv::MTGPU_KERNEL`
* data layout changed
* `__nvvm_bar_warp_sync` no longer available
* use their arch `mp_10`

[sscp] musa: add backend interop

10af357

[sscp] musa: add always inliner

2dcafb9

[sscp] musa: fix int width in builtin interface

a81c6f8

related to caee556 ("Use fixed width int types in SSCP builtin interface")

[sscp] musa: tentatively re-enable some features

e8eb625

[sscp] musa: correct arch info

08e47d0

[sscp] musa: remove temporary debug code

886ecc9

fxzjshm added 4 commits January 12, 2024 05:54

Merge branch 'develop' into sscp-musa

ffbdb96

[SSCP] musa: track upstream changes in runtime/*_queue.hpp & *_queue.cpp

3aeb34e

Related commit: 326b57a ("[OpenCL] Handle synchronization between queues from different platforms"), 5f22d60 (" Add OpenCL prefetch support and make stdpar prefetch bypass all SYCL layers")

[SSCP] musa: enable SSCP if with MUSA backend

5e946e9

[SSCP] musa: do not require shared libLLVM.so if building with MUSA

0f2c6b2

They are not responding to the request of providing libLLVM.so, build without it for now.

fxzjshm added 6 commits January 27, 2024 15:41

[SSCP] musa: add clz builtin

c9f900b

Related commit: a54d87b ("add clz builtin")

[SSCP] musa: add device visibility mask support

097cef2

[SSCP] musa: fix subgroup max size

3c744d1

This fixes subgroup-related tests on mp_21

[SSCP] musa: link acpp-rt

98f6510

[SSCP] musa: support float16

8654b72

fxzjshm added 4 commits March 12, 2024 09:33

[SSCP] musa: merge branch 'develop'

dbf69c8

Signed-off-by: fxzjshm <fxzjshm@163.com>

[SSCP] musa: merge upstream changes

5f11530

Signed-off-by: fxzjshm <fxzjshm@163.com>

[SSCP] musa: correcly pass target arch to llvm-to-musa

73e2913

Signed-off-by: fxzjshm <fxzjshm@163.com>

[SSCP] musa: add popcount

40aa462

Signed-off-by: fxzjshm <fxzjshm@163.com>

fxzjshm marked this pull request as ready for review March 12, 2024 16:51

Merge branch 'develop' into sscp-musa

cb03a73

fxzjshm changed the title ~~[draft] add SSCP MUSA backend~~ Add SSCP MUSA backend Mar 29, 2024
