Alternatives for non-standard sub_group::shuffle_up/down/xor and yet-to-be-implemented atomic_fence #767
Comments
1.) Yes, e.g.
2.) We could just implement it, or you could submit a PR; it shouldn't be too difficult :-) We currently only support relaxed memory order on GPUs anyway, so there would not be a difference to a
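Some context on point 2.). SYCL 2020 declares `sycl::atomic_fence(memory_order order, memory_scope scope)`, which follows the C++ fence model and adds a memory scope. The sketch below is a CPU-only analog in standard C++. It uses `std::atomic_thread_fence` and `std::thread`, not SYCL or hipSYCL, and the function name `publish_and_consume` is hypothetical.

It shows what a release/acquire fence pair adds on top of relaxed atomics: once the consumer observes the flag, it is guaranteed to see the payload. In standard C++, a fence with `std::memory_order_relaxed` has no effect, so this guarantee comes entirely from the stronger orderings.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// CPU-only analog in standard C++ (not SYCL, not hipSYCL).
// A release fence before a relaxed flag store, paired with an acquire fence
// after a relaxed flag load that observes that store, makes the plain write
// to `payload` visible to the consumer. sycl::atomic_fence(order, scope)
// exposes the same kind of ordering, with an additional memory scope.
int publish_and_consume() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag, accessed only with relaxed operations
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                         // write the data
        std::atomic_thread_fence(std::memory_order_release);  // order it before the flag store
        ready.store(true, std::memory_order_relaxed);         // publish
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            // spin until the flag becomes visible
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence
        observed = payload;                                   // guaranteed to read 42
    });

    producer.join();
    consumer.join();
    return observed;
}
```

If both fences were relaxed, the read of `payload` would be a data race, and the guarantee would disappear.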
It invokes one of the There's no difference between the nvc++ path and the clang CUDA path.
I'd have thought so, but it's actually invoking
It should only execute Since Is something there not working?
My bad...
So it must be both of them giving me trouble with the undefined reference to the builtin...
Thanks, the following snippet reproduces the issue independently of hipSYCL with nvc++ 22.2:
(it happens with or without the
EDIT: It seems nvc++ does not like the empty host path. Removing it, or putting code inside it, seems to resolve the issue.
Yes, with both 21.7 and 22.7.
Can also confirm your findings: emptying either
Thank you, I have filed a bug report: https://forums.developer.nvidia.com/t/nvc-up-to-22-7-undefined-reference-to-builtin-is-device-code-for-empty-if-target-paths/223232
I also tried putting one empty statement, multiple empty statements, switching their order, ... but to no avail. 🤪
Yeah, my experience is that nvc++ bugs are quite thorough ;) But thanks for trying. I guess as a hotfix we can just remove that empty statement for now: #798
NVIDIA says the issue is known and should be fixed in future releases: https://forums.developer.nvidia.com/t/nvc-up-to-22-7-undefined-reference-to-builtin-is-device-code-for-empty-if-target-paths/223232/2
If they are anything like nvfortran's... 😝 |
Hi @illuhad,
I've been trying to compile a project written in SYCL which also uses a few DPC++ extensions.
I'm aware `sub_group::shuffle_up/down/xor` are not part of the standard, but I was wondering if I could emulate their behaviour with standard features currently implemented in hipSYCL. Do you have any ideas you'd be happy to share?

Since `atomic_fence` hasn't yet been implemented, I've been thinking of biting the bullet and switching my code to use `mem_fence` instead. In your experience, is this the best course of action, or are there better alternatives?

Thank you very much for your time,
-Nuno
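SYCL 2020 defines group algorithms that correspond to the DPC++ sub-group shuffles:

- `sycl::shift_group_left` for `shuffle_down`
- `sycl::shift_group_right` for `shuffle_up`
- `sycl::permute_group_by_xor` for `shuffle_xor`
- `sycl::select_from_group` for the generic `shuffle`

Whether a particular hipSYCL version already provides them is an assumption to check. The sketch below does not call SYCL. It models on the host the value each work-item would receive, with `lanes[i]` holding the value of the work-item whose sub-group local id is `i`. The `model_*` names are hypothetical.

SYCL 2020 leaves the result unspecified when the source id falls outside the sub-group. This model simply lets such work-items keep their own value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model, not SYCL code. lanes[i] is the value held by the work-item
// with sub-group local id i; each function returns what every work-item would
// receive. Out-of-range sources are unspecified in SYCL 2020; this model
// (an assumption) lets those work-items keep their own value.

// DPC++ shuffle_down(x, delta)  ~  sycl::shift_group_left(sg, x, delta)
template <typename T>
std::vector<T> model_shift_group_left(const std::vector<T>& lanes, std::size_t delta) {
    std::vector<T> out(lanes);
    for (std::size_t i = 0; i + delta < lanes.size(); ++i)
        out[i] = lanes[i + delta];  // read from the work-item delta ids higher
    return out;
}

// DPC++ shuffle_up(x, delta)  ~  sycl::shift_group_right(sg, x, delta)
template <typename T>
std::vector<T> model_shift_group_right(const std::vector<T>& lanes, std::size_t delta) {
    std::vector<T> out(lanes);
    for (std::size_t i = delta; i < lanes.size(); ++i)
        out[i] = lanes[i - delta];  // read from the work-item delta ids lower
    return out;
}

// DPC++ shuffle_xor(x, mask)  ~  sycl::permute_group_by_xor(sg, x, mask)
template <typename T>
std::vector<T> model_permute_group_by_xor(const std::vector<T>& lanes, std::size_t mask) {
    std::vector<T> out(lanes);
    for (std::size_t i = 0; i < lanes.size(); ++i)
        if ((i ^ mask) < lanes.size())
            out[i] = lanes[i ^ mask];  // butterfly exchange partner
    return out;
}
```

In device code, the corresponding call would look something like `sycl::shift_group_left(sg, x, 1)`, with `sg` obtained from `item.get_sub_group()` on an `nd_item`.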