-
Notifications
You must be signed in to change notification settings - Fork 406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce KOKKOS_DEDUCTION_GUIDE
macro to allow user-defined deduction guide in device code for clang compiler
#6954
Introduce KOKKOS_DEDUCTION_GUIDE
macro to allow user-defined deduction guide in device code for clang compiler
#6954
Conversation
KOKKOS_DEDUCTION_GUIDE
macro to suppress warnings about user-defined deduction guide used in device code in HIP
@@ -27,6 +27,8 @@ | |||
#define KOKKOS_LAMBDA [=] __host__ __device__ | |||
#define KOKKOS_CLASS_LAMBDA [ =, *this ] __host__ __device__ | |||
|
|||
#define KOKKOS_DEDUCTION_GUIDE __host__ __device__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation here is the same as for KOKKOS_FUNCTION
and the deduction guides changed here are for KOKKOS_FUNCTION
s anyway. In my opinion, using KOKKOS_FUNCTION
instead of introducing a new macro matches more closely the intent of deduction guides (by replicating the constructor's signature).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your point, just wondering if annotating something that is not a function by KOKKOS_FUNCTION
would be confusing ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just follows the syntax for deduction guides so it's as confusing and more consistent.
edit: fixed typo syntax vs semantics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we should limit the number of (public) macros we define since they are a pain to deal with later if we consider removing them since we can't deprecate them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, using KOKKOS_FUNCTION instead of introducing a new macro matches more closely the intent of deduction guides (by replicating the constructor's signature).
I agree with @masterleinad on this.
I don't really see how introducing a new macro improves the code. Worse it makes a bug more likely. Both this PR and mdspan fix the issue for HIP but not for CUDA clang...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe though to assume that compilers need the same annotations between deduction guides and constructors ?
That's what we are observing, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far yes i agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It just follows the semantics for deduction guides
I don't understand that argument. Did you mean syntax? Please clarify.
To the consistency argument, I would like to point out there is prior art in Kokkos, with KOKKOS_DEFAULTED_FUNCTION
and KOKKOS_INLINE_DELETED_FUNCTION
(Blame game for naming that `KOKKOS_INLINE_DELETED_FUNCTION` macro)
https://github.com/kokkos/kokkos/commit/3a439317142573f0f6b39e2cd61934fb5e56ca72You could argue that these are not needed any more and we are stuck with them, but I also want to look back and see that these were also special syntaxes for the compiler for which it was also not clear what to do about __host__ __device__
annotations (especially for deleted functions!). Experience has shown that compilers behavior changed and we had to discriminate workarounds based on vendor toolchain versions.
If we use KOKKOS_FUNCTION
to decorate our deduction guides, teach our users to do so, and there is even only one compiler that we need to support and that does not like the host device annotation, then we will have to do the distinction. We will be able to fix it in our codebase but it will be harder to propagate downstream.
That said I have no evidence it will break and I haven't surveyed what currently works.
Worse it makes a bug more likely. Both this PR and mdspan fix the issue for HIP but not for CUDA clang...
I don't see how it makes it more bug prone. Please elaborate.
Regarding Clang+CUDA, did you check that we see the issue when the annotation are missing. I would be surprised if that was not considered in the reference implementation of mdspan
. That is possible but I expect someone would have asked the question. @crtrott do you remember?
I don't like the proliferation of public macros either but I tend to think it is justified in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how it makes it more bug prone. Please elaborate.
Different people wrote different versions of this PR and they both forgot to fix CUDA clang. If you didn't introduce the new macro and instead use KOKKOS_FUNCTION
, the PR would have fixed the issue.
Regarding Clang+CUDA, did you check that we see the issue when the annotation are missing
If you look at the godbolt code that @tpadioleau posted. You can see that he is using CUDA clang to reproduce the error not HIP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding Clang+CUDA, did you check that we see the issue when the annotation are missing
If you look at the godbolt code that @tpadioleau posted. You can see that he is using CUDA clang to reproduce the error not HIP.
Haha damn. Missed the compiler explorer link, I was on my cell.
@dalg24 Note that it is not just a warning but an error, I have updated the compiler message. |
Ok, plz update title accordingly
…On Thu, Apr 18, 2024 at 9:34 AM Thomas Padioleau ***@***.***> wrote:
@dalg24 <https://github.com/dalg24> Note that it is not just a warning
but an error
—
Reply to this email directly, view it on GitHub
<#6954 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACIQEVONV6CBFLKMQJ7CXLY57DWJAVCNFSM6AAAAABGNGSJZKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDANRTHA4DENRQGM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
KOKKOS_DEDUCTION_GUIDE
macro to suppress warnings about user-defined deduction guide used in device code in HIPKOKKOS_DEDUCTION_GUIDE
macro to allow user-defined deduction guide in device code in HIP
Back in draft the time to add tests |
I believe a new macro is warranted. NVCC currently does not require the markup on constructors (though it works with them too). But my experience with markup on defaulted ctors is a thing which I believe warrants caution here. NVCC changed its requirements regarding requiring/allowing markup on defaulted functions over time. And generally speaking: deduction guides are NOT functions. In fact the signatures don't even need to match the signatures of any existing CTOR. E.g. look at this: https://godbolt.org/z/fYPhK5Ec5
The deduction guide looks nothing like the ctor, but it will work as long as you try to construct from something convertible to float. So it's not a function: it's a rule. Introducing a new macro for that seems warranted to me. |
In that case, I agree that we should use two different macros. |
40f8bed
to
da61c12
Compare
KOKKOS_DEDUCTION_GUIDE
macro to allow user-defined deduction guide in device code in HIPKOKKOS_DEDUCTION_GUIDE
macro to allow user-defined deduction guide in device code for clang compiler
There is a real issue with OpenMPTarget @rgayatri23 can you take a look too? |
@tpadioleau the OpenMPTarget failure looks real. Not sure when @rgayatri23 will have time so if you can track it down that would be great. Possibly we can just disable the test for now for OpenMPTarget. |
using ReducerTypeExpected = | ||
Kokkos::MaxFirstLocCustomComparator<ScalarType, IndexType, | ||
FakeComparator, MemorySpace>; | ||
using ValueType = ReducerTypeExpected::value_type; | ||
Kokkos::View<ValueType, MemorySpace> view; | ||
FakeComparator comparator; | ||
Kokkos::MaxFirstLocCustomComparator reducer(view, comparator); | ||
check_types<ReducerTypeExpected>(team_handle, reducer); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpenMPTarget only supports
struct OpenMPTargetReducerWrapper<Sum<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<Prod<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<Min<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<Max<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<LAnd<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<LOr<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<BAnd<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<BOr<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<MinLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<MaxLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<MinMax<Scalar, Space>> {
struct OpenMPTargetReducerWrapper<MinMaxLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<MaxFirstLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<MinFirstLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<MinMaxFirstLastLoc<Scalar, Index, Space>> {
struct OpenMPTargetReducerWrapper<FirstLoc<Index, Space>> {
struct OpenMPTargetReducerWrapper<LastLoc<Index, Space>> {
struct OpenMPTargetReducerWrapper<StdIsPartitioned<Index, Space>> {
struct OpenMPTargetReducerWrapper<StdPartitionPoint<Index, Space>> {
at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we will have to implement MaxFirstLocCustomComparator
for the OpenMPTarget backend.
@tpadioleau - Can you disable that test for the OpenMPTarget backend for now? |
Done in d954dc4 I also notice a failure on the OpenACC backend, should I disable it ? |
core/src/setup/Kokkos_Setup_Cuda.hpp
Outdated
#if defined(__clang__) | ||
#define KOKKOS_DEDUCTION_GUIDE __host__ __device__ | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer if we removed the guard. nvcc
seems to be fine with it as well, see https://godbolt.org/z/fY1T7TE6T.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with both solutions though this introduces a change for a working compiler.
The errors look like /var/jenkins/workspace/Kokkos_PR-6954/core/src/OpenACC/Kokkos_OpenACC_ParallelReduce_Team.hpp", line 242: error: no instance of constructor "Kokkos::Sum<Scalar, Space>::Sum [with Scalar=<unnamed>::TestNestedReducerCTAD::ScalarType, Space=<unnamed>::TestNestedReducerCTAD::MemorySpace]" matches the argument list
ValueType tmp = ValueType();
^
detected during:
instantiation of "void Kokkos::parallel_reduce(const Kokkos::Impl::TeamVectorRangeBoundariesStruct<iType, Kokkos::Impl::OpenACCTeamMember> &, const Lambda &, ValueType &) [with iType=int, Lambda=<unnamed>::TestNestedReducerCTAD::FakeFunctor<int>, ValueType=const Kokkos::Sum<<unnamed>::TestNestedReducerCTAD::ScalarType, <unnamed>::TestNestedReducerCTAD::MemorySpace>]" at line 46 of "/var/jenkins/workspace/Kokkos_PR-6954/core/unit_test/TestNestedReducerCTAD.cpp"
instantiation of "void <unnamed>::TestNestedReducerCTAD::check_types<ReducerTypeExpected,TeamHandle,ReducerTypeToCheck>(const TeamHandle &, const ReducerTypeToCheck &) [with ReducerTypeExpected=Kokkos::Sum<<unnamed>::TestNestedReducerCTAD::ScalarType, <unnamed>::TestNestedReducerCTAD::MemorySpace>, TeamHandle=<unnamed>::TestNestedReducerCTAD::TeamHandle, ReducerTypeToCheck=Kokkos::Sum<<unnamed>::TestNestedReducerCTAD::ScalarType, <unnamed>::TestNestedReducerCTAD::MemorySpace>]" at line 55 of "/var/jenkins/workspace/Kokkos_PR-6954/core/unit_test/TestNestedReducerCTAD.cpp" and we have to do something about it. If we can't figure it out, we could consider disabling the test for |
@masterleinad @tpadioleau I think the error in #6954 (comment) may be a legitimate bug unrelated to deduction guides.
Contrast this with the (I'm approving the P/R as I don't think the error is related to deduction guides, but it still needs to be addressed before merging.) |
I was sure that it was unrelated to this pull request. 🙂 We still have to fix the CI by disabling the test with a FIXME for OpenACC. |
@masterleinad @nliber Sorry for my late reply and thank you for looking at the issue. In the meantime the OpenACC issue is solved I could have a look at how to slightly change the test. I have in mind to remove the inner parallel_reduce. I know that I could not remove the outer parallel_for without breaking the test. |
I don't think anyone is working on a fix for the OpenACC issue. |
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
ca5a08a
to
99134c6
Compare
2f9e42a
to
2910db4
Compare
We recently came across a compiler error when using the deduction guide of
TeamThreadMDRange
on a HIP compilation errorI could reproduce it on godbolt https://godbolt.org/z/GTdeY77KT. My understanding is that clang requires the user-defined deduction guide to be annotated to be used inside a device function.
Based on the suggestions on slack, this PR annotates the user-defined deduction guides using a new macro
KOKKOS_DEDUCTION_GUIDE
. I have used this macro to annotate guides associated to constructors annotatedKOKKOS_FUNCTION
.Should we introduce tests for that ?
cc @etiennemlb