Add batched serial tbsv #2202
base: develop
Status Flag 'Pre-Test Inspection' - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging.
fe210eb to fbeea38
Mostly small changes; this looks good overall.
} // namespace KokkosBatched

#endif // KOKKOSBATCHED_TBSV_SERIAL_IMPL_HPP_
Maybe add a newline at the end of the file here.
std::string name = name_region + name_value_type;
Kokkos::Profiling::pushRegion(name.c_str());
Kokkos::RangePolicy<execution_space, ParamTagType> policy(0, _b.extent(0));
Kokkos::parallel_for(name.c_str(), policy, *this);
I would argue that you only need either the profiling region or the named parallel_for in this case, not both. Not a problem though, just a comment.
Kokkos::Random_XorShift64_Pool<execution_space> random(13718);
Kokkos::fill_random(Ref, random, ScalarType(1.0));
Kokkos::fill_random(x0, random, ScalarType(1.0));
This might not do exactly what you think in the case of complex numbers. We usually use KokkosKernels::Impl::getRandomBounds from common/src/KokkosKernels_IOUtils.hpp to generate the bounds for random inputs.
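The reviewer's point is that a complex scalar needs a range for both its real and its imaginary part, which a single ScalarType(1.0) bound may not express the way one expects. The plain C++ sketch below illustrates that idea with the standard library only; symmetric_bounds and fill_uniform are hypothetical stand-ins and do not reproduce the signature or behavior of KokkosKernels::Impl::getRandomBounds or Kokkos::fill_random.

```cpp
#include <cassert>
#include <complex>
#include <random>
#include <vector>

// Hypothetical helper (not the KokkosKernels API): bounds [-mag, mag] for a real scalar.
template <typename T>
void symmetric_bounds(double mag, T& lo, T& hi) {
  assert(mag >= 0.0);
  lo = T(-mag);
  hi = T(mag);
}

// Complex overload: the real AND the imaginary part each get [-mag, mag].
template <typename T>
void symmetric_bounds(double mag, std::complex<T>& lo, std::complex<T>& hi) {
  assert(mag >= 0.0);
  lo = std::complex<T>(T(-mag), T(-mag));
  hi = std::complex<T>(T(mag), T(mag));
}

// Fill a real vector with values drawn uniformly from [-mag, mag).
template <typename T>
void fill_uniform(std::vector<T>& v, double mag, std::mt19937& gen) {
  T lo, hi;
  symmetric_bounds(mag, lo, hi);
  std::uniform_real_distribution<T> dist(lo, hi);
  for (auto& x : v) x = dist(gen);
}

// Complex vector: draw the real and imaginary parts independently,
// each from its own interval taken from the complex bounds.
template <typename T>
void fill_uniform(std::vector<std::complex<T>>& v, double mag, std::mt19937& gen) {
  std::complex<T> lo, hi;
  symmetric_bounds(mag, lo, hi);
  std::uniform_real_distribution<T> re(lo.real(), hi.real());
  std::uniform_real_distribution<T> im(lo.imag(), hi.imag());
  for (auto& z : v) z = std::complex<T>(re(gen), im(gen));
}
```

Routing the bound computation through one overloaded helper lets the same test body serve both real and complex ScalarType, which is the motivation for using a shared utility rather than a hard-coded bound.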
Functor_BatchedSerialTbsv<DeviceType, View3DType, View2DType, ParamTagType,
                          AlgoTagType>(Ab, x1, k)
    .run();
Maybe fence after launching the kernel to make sure it has completed by the time you perform the check? parallel_for is non-blocking in general.
#include "KokkosBatched_Util.hpp"

template <typename ExecutionSpace, typename AViewType, typename BViewType>
bool allclose(const AViewType& a, const BViewType& b, double rtol = 1.e-5,
Did you check in the test_common and common directories for these utilities? If we already have them there, we don't want to re-implement them.
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; this inspection will remain valid until a new commit to source branch is performed.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING
Test Name: KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight
Test Name: KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10
Test Name: KokkosKernels_PullRequest_GNU1021
Test Name: KokkosKernels_PullRequest_GNU1021_Light_LayoutRight
Test Name: KokkosKernels_PullRequest_Tpls_GNU1021
Test Name: KokkosKernels_PullRequest_Tpls_INTEL19_solo
Test Name: KokkosKernels_PullRequest_CLANG1001_solo
Test Name: KokkosKernels_PullRequest_VEGA90A_ROCM561
Test Name: KokkosKernels_PullRequest_VEGA90A_Tpls_ROCM561
Pull Request Author: yasahi-hpc
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED
Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on the next available autotester run.
Pull Request Auto Testing has FAILED
Console Output (last 100 lines): KokkosKernels_PullRequest_CUDA11_CUDA11_LayoutRight #1375
Console Output (last 100 lines): KokkosKernels_PullRequest_GCC930_Light_Tpls_GCC930_Tpls_CLANG13CUDA10 #964
Console Output (last 100 lines): KokkosKernels_PullRequest_GNU1021 #621
Console Output (last 100 lines): KokkosKernels_PullRequest_GNU1021_Light_LayoutRight #608
Console Output (last 100 lines): KokkosKernels_PullRequest_Tpls_GNU1021 #609
Console Output (last 100 lines): KokkosKernels_PullRequest_Tpls_INTEL19_solo #613
Console Output (last 100 lines): KokkosKernels_PullRequest_CLANG1001_solo #585
Console Output (last 100 lines): KokkosKernels_PullRequest_VEGA90A_ROCM561 #1067
Console Output (last 100 lines): KokkosKernels_PullRequest_VEGA90A_Tpls_ROCM561 #583
Redo of PR #2020 from the develop branch.
This PR implements the tbsv function.
The following files are added:
KokkosBatched_Tbsv_Serial_Impl.hpp: Internal interfaces
KokkosBatched_Tbsv_Serial_Internal.hpp: Implementation details
KokkosBatched_Tbsv.hpp: APIs
Test_Batched_SerialTbsv.hpp: Unit tests for that
Detailed description
It solves the equation Ax = b. Here, the arguments have the following shapes.
A: (batch_count, lda, n)
An n by n unit or non-unit, upper or lower triangular band matrix with (k+1) diagonals.
x: (batch_count, n)
Before entry, the incremented array x must contain the n-element right-hand side vector b.
Example of a single batch of matrix A with n = 10 and k = 3.
Parallelization is done in the following manner. This is efficient only when A is given in LayoutLeft for GPUs and LayoutRight for CPUs (parallelized over the batch direction).
Tests
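The first test listed here can be sketched in plain C++ for a single matrix (no Kokkos, no batching). The sketch assumes the reference BLAS band layout: for an upper band matrix, A(i, j) is stored at row k + i - j, column j of Ab (0-based), so lda = k + 1 suffices and the diagonal sits in row k. Ab is kept row-major for simplicity, only the upper, non-transposed, non-unit case is covered, and all function names are hypothetical; the dense back substitution plays the role of trsv and the banded one the role of tbsv.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Dense n x n matrix stored row-major: element (i, j) is A[i * n + j].
using Dense = std::vector<double>;

// Hypothetical helper: random upper band matrix (bandwidth k) with a
// dominant diagonal so the triangular solve is well conditioned.
Dense random_upper_band(int n, int k, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> dist(-1.0, 1.0);
  Dense A(static_cast<std::size_t>(n) * n, 0.0);
  for (int i = 0; i < n; ++i)
    for (int j = i; j <= std::min(n - 1, i + k); ++j)
      A[i * n + j] = (i == j) ? 4.0 + dist(gen) : dist(gen);
  return A;
}

// Upper band storage, BLAS convention (0-based): A(i, j) -> Ab(k + i - j, j)
// for max(0, j - k) <= i <= j. Ab has k + 1 rows and n columns, row-major.
std::vector<double> to_upper_band(const Dense& A, int n, int k) {
  assert(k >= 0 && k < n);
  std::vector<double> Ab(static_cast<std::size_t>(k + 1) * n, 0.0);
  for (int j = 0; j < n; ++j)
    for (int i = std::max(0, j - k); i <= j; ++i)
      Ab[(k + i - j) * n + j] = A[i * n + j];
  return Ab;
}

// Reference solve (role of trsv): dense back substitution.
// x holds b on entry and the solution on exit.
void trsv_upper(const Dense& A, int n, std::vector<double>& x) {
  for (int i = n - 1; i >= 0; --i) {
    double s = x[i];
    for (int j = i + 1; j < n; ++j) s -= A[i * n + j] * x[j];
    x[i] = s / A[i * n + i];
  }
}

// Banded solve (role of tbsv): row i only touches columns i+1 .. min(n-1, i+k).
void tbsv_upper(const std::vector<double>& Ab, int n, int k, std::vector<double>& x) {
  for (int i = n - 1; i >= 0; --i) {
    double s = x[i];
    for (int j = i + 1; j <= std::min(n - 1, i + k); ++j)
      s -= Ab[(k + i - j) * n + j] * x[j];
    x[i] = s / Ab[k * n + i];  // the diagonal is stored in row k
  }
}
```

For a by-hand check in the spirit of the second test, take n = 3, k = 1, A = [[2, 1, 0], [0, 2, 1], [0, 0, 2]] and b = (3, 3, 2). The band storage is Ab = [[*, 1, 1], [2, 2, 2]] (the * entry is never read), and back substitution gives x3 = 2/2 = 1, x2 = (3 - 1)/2 = 1, x1 = (3 - 1)/2 = 1, so x = (1, 1, 1).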
… A. Then, convert it to the banded storage Ab. Solve Ax = b with trsv and (Ab)x = b with tbsv, and compare the tbsv result with the trsv result.
… A and b and compute x by hand, then check that it matches the result of your routine.
Tasks