Implement tunings with the same name the same way between Base and RAJA variants #397

MrBurmark · 2023-12-04T23:20:25Z

There are some kernels where tunings with the same name are not implemented the same way between Base and RAJA variants. Work on the implementations to make them the same, or add tunings, so that Base and RAJA variants can be compared apples to apples.

Affected kernels/algorithms:

INDEXLIST_3LOOP - Base variants read the outcomes of scans, but RAJA variants use reductions. See "Make INDEXLIST_3LOOP implementations consistent" #370.
Reducers - Base reducers do a block reduction and then one atomic per block to finalize the reduction, but RAJA reducers do a block reduction and then the last block finalizes the reduction. See "Add RAJA GPU block atomic Tuning for Reduction Kernels" #393.
LCALS_FIRST_MIN - Base reducers are finalized on the host, but RAJA reducers are finalized in the last block. See "Fix FIRST_MIN GPU reduction implementation" #398.
Reducers - Base reducers' block atomics go into a contiguous buffer and so have false sharing, but RAJA reducers' block atomics go into different buffers and so may avoid false sharing.
Reducers - Base reducers use device memory and explicit memory copies, but RAJA reducers use pinned memory. See "Fix Base GPU Variants Reducer Memory Usage" #392.
MEMSET/MEMCPY - Base used stream 0, but RAJA used a different stream. See "Use same GPU stream for all kernels" #296.
HALOEXCHANGE_FUSED - Base uses direct dispatch, but RAJA uses indirect function call dispatch. See "HALOEXCHANGE_FUSED WorkGroup Dispatch Policies" #260.
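
To make the difference between the two reducer finalization strategies above concrete, here is a minimal CPU-side sketch in C++. It is not the RAJAPerf or RAJA implementation: each `std::thread` stands in for one GPU thread block, and the names `block_partial`, `reduce_block_atomic`, and `reduce_last_block` are hypothetical, chosen only for this illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical CPU emulation: one std::thread plays the role of one GPU block.
// Shared first step of both strategies: reduce one block's slice of the input.
long long block_partial(const std::vector<int>& data,
                        std::size_t block, std::size_t block_size)
{
  std::size_t begin = std::min(block * block_size, data.size());
  std::size_t end = std::min(begin + block_size, data.size());
  return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
}

// Strategy A (as described for the Base variants): block reduction, then one
// atomic add per block directly into the final result.
long long reduce_block_atomic(const std::vector<int>& data, std::size_t block_size)
{
  assert(block_size > 0);
  std::size_t num_blocks = (data.size() + block_size - 1) / block_size;
  std::atomic<long long> result{0};
  std::vector<std::thread> blocks;
  for (std::size_t b = 0; b < num_blocks; ++b) {
    blocks.emplace_back([&, b] {
      result.fetch_add(block_partial(data, b, block_size),
                       std::memory_order_relaxed);
    });
  }
  for (auto& t : blocks) t.join();
  return result.load();
}

// Strategy B (as described for the RAJA variants): block reduction, each block
// stores its partial result, and whichever block finishes last combines them.
long long reduce_last_block(const std::vector<int>& data, std::size_t block_size)
{
  assert(block_size > 0);
  std::size_t num_blocks = (data.size() + block_size - 1) / block_size;
  std::vector<long long> partials(num_blocks, 0);
  std::atomic<std::size_t> finished{0};
  long long result = 0;
  std::vector<std::thread> blocks;
  for (std::size_t b = 0; b < num_blocks; ++b) {
    blocks.emplace_back([&, b] {
      partials[b] = block_partial(data, b, block_size);
      // acq_rel publishes this block's partial and lets the last block
      // observe the partials written by every block that finished earlier.
      if (finished.fetch_add(1, std::memory_order_acq_rel) + 1 == num_blocks) {
        result = std::accumulate(partials.begin(), partials.end(), 0LL);
      }
    });
  }
  for (auto& t : blocks) t.join();
  return result;
}
```

Both functions return the same value for integer input, but they do different work to finalize: strategy A contends on a single atomic location once per block, while strategy B uses an atomic counter plus an extra pass over the partial results. With floating-point data, the unordered atomic adds in strategy A can also change the rounding from run to run. Differences like these are why same-named tunings need matching finalization to be comparable.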

Other things affecting performance:

HALOEXCHANGE_FUSED - RAJA variants use dynamic scratch memory. To avoid dynamic scratch memory allocation, lower hipLimitStackSize or set the environment variable HSA_SCRATCH_SINGLE_LIMIT=240000000 (MI250X).
Reducers - RAJA variants don't always inline. Use the compiler flags from hipcc (-mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false) or increase the inline threshold (-fgpu-inline-threshold=100000).


MrBurmark self-assigned this Dec 5, 2023

MrBurmark added the enhancement, cuda, and hip labels Dec 5, 2023
