Implement tunings with the same name the same way between Base and RAJA variants #397

Open
6 of 7 tasks
MrBurmark opened this issue Dec 4, 2023 · 0 comments


MrBurmark commented Dec 4, 2023

In some kernels, tunings with the same name are implemented differently in the Base and RAJA variants. Either change the implementations so they match, or add tunings so that the Base and RAJA variants can be compared apples to apples.

Affected kernels/algorithms:

Other things affecting performance:

  • HALOEXCHANGE_FUSED: the RAJA variants use dynamic scratch memory. To avoid dynamic scratch memory allocation, lower hipLimitStackSize or set the environment variable HSA_SCRATCH_SINGLE_LIMIT=240000000 (value for MI250X).
  • Reducers: in the RAJA variants, the reducer code is not always inlined. Either use the compiler flags from hipcc (-mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false) or increase the inline threshold (-fgpu-inline-threshold=100000).