Tune PermutationForDeposition for MI250X #3925

AlexanderSinn (Member) commented on May 7, 2024

Summary

`PermutationForDeposition` was initially developed and tuned for the A100. A few tweaks improve its performance on the MI250X, which has a smaller cache but is much less sensitive to atomic-add congestion.
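
To make the trade-off concrete, here is a minimal sketch of the simplest kind of deposition permutation: a stable counting sort of particles by cell index. This is not the AMReX implementation; `cell_sorted_permutation` and its parameters are hypothetical names chosen for illustration. A fully cell-sorted order is the most cache-friendly, but it also makes neighbouring threads deposit into the same cell, which concentrates atomic adds. How far to lean toward locality versus spreading out the atomics is the kind of hardware-dependent choice this PR tunes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of AMReX.
// Returns a permutation `perm` such that visiting particles in the order
// perm[0], perm[1], ... walks through them cell by cell
// (stable counting sort on the cell index).
std::vector<std::size_t> cell_sorted_permutation(const std::vector<int>& cell,
                                                 int ncells)
{
    // After the prefix sum, offsets[c] is the number of particles
    // that live in cells with index smaller than c.
    std::vector<std::size_t> offsets(static_cast<std::size_t>(ncells) + 1, 0);
    for (int c : cell) {
        assert(c >= 0 && c < ncells);
        ++offsets[static_cast<std::size_t>(c) + 1];
    }
    for (std::size_t c = 0; c < static_cast<std::size_t>(ncells); ++c) {
        offsets[c + 1] += offsets[c];
    }

    // Scatter each particle index into the next free slot of its cell.
    std::vector<std::size_t> perm(cell.size());
    for (std::size_t i = 0; i < cell.size(); ++i) {
        perm[offsets[static_cast<std::size_t>(cell[i])]++] = i;
    }
    return perm;
}
```

With cell indices `{2, 0, 1, 0, 2}` and three cells, the two particles in cell 0 come first, then the one in cell 1, then the two in cell 2.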

Additional background

Test on MI250X:

[image]

I also ran the same test on A100, forcing it to use the AMD tuning:

[image]

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

@ax3l requested review from ax3l, atmyers and WeiqunZhang on May 21, 2024 16:21
@atmyers merged commit fed4bc1 into AMReX-Codes:development on May 22, 2024
68 of 69 checks passed
atmyers pushed a commit that referenced this pull request May 22, 2024
## Summary

Fix typo from #3925

## Additional background

## Checklist

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] include documentation in the code and/or rst files, if appropriate