
Trilinos nightly failure, Hip builds on MI210, hip.reduce_device_view_mdrange_policy fails in builds with MPI #6981

Open
ndellingwood opened this issue May 2, 2024 · 1 comment
Labels: Backend - HIP; Failure - Nightly (Nightly Build Failure); Failure - Trilinos (Continuous Integration Build Failure)


@ndellingwood
Contributor

Describe the bug

Nightly Trilinos HIP builds are failing in the hip.reduce_device_view_mdrange_policy unit test when the job runs with 4 MPI processes:

21:34:12 [ RUN      ] hip_managed.view_allocation_large_rank
21:34:12 Memory access fault by GPU node-2 (Agent handle: 0x5f27b20) on address 0x7fa871800000. Reason: Unknown.
21:34:12 --------------------------------------------------------------------------
21:34:12 Primary job  terminated normally, but 1 process returned
21:34:12 a non-zero exit code. Per user-direction, the job has been aborted.
21:34:12 --------------------------------------------------------------------------
21:34:12 --------------------------------------------------------------------------
21:34:12 mpirun noticed that process rank 0 with PID 3958738 on node lean1 exited on signal 6 (Aborted).
21:34:12 --------------------------------------------------------------------------

The job is configured to rerun failed tests, and this test failed in both the initial run and the rerun.

ndellingwood added the Failure - Nightly, Failure - Trilinos, and Backend - HIP labels on May 2, 2024
@ndellingwood
Contributor Author

This test passed in the most recent nightly run; we should keep this issue open to track further occurrences.
