
GitLab CI testing expansion #1644

Open · wants to merge 34 commits into develop
Conversation

@long58 long58 commented May 13, 2024

Summary

  • This PR is a feature
  • It does the following:
    • Adds SYCL CI checks on corona using the custom intel compiler
    • Adds instructions for building and installing the custom intel compiler
    • Adds OpenMP Target offload CI checks on lassen using ibm clang
    • Fixes GitLab cross project testing for RAJAPerf on changes to RAJA develop branch.

In order to accomplish this, I needed to make changes to the radiuss_spack_configs GitHub project to add new compilers, and to modify the RAJA and camp package.py files. I plan to open another pull request there integrating those changes, but for now my branch of radiuss_spack_configs (used here) is up to date with the latest radiuss_spack_configs/develop.

Also, some (~9) of the OpenMP Target tests on lassen are flaky: they pass sometimes and fail other times, and I'm not entirely sure why.

This feature completes all objectives of GitHub issue #1611.

@adrienbernede
Member

adrienbernede commented May 15, 2024

@long58 I started reading your PR.

First, thank you for fixing the multi-project testing! I was actually looking at that today and couldn't see what was wrong. I think GitLab changed the syntax at some point to remove those quotes. So, thanks for the help.
Now, I see in the RAJAPerf build_and_test.sh script that the multi-project testing imports RAJA@develop regardless of the branch actually passed as "MP_BRANCH". This should be fixed to support any reference passed (let's start with branches, but I think supporting commit hashes would be useful too). I can work on that, but in this PR we should at least use the branch passed with "MP_BRANCH".
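Not the author's actual script, but a minimal sketch of what honoring MP_BRANCH in RAJAPerf's build_and_test.sh could look like (the MP_BRANCH variable comes from the thread; the `tpl/RAJA` path and the fallback behavior are assumptions):

```shell
# Sketch: pick the RAJA ref to test against from MP_BRANCH instead of
# hard-coding develop. The tpl/RAJA submodule path is a placeholder.
RAJA_REF="${MP_BRANCH:-develop}"   # fall back to develop when MP_BRANCH is unset
echo "Testing RAJAPerf against RAJA ref: ${RAJA_REF}"
# git -C tpl/RAJA fetch origin "${RAJA_REF}"
# git -C tpl/RAJA checkout FETCH_HEAD
```

Extending the same pattern to commit hashes would mostly be a matter of fetching the ref before checkout, as sketched in the commented lines.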

We can see that the status reports a failure named "skipped-draft-pr". I think this will go away if you merge "develop" into your branch.

I’ll complete the review tomorrow.

@long58 long58 self-assigned this May 15, 2024
@rhornung67
Member

@long58 good work on this so far. I'll review in more detail tomorrow. For now, I suggest that we remove the OpenMP target testing from this branch. Then, we can work on the OpenMP target stuff in a separate PR. It will take some effort to try various compilers to figure out what works. We've always had difficulty with compiler support for that back-end.

@adrienbernede I think it would be best to test RAJAPerf against the PR branch to be merged. However, it would be sufficient I think to see whether RAJA Perf passes or fails (so we can fix breakages in RAJA Perf) without requiring that check to pass before merging the RAJA branch. Is that what you have in mind?

@adrienbernede
Member

@rhornung67 the new data point I find in your comment is that you don't want to bind the RAJA PR status to the RAJAPerf status. I am fine with that, and it can be achieved by removing `strategy: depend` from the trigger job (see .gitlab-ci.yml in RAJA).
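For context, a hedged sketch of what such a trigger job in RAJA's .gitlab-ci.yml might look like after the change (job name, stage, and project path are illustrative, not the repository's actual configuration):

```yaml
trigger-rajaperf:
  stage: multi-project          # stage name is a placeholder
  trigger:
    project: group/RAJAPerf     # downstream project path (placeholder)
    branch: develop
    # strategy: depend          # removed: with this set, the upstream job
    #                           # waits for and mirrors the downstream status
```

Without `strategy: depend`, the upstream job succeeds as soon as the downstream pipeline is created, so RAJAPerf breakages stay visible in the downstream pipeline without blocking the RAJA PR.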

@adrienbernede
Member

@rhornung67 The intel 19.1.2 + gcc 10.3.1 job is timing out on poodle.
Poodle interactive jobs have a 30 min hard limit.
If the intel job is going to run longer than 30 minutes on a regular basis, then we should consider options:

  • deactivate that job on poodle
  • shorten that job somehow (if that were possible, I guess you'd have done so already)
  • use batch allocations

Using batch allocations does not suit CI very well, IMHO; I would advocate against it.

Given that poodle is quite small and allocations are limited, I would argue that our CI on it should be more targeted. Currently any ruby job is also run on poodle; is that really necessary?

@rhornung67
Member

@adrienbernede I think we can remove that job. No users are exercising that compiler anymore, as far as I know. I suggest adding a job for intel/2023.2.1, which is the latest oneapi compiler.

```diff
@@ -31,7 +31,7 @@ variables:
   # Project specific variants for poodle
   PROJECT_POODLE_VARIANTS: "~shared +openmp +vectorization +tests"
   # Project specific deps for poodle
-  PROJECT_POODLE_DEPS: ""
+  PROJECT_POODLE_DEPS: "^blt@develop "
```
Member


@long58 @rhornung67 let's not deal with this in this PR, but blt@develop is heavily used in the CI. We should check whether that is still necessary; it's not good to point at a moving target in CI.

rhornung67 and others added 2 commits May 30, 2024 07:34
- Uncommented test cases for omptarget
- Deactivated lassen omptarget testing
- Removed strategy: depend from the RAJAPerf multi-project testing so that the RAJAPerf status will not influence RAJA's testing status
@long58
Author

long58 commented Jun 4, 2024

@rhornung67 the new data point I find in your comment is that you don't want to bind the RAJA PR status to the RAJAPerf status. I am fine with that, and it can be achieved by removing `strategy: depend` from the trigger job (see .gitlab-ci.yml in RAJA).

Fixed this in my most recent commit.

@long58
Author

long58 commented Jun 4, 2024

@long58 good work on this so far. I'll review in more detail tomorrow. For now, I suggest that we remove the OpenMP target testing from this branch. Then, we can work on the OpenMP target stuff in a separate PR. It will take some effort to try various compilers to figure out what works. We've always had difficulty with compiler support for that back-end.

@adrienbernede I think it would be best to test RAJAPerf against the PR branch to be merged. However, it would be sufficient I think to see whether RAJA Perf passes or fails (so we can fix breakages in RAJA Perf) without requiring that check to pass before merging the RAJA branch. Is that what you have in mind?

I went ahead and deactivated the OpenMP Target job, so that we can spend more time on it in a separate PR, as you suggested.

@rhornung67
Member

Yes, let's tackle OpenMP target testing in a separate PR. @long58 I owe you some guidance on compiler choice. I hope to get back to you on this sometime this week.

@adrienbernede
Member

@long58 while trying to fix the conflicts in radiuss-spack-configs for you, I stumbled on something:
LLNL/radiuss-spack-configs#103 (review)

@adrienbernede
Member

adrienbernede commented Jun 10, 2024

@long58 apparently Spack does not like your new compiler. I’m trying to find why...

@rhornung67
Member

@long58 apparently Spack does not like your new compiler. I’m trying to find why...

The Spack error message is pretty cryptic:

```
==> Error: concretization failed for the following reasons:

  1. Cannot set the required compiler: raja%clang@19.0.0.sycl.gcc.10.3.1
```

I double-checked that service user rajasa is in the raja-dev group which is required to access the compiler. Could there be another permission issue someplace?

@long58
Author

long58 commented Jun 10, 2024

@adrienbernede It looks like the update from spack version 02-18 to 05-26 broke the concretization of the new compiler. Any thoughts as to why this might be the case? Everything works when I change the spack version back to the old one.
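For reference, a custom compiler like this is typically registered via a compilers.yaml entry; a minimal sketch, with placeholder paths and OS/target values rather than the actual site configuration:

```yaml
compilers:
- compiler:
    spec: clang@19.0.0.sycl.gcc.10.3.1
    paths:
      cc: /path/to/custom/bin/clang     # placeholder install path
      cxx: /path/to/custom/bin/clang++
      f77: null
      fc: null
    operating_system: rhel8             # placeholder
    target: x86_64                      # placeholder
    modules: []
```

If the Spack update changed how compiler version strings are parsed, the dotted suffix in `19.0.0.sycl.gcc.10.3.1` would be a plausible first thing to check; this is speculation, not a confirmed cause.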
