revive rhel-9.3 branch #19734

Merged
merged 30 commits into from
Dec 12, 2023


martinpitt
Member

@martinpitt martinpitt commented Dec 12, 2023

We have some more z-stream updates to do. Resuscitate this branch first to bring CI back to green. These are only clean backported test fixes; code changes will happen after this. This needs an updated naughty pattern from cockpit-project/bots#5667 to "fix" the NBDE test.

martinpitt and others added 6 commits December 12, 2023 04:28
This is quite literally what it is defined to do. This races with the
CDP driver finishing the command, so sometimes it would fail the test on
throwing that RuntimeError.

Cherry-picked from a324b3f
Just yanking out the disk will leave debris behind in /dev.

Cherry-picked from 5e1bafc
Stop pretending that we can accurately predict the next `OnBoot` timer
in TestServices.testTimerSession. It is very much *not* "now + 200
minutes", but "200 minutes after the current VM booted" (which may be
long-running in Testing Farm or our CI machinery). As this is a
neverending race condition in evenings, and we don't test the accuracy
of systemd here, relax the check to just ensure that it happens today or
tomorrow.

Cherry-picked from 2f149fd
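
A minimal sketch of the relaxed "today or tomorrow" check described above, assuming the timer's next run is already available as a `datetime`. The helper name `fires_today_or_tomorrow` is hypothetical and not taken from the test suite:

```python
from datetime import date, datetime, timedelta


def fires_today_or_tomorrow(next_run: datetime, today: date) -> bool:
    """Return True if next_run falls on `today` or on the following day."""
    return next_run.date() in (today, today + timedelta(days=1))
```

Comparing calendar dates instead of an exact "now + 200 minutes" timestamp avoids the race around midnight and does not depend on how long the VM has been up.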
It is not clear what exactly keeps /dev/sda1 busy when the kernel
tries to read the new partition table. It can't be the artificial
processes and services started by the test itself since unmounting and
locking have already succeeded at that point.

This bug happens only in quite specific conditions, and can't be
expected to ever get fixed. So let's do what every user would do as
well: Retry the dialog.

Cherry-picked from 530c70d
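
The "retry the dialog" idea can be sketched roughly as follows. This is only an illustration: `retry_dialog` and its `apply_dialog` callback are hypothetical names, and `RuntimeError` stands in for whatever error the real dialog helper raises:

```python
def retry_dialog(apply_dialog, attempts=3):
    """Call apply_dialog() until it succeeds; re-raise the last error otherwise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return apply_dialog()
        except RuntimeError as exc:
            # Assumption: a busy partition surfaces as RuntimeError here
            last_error = exc
    raise last_error
```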
@martinpitt
Member Author

martinpitt commented Dec 12, 2023

@martinpitt martinpitt added the backport apply a commit from master to a stable branch label Dec 12, 2023
martinpitt and others added 20 commits December 12, 2023 10:13
…tion

Since commit 49ee017, the step that takes long is already the
`Browser.open()`, as that loads the packages and frame from the remote
machine, and the timeout now happens in `waitPageLoad()`. Still, loading
the frame also takes a while, so keep the long timeout for enter-page()
as well.

Cherry-picked from b28914f.
In our CI, testLogs sometimes does not get a `START` and then fails.
From the failed log it looks like this is due to the journal being rotated.
The user test case already sleeps to let journalctl settle so now we
unify this approach.

Nov 08 13:13:09 ubuntu systemd[1]: Started test.service - Test Service.
Nov 08 13:13:09 ubuntu test-service[2427]: START
Nov 08 13:13:09 ubuntu systemd-journald[271]: Received client request to rotate journal.

Nov 08 13:19:51 ubuntu systemd-journald[271]: Received client request to rotate journal.
Nov 08 13:19:52 ubuntu test-service[2480]: START
Nov 08 13:19:57 ubuntu test-service[2480]: WORKING

Cherry-picked from 107c855
Close the crypto policy dialog after checking the default value. Leaving
it open and clicking around on the main page is cheating and prone to
race conditions, and will fail with the next commit.

Cherry-picked from 8a93308
Tests like TestStorageUsed.testTeardownRetry run processes that keep a
scsi_debug block device mount busy. If they fail on some assertion in
the middle, the generic storage cleanup (umount, rmmod scsi_debug)
fails, and the following tests get broken. Add an `fuser` kill loop to
prevent that.

Also show all stdout output from these commands. We don't need it
returned in the code, it's more useful for developers in the test
output.

Cherry-picked from a45210a
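
A rough sketch of such an `fuser` kill loop, not the code from the commit. It relies on `fuser` exiting non-zero once no process accesses the mount any more. The function name `kill_device_users` and the injectable `run` executor are assumptions for illustration; a real test would run the command on the test machine rather than locally:

```python
import subprocess
import time


def kill_device_users(device, run=None, max_rounds=10, delay=0.1):
    """Kill processes keeping `device` busy until fuser finds none.

    Returns the number of rounds in which fuser still found users.
    """
    if run is None:
        # Assumption: default to running fuser on the local machine
        def run(argv):
            return subprocess.run(argv, check=False).returncode

    for rounds in range(max_rounds):
        # Non-zero exit status: nothing is using the mount any more
        if run(["fuser", "--kill", "--mount", device]) != 0:
            return rounds
        time.sleep(delay)
    raise RuntimeError(f"{device} is still busy after {max_rounds} rounds")
```

Bounding the loop keeps a stuck process from hanging the cleanup forever. The error then points at the busy device instead of a later `rmmod` failure.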
This test barely makes it within the default 10 minutes timeout.
From what I see, most of the time is spent waiting for multiple
reboots of the machine.
Locally this took almost 7 minutes to run, so for CI we can bump this
timeout to 20 minutes.

Cherry-picked from d2ccfc0
With the impending services image refresh [1] and the new Samba
container, user creation is not instantaneous any more. Add a retry
loop.

[1] cockpit-project/bots#4885

Cherry-picked from 7f12811
ldapmodify is not available in the quay.io/samba.org/samba-ad-server
container, and it has serious trouble authenticating.

But the newer Samba now supports `samba-tool user edit`. Use that with a
non-interactive edit script instead.

Cherry-picked from b88436b
Adjust the data host CSS selector.

The new services image auto-enables the PCP plugin, so that hack can go.

Unfortunately the new version now tries to download the plugin catalog
in the background, and there is no working way to disable that. This
breaks the test at a random place. Anticipate, wait for, and ignore that
error.

Cherry-picked from 7c04205
Use the officially recommended /status route, which we expect to
actually succeed (unlike /candlepin, which is just a redirect).
Add curl `--fail` to ensure a non-zero exit code when it fails.

Cherry-picked from b2f0b4f
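
For illustration, the probe could be assembled like this. The helper name `status_probe_command` and the base URL passed to it are hypothetical; only `--fail` and the `/status` route come from the commit message, and the extra `--silent --show-error` flags are an assumption to keep the output quiet:

```python
def status_probe_command(base_url):
    """Build a curl command that exits non-zero on HTTP errors for /status."""
    url = base_url.rstrip("/") + "/status"
    # --fail turns HTTP error responses into a non-zero exit code
    return ["curl", "--fail", "--silent", "--show-error", url]
```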
Apparently recent Samba/AD is a bit slower now.

Cherry-picked from ff0c229
First wait for the realm user to exist before using it in chown. D'oh!

Cherry-picked from fdca31b
In most cases this is fast, but quite often Samba takes annoyingly long
to answer. Make the timeout consistent and enforce this with helper
functions, except for the instance in TestPackageInstall as that doesn't
derive from CommonTests.

Cherry-picked from 9da9229
Restarting sssd in a loop is prone to run into

> systemd[1]: sssd.service: Start request repeated too quickly.
> systemd[1]: sssd.service: Failed with result 'start-limit-hit'.

Cherry-picked from 68d2eb7
With 30 seconds we are running into occasional timeout failures.

Cherry-picked from 6ef43c6
Restarting sssd that often causes state corruption, as it often cannot
initialize in 5s. It's also too much fiddling with the OS -- joining a
domain should make the users available automatically, otherwise this is
a bug.

This works fine with IPA, and doesn't regress AD either.
testUnqualifiedUsers() already does it that way, too.

Cherry-picked from c055b47
The current service image's samba container does not look at that any
more, and we also stopped using `ldapmodify`.

Cherry-picked from 4727d48
Password authentication sometimes fails on the first try.

Cherry-picked from a61bb41
martinpitt and others added 4 commits December 12, 2023 10:13
Grab the candlepin server's CA and install it both into rhsm and the
general system (for `curl`). This tests subscription-manager more
realistically, without having to yell "insecure" all the time.

Also simplify the waiting loop and make it more robust. Previously, the loop could just
end with 200 failures, and the test would go on. Now it will time out. Also
lower the 6 minute timeout to the default 2 minutes -- starting up candlepin
only takes a few seconds on our current image.

Cherry-picked from 564717f
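
The difference between a loop that silently runs out of attempts and one that times out can be sketched as below. `wait_until` is a hypothetical helper, not the one used in the tests; the 120 second default mirrors the 2 minute timeout mentioned above:

```python
import time


def wait_until(check, timeout=120, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True; raise TimeoutError at the deadline."""
    deadline = clock() + timeout
    while True:
        if check():
            return
        if clock() >= deadline:
            # Fail loudly instead of letting the test continue
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)
```

Raising at the deadline means a server that never comes up fails the test at the waiting step, rather than at some unrelated later assertion.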
…-project#19667)

Later Grafana versions [1] fixed the page crash on "Failed to fetch plugins
from catalog", and just log it to the console now. That will make the
"wait for false" loop time out and eventually fail. If that happens, then
all is actually well.

[1] cockpit-project/bots#5601

Cherry-picked from de7ab98
With the latest service refresh [1] Grafana now handles being offline
correctly.

[1] cockpit-project/bots#5601

Cherry-picked from e8e4bda
When e.g. TestStorageswap.test fails in the middle, the active swap
partition on the scsi_debug driver will prevent the module removal, and
break all subsequent tests.

Helps with cockpit-project#19683

Cherry-picked from 6c3986d
Member

@jelly jelly left a comment

Thanks, assuming storage goes ✔️

@martinpitt martinpitt merged commit b0df0c5 into cockpit-project:rhel-9.3 Dec 12, 2023
22 checks passed
@martinpitt martinpitt deleted the r93-revive branch December 12, 2023 13:16
Labels
backport apply a commit from master to a stable branch
4 participants