Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add services boot overview #19455

Closed
wants to merge 38 commits into from
Closed

Conversation

QazCetelic
Copy link

@QazCetelic QazCetelic commented Oct 7, 2023

Page that shows the time spent on each service during the boot process using systemd-analyze. It can be used to debug a slow boot process.

normal
The service detail page is shown when a service is clicked in the chart.

Loading state

loading_state

Failure state

This should never occur, but has been added just in case.
failure_state

@QazCetelic QazCetelic marked this pull request as draft October 7, 2023 18:36
@QazCetelic QazCetelic marked this pull request as ready for review October 20, 2023 10:13
@QazCetelic QazCetelic changed the title [DRAFT] Add services boot overview Add services boot overview Oct 20, 2023
@QazCetelic
Copy link
Author

👋 I'd really appreciate a code review on my PR whenever you get a chance. If you're busy, just a quick ETA would be awesome. Thanks a bunch!

@QazCetelic
Copy link
Author

Hi, any update on the review? @garrett @martinpitt

Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @QazCetelic , this is a cool feature! Sorry for taking so long to review, holidays and all. I have a few initial comments. Most importantly, this feature needs to add some integration test in test/verify/check-system-services. This should cover at least system/session mode, validate that there is something sensible printed in the card/legend, and contains at least one well-known running service (or grab one from calling systemd-analyze and make sure it appears). I'm happy to assist with that, of course. But do you want to give it an initial shot? Please see https://github.com/cockpit-project/cockpit/blob/main/test/README.md

For the next iteration, can you please git fetch origin; git rebase origin/main instead of merge commits? Updating this branch should fix the RPM build failures.

pkg/systemd/bootInfo.jsx Outdated Show resolved Hide resolved
pkg/systemd/bootInfo.jsx Outdated Show resolved Hide resolved
pkg/systemd/bootInfo.jsx Outdated Show resolved Hide resolved
pkg/systemd/bootInfo.jsx Outdated Show resolved Hide resolved
pkg/systemd/bootInfo.jsx Outdated Show resolved Hide resolved
const [plot, legend] = doc.querySelectorAll("g");
legend.remove();
[...plot.querySelectorAll("text.left"), ...plot.querySelectorAll("text.right")].forEach((text) => {
const match = text.innerHTML.match(/^(?<service>.+\.[a-z._-]+)(\s+\((?<time>\d+(\.\d+)?)\w+\))?$/);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my version of systemd-analyze (Fedora 39) the output SVG doesn't contain <service> tags. The services look like

  <text class="left" x="264.191" y="2674.000">initrd-parse-etc.service (52ms)</text>
  <rect class="activating" x="264.614" y="2680.000" width="0.000" height="19.000" />
  <rect class="active" x="264.614" y="2680.000" width="84.109" height="19.000" />

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code doesn't look for a <service> tag. Are you perhaps confusing the named capture group with an HTML tag?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, I did. But this looks even more brittle then -- processing XML with line-based REs sounds very brittle.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't process any XML.

I'll use your example from above to explain

Here's a text element that is processed.

<text class="left" x="264.191" y="2674.000">initrd-parse-etc.service (52ms)</text>

The code gets the inner HTML, which is initrd-parse-etc.service (52ms) (innerText would return undefined). Then parses that using the regex.

The 3 named regex capturing groups then contain the following values:

Group Value
service initrd-parse-etc.service
time 52
time_unit ms

Note: I changed the regex to split time and the unit with the latest commit

It won't matter if other text elements are added by systemd-analyze later with the same class because it won't do anything unless the format matches.

pkg/systemd/bootInfo.jsx Outdated Show resolved Hide resolved
pkg/systemd/bootInfo.jsx Outdated Show resolved Hide resolved
pkg/systemd/bootInfo.jsx Show resolved Hide resolved
Comment on lines 925 to 927
{ activeTab == "boot"
? <BootInfo />
: <ServicesPageBody
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels bolted on -- all other tabs are unit types, this one is completely different. I could live with it, though.

Perhaps we should instead have a separate "Boot chart" button? @garrett any opinion/idea?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you mean, but I wouldn't know where else to put it. @garrett do you have a suggestion?

GitHub Workflow and others added 16 commits January 8, 2024 20:49
libssh 0.10.6 made host name parsing stricter. `some_host` is not a
valid general host name, and is rejected with the latest version.
Bumps the esbuild group with 2 updates: [esbuild](https://github.com/evanw/esbuild) and [esbuild-wasm](https://github.com/evanw/esbuild).

Updates `esbuild` from 0.19.10 to 0.19.11
- [Release notes](https://github.com/evanw/esbuild/releases)
- [Changelog](https://github.com/evanw/esbuild/blob/main/CHANGELOG.md)
- [Commits](evanw/esbuild@v0.19.10...v0.19.11)

Updates `esbuild-wasm` from 0.19.10 to 0.19.11
- [Release notes](https://github.com/evanw/esbuild/releases)
- [Changelog](https://github.com/evanw/esbuild/blob/main/CHANGELOG.md)
- [Commits](evanw/esbuild@v0.19.10...v0.19.11)

---
updated-dependencies:
- dependency-name: esbuild
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: esbuild
- dependency-name: esbuild-wasm
  dependency-type: direct:development
  update-type: version-update:semver-patch
  dependency-group: esbuild
...

Signed-off-by: dependabot[bot] <support@github.com>
The most recent RHEL 9.4 cloud images added the "rhc" package. This
currently contains a hack to locally modify the SELinux policy [1], so
we need to ignore this.

At least we can drop the insights_client hack, it was fixed properly in
https://bugzilla.redhat.com/show_bug.cgi?id=2226684

[1] https://issues.redhat.com/browse/RHEL-20352
This unbreaks the Shell's SSH support with the latest round of OpenSSH
0.9.6 security updates.
… 255 output format

Commit 800c89f fixed this for check-static-login already. Debian
testing now got systemd 255 and ran into this changed output format as
well.
There is a race between the kernel and lvconvert which might cause
lvconvert to hang when the kernel is done repairing before lvconvert
has resumed all device mapper devices.

Let's try to slow down the repair process by making the disks bigger.

See https://bugzilla.redhat.com/show_bug.cgi?id=2256433
Bumps [sass](https://github.com/sass/dart-sass) from 1.69.5 to 1.69.6.
- [Release notes](https://github.com/sass/dart-sass/releases)
- [Changelog](https://github.com/sass/dart-sass/blob/main/CHANGELOG.md)
- [Commits](sass/dart-sass@1.69.5...1.69.6)

---
updated-dependencies:
- dependency-name: sass
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
When no files exist in the directory the existing code would try to use
the * as a file:

kkoukiou@o2:~/k$ for f in *; do echo $f; done
*

Instead let's list the files with `ls` to avoid this corner case as this
breaks anaconda tests.
Bumps [sass](https://github.com/sass/dart-sass) from 1.69.6 to 1.69.7.
- [Release notes](https://github.com/sass/dart-sass/releases)
- [Changelog](https://github.com/sass/dart-sass/blob/main/CHANGELOG.md)
- [Commits](sass/dart-sass@1.69.6...1.69.7)

---
updated-dependencies:
- dependency-name: sass
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
martinpitt and others added 12 commits January 8, 2024 20:49
This didn't help after all.

This reverts commit 4b8d7e7.
It still fails way too often. Bring back the skip, and add the bug
reference.
On loading the page, the "Disallow interactive password" checkbox is
initially disabled, with a `null` value which shows up as [-]. Wait
until it is initialized before taking the pixel test.

Fixes cockpit-project#19787
Avoid returning a -1 exec ID in getFrameExecId(), as it is an
invalid/unknown context. This got triggered by the changes on the latest
Grafana's login page, which creates a short-lived frame with two
execution contexts which immediately go away again.

Also simplify the code a bit -- it's perfectly legit to look up an
`undefined` key in a dictionary and get `undefined` as result.
Our manual addScriptToEvaluateOnNewDocument() is asynchronous, so it can
happen that a `Browser.open(); Browser.wait_*()` command sequence tries
to call `ph_wait_cond()` which doesn't actually exist in the page JS
environment just yet. Catch this exception and retry the condition,
similar to the various handled `RuntimeError`s.
The latest Grafana's login page creates two short-lived "about:blank"
and "about:srcdoc" frames with empty `""` names, and two execution
contexts which immediately go away again. This confuses our frame ID →
frameIdToContextId so that they try to run our CDP commands in these
wrong frames. Ignore them.
…ilableUpdates

This test often fails because the "Reboot scheduled" message does not
reliably appear in dnf-automatic-install.service's journal. The
scheduled reboot still works, though.

Robustify this by replacing the journal scraping with a direct check for
a scheduled reboot. Newer systemd versions offer `shutdown --show` for
this, which exits with 1 and prints "No scheduled shutdown", or exits
with 0 and prints "Reboot scheduled for <date>...".

On RHEL8 we already have the `/run/nologin` check, which tests this
enough. (Also, RHEL 8 support will drop in less than a month, so not
important any more).

Drop the obsolete "Shutdown" absence check from the case where no
shutdown should be scheduled. Recent systemd says "Reboot" (so this was
a no-op), and checking for an absent /run/nologin is sufficient.

To avoid destroying the testbed, ensure that the test always cancels the
scheduled shutdown even if the check fails.

Fixes cockpit-project#19812
Bumps [date-fns](https://github.com/date-fns/date-fns) from 3.0.6 to 3.1.0.
- [Release notes](https://github.com/date-fns/date-fns/releases)
- [Changelog](https://github.com/date-fns/date-fns/blob/main/CHANGELOG.md)
- [Commits](date-fns/date-fns@v3.0.6...v3.1.0)

---
updated-dependencies:
- dependency-name: date-fns
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
In the btrfs PR this logic was copied over again and then refactored in
a function.
@QazCetelic
Copy link
Author

@martinpitt thanks for the response. I'm a bit busy myself right now, so I was only able to partially process the feedback. I will follow up another time.

@QazCetelic
Copy link
Author

Created new branch and pr at #20085

@QazCetelic QazCetelic closed this Feb 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants