Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPU usage increases over time and eventually causes audio to stutter #360

Open
leviport opened this issue Sep 14, 2022 · 28 comments
Open

CPU usage increases over time and eventually causes audio to stutter #360

leviport opened this issue Sep 14, 2022 · 28 comments

Comments

@leviport
Copy link
Member

leviport commented Sep 14, 2022

Distribution (run cat /etc/os-release):
Pop!_OS 22.04

Related Application and/or Package Version (run apt policy $PACKAGE NAME):
Presumably system76-power, version unknown

Issue/Bug Description:
2022-09-14_15-38
https://twitter.com/emptythevoid/status/1570150727300923393

Steps to reproduce (if you know):
Not entirely clear, but presumably the machine just has to be left running with music playing/looping in the background for a while? The time frame to reproduce the bug isn't clear to me. It looks like it happens pretty quickly in that video, but prior uptime is unknown.

Expected behavior:
system76-power should not hog CPU, and audio playback should not stutter

Other Notes:
I assume this is on a kudu6, judging by the core count in powertop in the video.

@emptythevoid
Copy link

Hey, this is Greg. Thank you for creating the issue in github. I'll clarify what I can.

Steps to reproduce:
Leave PopOS running for some time. I've experienced this problem becoming noticeable in different timeframes. After a restart, it usually takes at least a full day to become noticeable, although I've also experienced it becoming a problem only after two days (although rare). I'm running in nvidia-only mode when I notice this, but I don't know if it happens when it's running integrated-only.

This is, indeed, a Kudu 6. This issue has been present since I performed a fresh install of PopOS 22.04 (I originally installed Ubuntu Mate as soon as my laptop arrived and did not experience this behavior. It was only after PopOS was installed). That's been about four months now.

I didn't make the connection to the system76-power until a couple days ago. Since it was audio-related, I've tried all manner of tweaks and adjustments to pipewire, but these didn't seem to have any noticeable impact. I've even disabled the system76-scheduler to see if that helped. I just happened to notice that the system76-power was relatively busy one night when the audio was degraded, so I thought I'd kill it and it instantly improved. This behavior also occurs when using a USB headset, regular wired headphones, or the speakers. I do lock my computer often throughout the day, and systemctl status system76-power shows a DBUS Received GetGraphics() method and GetProfile() method each time this happens, if that means anything useful.

Other notes: My laptop usually stays tethered to a Dell WD19S dock with two monitors, but the video I posted to twitter was done without the dock while running on battery. Still, the computer was on for the full day today, and the stuttering only started this evening. I haven't been observing system76-power long, but the highest I've ever seen it go was about 23% CPU according to htop. If there is something I can do to get more information about this process when it happens, I'll be very happy to test.

@emptythevoid
Copy link

Here's a screenshot of an strace on system76-power just after a reboot. It's using between 0% and 0.7% CPU currently.
Screenshot from 2022-09-14 20-33-08

@emptythevoid
Copy link

My computer has been left on overnight, locked. Just before I went to bed, system76-power was between 0.7% and 1.3%. This morning, it's bumped to between 2.0% and 4.6%. Systemd reports that's about 12 full hours.

@mmstick
Copy link
Member

mmstick commented Sep 15, 2022

I'm looking for a possible cause. Any difference in strace output after that day?

@emptythevoid
Copy link

emptythevoid commented Sep 15, 2022

I have no idea what I'm looking for, so I don't know what's different. Here's a screenshot of what it looks like today.
Screenshot from 2022-09-15 11-24-42

Guide me on how I can get you more information.

Edit: system76-power is currently between 2.6% and 5.9% as of this strace screenshot, Current uptime is about 15 hours. systemd reports 18 minutes of CPU time for system76-power.

@mmstick
Copy link
Member

mmstick commented Sep 15, 2022

Maybe you've noticed it outputting something more frequently than before

@mmstick
Copy link
Member

mmstick commented Sep 15, 2022

I think it's safe to assume that it's not the fan daemon or hotplug detection since these are only executed once per second. I don't think strace will be useful. May need to run this through a proper profiling tool.

@emptythevoid
Copy link

It looks like it's doing openat(AT_FDCWD for /sys/class/hwmon/hwmon 1-5, does some lseek and reads, a couple of clock_gettime, one write, an epoll_wait(3 , then a couple more clock_gettime. This seems to look the same as when I started it, but I'm not sure anything would stand out to me unless it looked significantly different. htop is polling the strace every second, and it looks like the all the stuff I listed above also happens every second.

@mmstick
Copy link
Member

mmstick commented Sep 15, 2022

Yeah it might be more helpful to get output from a flamegraph using a tool like this. It's most likely some application-level code.

@emptythevoid
Copy link

Is this of any use?
flamegraph

@mmstick
Copy link
Member

mmstick commented Sep 15, 2022

No, doesn't contain any function calls

@emptythevoid
Copy link

Here's one I ran for a couple seconds. Does this help, or still useless
flamegraph

@mmstick
Copy link
Member

mmstick commented Sep 15, 2022

No, no functions from system76-power itself

@emptythevoid
Copy link

I'm running the command as sudo ~/.cargo/bin/flamegraph --bin=system76-power daemon

Should I be doing something different?

@mmstick
Copy link
Member

mmstick commented Sep 15, 2022

cargo flamegraph --root daemon from the project directory with the system76-power daemon stopped in advance.

@emptythevoid
Copy link

emptythevoid commented Sep 15, 2022

Is this heading in the right direction? This was generated from in the .cargo/bin directory with the command:
flamegraph --root -- /usr/bin/system76-power daemon

flamegraph

@mmstick
Copy link
Member

mmstick commented Sep 15, 2022

No, it requires building system76-power from source code to have all the debugging symbols and function call information.

@emptythevoid
Copy link

Oooooh, okay. Do you want me to try to do that? If so, what compilation command and flags should I use?

@emptythevoid
Copy link

I'm really sorry. I'm still trying to figure out what I'm doing.

See if this is any better. I edited the makefile to enable debugging (I think) and compiled it. The binary went into the debug directory, so I'm assuming that got it.

I navigated to the debug directory and ran:
flamegraph --root -- ./system76-power daemon

And got this graph
flamegraph

@mmstick
Copy link
Member

mmstick commented Sep 15, 2022

This is good. Let it run for several hours and compare the results tomorrow to see if something stands out more as time goes on.

@emptythevoid
Copy link

You got it. Thank you. So sorry for the confusion

@emptythevoid
Copy link

Left my computer running over night. system76-power was using about 5.6% at the time I stopped it. flamegraph collected about 17GB of perf data and took about an hour-and-a-half to generate the graph. Here it is.
flamegraph

@mmstick
Copy link
Member

mmstick commented Sep 16, 2022

Seems that it's entirely hotplug detection. Unusually high CPU demand from that codepath. Will investigate optimizations.

@emptythevoid
Copy link

Okay, so let me add to that point. I'm using a WD19S USB-C dock with two monitors (one is connected by USB-C to the dock, the other is by display port). Now, I don't notice any issue with this set up, and the systemd status for system76-power doesn't show a hotplug event unless I unplug the dock and plug it back in. However, the very few times I use Discord, it will occasionally and randomly mention that a new Nvidia HDMI audio source was detected, and I never use that for audio. This may be a red herring, but it's the only anomaly I can think of that might be related at all. Please let me know if there is anything else I can do to assist.

@mmstick
Copy link
Member

mmstick commented Sep 16, 2022

Every second the service checks for display hotplug events. I'll try to reduce the amount of reads that it's performing.

@mmstick
Copy link
Member

mmstick commented Sep 16, 2022

Try running the perf branch and see if that improves CPU usage.

@emptythevoid
Copy link

emptythevoid commented Sep 18, 2022

This was from system76-power (perf branch) running over night. By the time I stopped flamegraph, it collected about 45GB of data.
Edit: the system76-power process was using just about 6% CPU at the time I stopped it

flamegraph

@emptythevoid
Copy link

Any update on this? I'm still experiencing this on my Kudu. I have to manually stop system76-power after booting up, and then because that's not running, I have to manually change to integrated/nvidia. It's not terrible, but it's very annoying to have to keep doing this, and I'm really hoping for a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants