Maximum CPU time - spread tasks evenly over timeslot #4181

Open
cminnoy opened this issue Feb 13, 2021 · 9 comments · May be fixed by #4207

Comments

@cminnoy

cminnoy commented Feb 13, 2021

Describe the problem
When choosing a value for "Maximum % of CPU time", BOINC schedules the run/sleep cycle in whole seconds.
For example, selecting 25% makes BOINC tasks run for one second and then sleep for three seconds.
In a multi-core environment this is quite detrimental: it hurts performance, causes power spikes, and more.
Also, because all the tasks start at once, the computer is often less responsive for that one second and then perfectly usable for the next three.

Describe the solution you'd like
It would be beneficial for the system to spread the multi-core tasks out in time, using slots.
For example, if we have an 8-core system with 25% usage selected and 8 tasks running:

  • First second: run 2 of the 8 tasks (tasks 1, 2)
  • Second second: run the next 2 tasks (tasks 3, 4)
  • Third second: run the next 2 tasks (tasks 5, 6)
  • Fourth second: run the last 2 tasks (tasks 7, 8)

Second example:
On a 100-core system with a 1% CPU load selected:
In the old situation all 100 cores would run for 1 second, completely bogging down the system, and for the next 99 seconds no BOINC tasks would run at all.
In the new situation, each task would run for 1 second in turn, so only one core is busy at any given moment.
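
A minimal sketch of how this staggering could work (all names below are made up for illustration; this is not the existing throttler code in client/app.cpp):

#include <cstdio>
#include <vector>

struct Task { int id; bool running = false; };

// One scheduling tick per second: with usage_limit_pct = 25 the period is
// 4 ticks, and task i is only allowed to run during tick (i mod 4), so on an
// 8-task system exactly 2 tasks are active at any moment instead of 8-then-none.
void throttle_tick(std::vector<Task>& tasks, double usage_limit_pct, int tick) {
    int period = static_cast<int>(100.0 / usage_limit_pct + 0.5);  // 4 for 25%, 100 for 1%
    int phase = tick % period;
    for (size_t i = 0; i < tasks.size(); ++i) {
        bool should_run = static_cast<int>(i) % period == phase;
        if (should_run != tasks[i].running) {
            tasks[i].running = should_run;
            std::printf("t=%ds: task %d %s\n", tick, tasks[i].id,
                        should_run ? "resumes" : "suspends");
        }
    }
}

int main() {
    std::vector<Task> tasks;
    for (int i = 0; i < 8; ++i) tasks.push_back({i + 1});
    for (int tick = 0; tick < 8; ++tick) throttle_tick(tasks, 25.0, tick);
}

Every task still gets the same average CPU share; only the phase in which it runs differs.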

This would provide a better user experience:

  • Fans will not spin up and down as aggressively, reducing noise
  • Power supplies see a much smoother load
  • Gentler thermal cycling of the CPU (fewer expansion/contraction cycles)
  • The computer is more usable in general
  • It simply makes much more sense

Additional context
If you want your computer to consume a steady amount of power, this is the only way.

Note: it would also be beneficial for the CPU % load to be configurable separately from the GPU % load. Same concept, but the time slots would also take into account the number of GPUs installed in the system.

@cminnoy
Author

cminnoy commented Feb 15, 2021

See function throttler in file client/app.cpp
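
For reference, the current behaviour boils down to a duty cycle like the simplified sketch below (placeholder function names, not the actual throttler code):

#include <chrono>
#include <thread>

void resume_all()  { /* stub: resume every BOINC task at once  */ }
void suspend_all() { /* stub: suspend every BOINC task at once */ }

// All tasks run together for a ~1 s burst, then all sleep together for the
// rest of the period; at 25% that is 1 s on, 3 s off, which is what produces
// the synchronized load and heat spikes described above.
void throttle_loop(double cpu_usage_limit_pct) {
    const double on_s  = 1.0;
    const double off_s = on_s * (100.0 / cpu_usage_limit_pct - 1.0);  // 3.0 at 25%
    for (;;) {
        resume_all();
        std::this_thread::sleep_for(std::chrono::duration<double>(on_s));
        suspend_all();
        std::this_thread::sleep_for(std::chrono::duration<double>(off_s));
    }
}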

@cnergyone

@AenBleidd
Member

@cnergyone, if you have something, please send a PR

@truboxl
Contributor

truboxl commented Feb 20, 2021

How is this different from setting the number of CPUs used, with 100% CPU time?
The "switch between tasks every X minutes" setting seems to cover this...

@cnergyone

Sorry, but I don't understand your point about "switch between tasks every X minutes".
This has nothing to do with switching between tasks/projects.

A. Max CPU time is a value that determines how much time (%) BOINC gets over a time span.
B. Max CPU cores is a value that determines how many cores all running tasks may occupy together.
C. Switching between tasks/projects happens every 60 minutes (but you can set it differently).

Parameter B is very rigid. You don't want this value changed dynamically or even very often.
I have seen VM tasks crash when this parameter was changed while they were running. So set it once and leave it.

Parameter C is about spreading compute time within and between projects. It has zero impact on the performance of BOINC or on CPU usage. Its only intent is to make sure that long-running tasks don't get an advantage over short-running tasks.

Parameter A can be changed whenever you want and will not crash any task, not even VM tasks. It is meant to be set more dynamically, and can be used for temperature or power control. I will add this feature later to boinccmd so the CPU time used can be set on the fly; you can then control BOINC's CPU usage easily from a script.

The difference between the old throttler code and the proposed one:
Old code: put a frog in a pot of water, boil it for 1 second, then let the frog freeze at 0 °C for 4 seconds.
New code: put the frog in a pot at 20 °C for 5 seconds.

The average temperature is the same, but the user experience for the frog is quite different.

@truboxl
Contributor

truboxl commented Feb 22, 2021

I still don't understand why you need to

run 2 tasks and suspend 6 tasks during that one-second timeframe
and then consecutively suspend / resume the other tasks during the following seconds

when you can just set 25% of the CPUs (8 * 0.25 = 2) with 100% CPU time used and achieve the same effect

this is just task hoarding

I also think there's an overhead to doing suspend / resume every now and then, especially if the tasks are not set to stay in memory

edit: if the suggestion is to make the CPU time used more consistent, then fine I suppose, but dismissing the core count setting as useless is not. The crash in VM tasks should be looked into and fixed properly.

@cnergyone

cnergyone commented Feb 22, 2021

  • The preempt is done with REMOVE_NEVER, so the tasks remain in memory. That's also what I see in top (see the sketch after this list).
  • The overhead is really small (milliseconds on most systems). The difference in overhead between the new throttler and the existing throttler is close to zero, and the difference in scheduling overhead imposed on the tasks by the two algorithms is zero.
    If you have proof of extra overhead, please post it here.
  • A task runs for many seconds at a time (for example 50 seconds if you set it to 50%). Tasks are only shifted in time relative to each other, which in the end even improves memory throughput, since fewer concurrent tasks compete for the cache.
  • If you have proof that dynamically changing max_ncpus_pct has exactly the same effect, again please post it here.
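
To illustrate the first point, here is a toy sketch of what "suspend but keep in memory" means on a POSIX system (illustrative types and signals only, not BOINC's actual ACTIVE_TASK code):

#include <signal.h>
#include <sys/types.h>

enum RemovePolicy { REMOVE_NEVER, REMOVE_ALWAYS };  // simplified stand-ins

// With REMOVE_NEVER the child is simply frozen with SIGSTOP: its working set
// stays resident, so resuming it later with SIGCONT costs almost nothing.
// Otherwise the app has to checkpoint and exit, and must be restarted from
// disk later, which is where real suspend/resume overhead would come from.
void preempt_task(pid_t pid, RemovePolicy policy) {
    if (policy == REMOVE_NEVER) {
        kill(pid, SIGSTOP);   // freeze in place, memory stays mapped
    } else {
        kill(pid, SIGTERM);   // stand-in for "tell the app to checkpoint and quit"
    }
}

void resume_task(pid_t pid) {
    kill(pid, SIGCONT);       // only meaningful for the REMOVE_NEVER case
}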

@davidpanderson
Contributor

The purpose of CPU throttling is to let you reduce average CPU temperature,
and to divide the heat evenly among the cores, i.e. to minimize the max temp of the cores.
If you have 4 cores, and you run 4 jobs, the OS will run them on different cores.
If you stagger the jobs so that only 2 run at a time, the OS will (possibly)
run everything on the same 2 cores; they'll get hot and the others will stay cold.
The current implementation is fine as far as I can tell.

@cnergyone

"The purpose of CPU throttling is to let you reduce average CPU temperature,
and to divide the heat evenly among the cores, i.e. to minimize the max temp of the cores."

This is the job of the kernel. If you give it fewer tasks than there are cores, it happily shuffles tasks around among the cores.
The kernel knows best where to put tasks, and will never pin a task to a core for longer than a few hundred milliseconds,
unless it has nowhere else to put it.

"If you have 4 cores, and you run 4 jobs, the OS will run them on different cores.
If you stagger the jobs so that only 2 run at a time, the OS will (possibly)
run everything on the same 2 cores; they'll get hot and the others will stay cold.
The current implementation is fine as far as I can tell."

This will not happen on Linux or Windows. You can check with htop and see.

You can find info on the linux scheduler here: https://www.kernel.org/doc/html/latest/scheduler/index.html

But nothing beats a scientific approach, so let's measure:

8-core Linux system, 16 logical CPUs, 16 CPU tasks running + 1 GPU task, each run takes 10 minutes
Projects active: Universe and World Community Grid

Original algorithm CPU time set to 20%:
Min temp: 62.3C
Max temp: 76.6C
Avg temp: 65.9C

Original algorithm CPU time set to 69%:
Min temp: 63.8C
Max temp: 76.0C
Avg temp: 65.8C

New algorithm CPU time set to 20%:
Min temp: 50.8C
Max temp: 69.8C
Avg temp: 56.7C

New algorithm CPU time set to 69%:
Min temp: 58.8C
Max temp: 64.1C
Avg temp: 62.1C

Conclusion: with the old algorithm the average CPU temperature stays the same between the low and medium settings, and there are temperature spikes of up to 76C. With the new algorithm the average temperature is lower, there is a clearer distinction between the low and medium settings in the average temperatures, and the spikes are smaller, leading to a lower maximum CPU temperature.

Interesting observation:
The new algorithm isn't perfect either, as there is some wobbling in the CPU temperatures.
I will investigate this further in the coming weeks and see where it can still be improved.
