gravwave-taskX-df

Scripts to automatically adjust concurrency of E@H gravitational wave tasks running on AMD GPUs. Can also be used to just monitor status of BOINC GW tasks.

These taskXDF bash shell scripts are for Linux systems with one or more AMD GPUs running Einstein@Home gravitational wave GPU tasks (O3AS) through BOINC. Supported AMD GPUs are Ellesmere, Polaris, Vega, or Navi. NVIDIA cards are not currently supported. The program will work in mixed GPU task queues that include Gamma Ray (FGRPB1G) tasks. Other task apps have not been tested.

The package consists of three files: taskXDF, taskXDF-timer, and taskXDF.cfg. Keep all three in the same directory and run the scripts from within it.

The package monitors GPU memory usage and the delta frequency (DF) values of current and pending tasks, then evaluates whether to increase, decrease, or maintain the task multiple for running concurrent tasks (task X). The script applies changes to task X automatically by changing the value of <gpu_usage> in the app_config.xml file.
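As a rough illustration of that last step, here is a minimal sketch of turning a task multiple into a <gpu_usage> value and writing it into a config file. It assumes the usual BOINC convention that <gpu_usage> is the fraction of a GPU each task claims, so X concurrent tasks correspond to roughly 1/X. The function names x_to_gpu_usage and set_gpu_usage are hypothetical and are not taken from taskXDF, which may do this differently.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of taskXDF.

# Convert a task multiple X (1, 2, 3, ...) into a <gpu_usage> fraction.
# Assumes X is a positive whole number.
x_to_gpu_usage() {
    awk -v x="$1" 'BEGIN { printf "%.2f\n", 1 / x }'
}

# Replace the value of <gpu_usage> elements in the given file.
# Note: this simple version rewrites every <gpu_usage> line it finds;
# a file with several apps would need a narrower match.
# Uses GNU sed's in-place editing (-i), as found on Linux.
set_gpu_usage() {
    local file=$1 usage=$2
    sed -i -E "s|<gpu_usage>[^<]*</gpu_usage>|<gpu_usage>${usage}</gpu_usage>|" "$file"
}
```

For example, set_gpu_usage ./app_config.xml "$(x_to_gpu_usage 2)" would write 0.50 into a local copy of the file. Try it on a copy before pointing it at the live BOINC project directory.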

Script actions and metrics are reported to the terminal window and to a log file that the script generates, taskXDF.log. Logging is on by default, but a command-line option turns it off.

A file that tracks processed task names, currently_running_GWtasks.txt, is generated by the script in the script's parent directory. It should not be moved or deleted while the program is running (although nothing major will happen if you do).

Script execution on timed intervals is controlled through the timer script, taskXDF-timer, which is run from the command line. A one-off status report is available with the --status parameter, entered as "taskXDF-timer --status" or "taskXDF --status". To keep running on timed intervals without changing task X, set "monitor_only=yes" in the taskXDF.cfg configuration file.

Additional user-specified run parameters are set in taskXDF.cfg. This configuration file can be edited with any text editor, including on-the-fly while the timer is running, which may be useful for optimizing settings as GW tasks run. Read the file comments to understand the settings.
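Because the file can change while the timer is running, settings have to be re-read on each run. The following is a minimal sketch of reading one setting, assuming the file holds plain key=value lines such as monitor_only=yes, with comment lines starting with #. The function name cfg_get is hypothetical; the actual scripts may read the file another way (for example by sourcing it).

```shell
#!/usr/bin/env bash
# Sketch only: cfg_get is a hypothetical helper, not part of taskXDF.

# Print the value of a key from a simple key=value file.
# Commented-out lines (starting with #) never match, and if a key
# appears more than once, the last occurrence wins.
cfg_get() {
    local key=$1 file=$2
    sed -n -E "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}
```

Calling cfg_get monitor_only taskXDF.cfg at the start of every timed run would pick up an edit made since the previous run.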

The timer script is executed with a time interval argument in seconds, e.g., ~$ ./taskXDF-timer 60. Usage details can be read with ~$ ./taskXDF-timer --help, which brings up:

$ ./taskXDF-timer --help

    A time interval, in seconds, for running the taskXDF-timer script is required
        to automate changes to task X.
    e.g., ./taskXDF-timer 60
    Values less than 60 will be converted to 60. Decimal seconds are invalid.

    Optional arguments: --header,  --status, or --help
        --header, provides explanation of types of interval data reported, then exits.
        --status, provides current BOINC and GPU metrics, then exits; does not change task X.
        --help, brings up this message, then exits.

    This timer script provides continuous timed runs of the taskXDF bash script.

    A time interval of 0 (zero) provides a one-off status report from taskXDF,
        otherwise a non-zero interval time provides interval reports and taskX adjustments.
    A time interval of 60 will provide good script responsiveness.

    To monitor task status on a regular interval without changing task X, edit the
        taskXDF.cfg file to monitor_only=yes (default is 'no').

    Supported AMD GPUs: Ellesmere, Polaris, Vega, or Navi that are running
        Einstein@Home gravitational wave and gamma-ray GPU tasks.
    Task X is changed only for the GW app.
    The script may not work properly if CPU tasks are queued or running.

    A log file of Terminal output is created in the current folder.

    Script run parameters can be configured in the taskXDF.cfg text file.

    Version: 0.10.2
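
The interval rules in the help text (whole seconds only, values under 60 raised to 60, zero meaning a one-off report) can be sketched as a small validation function. This is one reading of those rules, not the timer script's actual code, and the name normalize_interval is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: normalize_interval is a hypothetical helper.

# Print the interval the timer would use, or fail on invalid input.
#   0      -> 0  (one-off status report)
#   1..59  -> 60 (raised to the minimum)
#   >= 60  -> unchanged
normalize_interval() {
    local secs=$1
    # Reject empty input, decimals such as "30.5", and non-numeric text.
    case $secs in
        ''|*[!0-9]*)
            echo "interval must be a whole number of seconds" >&2
            return 1
            ;;
    esac
    if [ "$secs" -eq 0 ]; then
        echo 0
    elif [ "$secs" -lt 60 ]; then
        echo 60
    else
        echo "$secs"
    fi
}
```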

Additional details for use are provided in the comments of the scripts and .cfg text file. Please read through those before using to understand what's going on.

These scripts are written for Bash version 5.

Successful execution will display like this in the Terminal window:

$ ./taskXDF-timer 60
taskXDF-timer is executing taskXDF every 60 seconds; taskXDF reports every 20.0 minutes.
The reporting interval is set in the taskXDF.cfg file.
Running... (can be stopped with CTRL+C)
date   time     │ queued │ VRAM% │ GTT% │ taskGB │ X │ DFs: [running] [waiting] [ready]
————————————————│————————│———————│——————│————————│———│————————————————————————————————————————
Jul 19 12:54:36 │     47 │    81 │ 1.67 │   1.08 │ 3 │ [.20 .20 .20 .20 .20 .20 ] [na] [.20 .20 ]
Jul 19 13:14:56 │     46 │    81 │ 1.67 │   1.08 │ 3 │ [.20 .20 .20 .20 .10 .10 ] [na] [.20 .20 ]

NOTES and TIPS: When using taskXDF-timer to actively change task X, you will want to avoid being asked for a password each time the script edits <gpu_usage> in the app_config.xml file. To do that, assign the file to a group you belong to and give that group write permission:

$ sudo chown boinc:yourusername /var/lib/boinc-client/projects/einstein.phys.uwm.edu/app_config.xml
$ sudo chmod 660 /var/lib/boinc-client/projects/einstein.phys.uwm.edu/app_config.xml

Check that the changes are in place:

$ ls -l /var/lib/boinc-client/projects/einstein.phys.uwm.edu/app_config.xml
-rw-rw---- 1 boinc yourusername 1523 Aug 23 08:02 /var/lib/boinc-client/projects/einstein.phys.uwm.edu/app_config.xml
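
The same check can be made from a script before starting the timer. This is a small sketch using the shell's built-in -w test; the function name can_edit_config is hypothetical and is not part of taskXDF.

```shell
#!/usr/bin/env bash
# Sketch only: can_edit_config is a hypothetical helper.

# Succeed if the file exists and the current user can write to it
# without sudo; otherwise print a hint and fail.
can_edit_config() {
    local file=$1
    if [ -f "$file" ] && [ -w "$file" ]; then
        return 0
    fi
    echo "cannot write to $file; check its owner, group, and mode" >&2
    return 1
}
```

For example: can_edit_config /var/lib/boinc-client/projects/einstein.phys.uwm.edu/app_config.xml && ./taskXDF-timer 60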

You will need to manually launch the timer script after each machine reboot. To keep task X adjustments responsive to changing DFs of GW tasks, use a 60-second time interval argument when launching the timer script.

Rarely, a GPU running gravitational wave tasks will lose compute ability; the affected task then stalls and, hours later, either times out or exits with a computation error. By default, the script issues a warning when a task takes more than 40 minutes to complete. This time limit can be changed in the taskXDF.cfg file; check that it is reasonable for your system. There is also an option in taskXDF.cfg to not suspend a too-long task: the default setting is auto_suspend=yes, which can be changed to auto_suspend=no.
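
The time-limit test itself is simple arithmetic. A minimal sketch, assuming the task's elapsed run time is already known in seconds and the limit is given in minutes (40 by default, per the text above). How taskXDF actually obtains the elapsed time is not shown here, and the name task_is_overdue is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: task_is_overdue is a hypothetical helper.

# Succeed if elapsed seconds exceed the limit in minutes (default 40).
task_is_overdue() {
    local elapsed_secs=$1 limit_mins=${2:-40}
    # Drop any fractional part, e.g. "2500.73" becomes "2500".
    elapsed_secs=${elapsed_secs%.*}
    [ "$elapsed_secs" -gt $((limit_mins * 60)) ]
}

# Example use: warn about a task that has run for 2500 seconds.
if task_is_overdue 2500; then
    echo "WARNING: task has exceeded the time limit" >&2
fi
```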
