Optional offloading to AMD GPUs #626

iotamudelta · 2022-04-21T23:31:53Z

This initial work adds an opt-in configure option that enables offloading of some sgemm, dgemm, cgemm, and zgemm operations to AMD GPUs via AMD's rocBLAS. It therefore requires a working ROCm software stack and a ROCm-enabled accelerator.

After enabling the offloading capability, the default is "never offload". Offloading can be controlled through the following environment variables:
BLIS_OFFLOAD=[never,always,thresh] - thresh enables threshold-dependent offloading
BLIS_OFFLOAD_SGEMM_THRESH=$number1 - the M*N size of sgemm above which offloading should be attempted; must be specified
BLIS_OFFLOAD_DGEMM_THRESH=$number2 - the M*N size of dgemm above which offloading should be attempted; must be specified
BLIS_OFFLOAD_CGEMM_THRESH=$number3 - the M*N size of cgemm above which offloading should be attempted; must be specified
BLIS_OFFLOAD_ZGEMM_THRESH=$number4 - the M*N size of zgemm above which offloading should be attempted; must be specified
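
For illustration, the decision described by these variables could be sketched roughly as follows. This is not the code from this PR: the names `offload_decide` and `should_offload_sgemm` are hypothetical, and the sketch assumes that "after which" means strictly greater than the threshold and that a missing or malformed threshold keeps the operation on the CPU.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the PR): decide whether a gemm with an
 * m x n output should be offloaded, given the raw values of BLIS_OFFLOAD
 * and the matching BLIS_OFFLOAD_*GEMM_THRESH variable (either may be NULL). */
bool offload_decide(const char *mode, const char *thresh_str,
                    long long m, long long n)
{
    /* Unset or unrecognized mode: behave like the documented default, "never". */
    if (mode == NULL)
        return false;
    if (strcmp(mode, "always") == 0)
        return true;
    if (strcmp(mode, "thresh") != 0)
        return false;

    /* Assumption: without a usable threshold we stay on the CPU. */
    if (thresh_str == NULL)
        return false;
    char *end = NULL;
    long long thresh = strtoll(thresh_str, &end, 10);
    if (end == thresh_str || *end != '\0' || thresh < 0)
        return false;

    /* Assumption: offload only when M*N is strictly above the threshold. */
    return m * n > thresh;
}

/* Hypothetical wrapper for sgemm that reads the environment at call time. */
bool should_offload_sgemm(long long m, long long n)
{
    return offload_decide(getenv("BLIS_OFFLOAD"),
                          getenv("BLIS_OFFLOAD_SGEMM_THRESH"), m, n);
}
```

With `BLIS_OFFLOAD=thresh` and `BLIS_OFFLOAD_SGEMM_THRESH=1000000`, this sketch would offload a 2000x2000 sgemm and keep a 100x100 one on the CPU. Keeping the string parsing separate from `getenv` makes the decision easy to test without touching the process environment.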

Currently known limitations:

The offloading decision is made purely based on the M*N size of the gemm in conjunction with user-controlled thresholds (or always/never offload).
rocBLAS is initialized with default settings, so it uses the first enumerated accelerator in the system and the default stream.

Future work:

offloading of integer gemms can be supported
better offloading decision engine and performance model with less user input required

jeffhammond · 2022-04-29T20:17:20Z

Why not make it a draft commit if you don't want it merged?

iotamudelta · 2022-04-29T20:32:18Z

@jeffhammond should be ready for merge soon after WIP items done - and I'm happy to get any functional reviews already.

Inspect A, B, C buffer pointers to see if they are already on device. If so, do not allocate buffers and copy.

iotamudelta added 4 commits April 21, 2022 18:17

First version of the offloading.

6dd6e40

Merge remote-tracking branch 'origin/master' into jmd/offload

07d448f

Chase const-ification.

34320be

Clean up and fix some things.

453ef8c

Fix some formatting.

85ec4a7

iotamudelta changed the title ~~[WIP] [DONTMERGE] Optional offloading to AMD GPUs~~ Optional offloading to AMD GPUs Apr 29, 2022

iotamudelta and others added 6 commits May 31, 2022 15:01

Merge branch 'master' into jmd/offload

502735e

Fix mismerge

afb5448

Fix debug.

f8d83f2

Merge branch 'flame:master' into jmd/offload

ff4db47

Inspect memory to evade memory allocation and copy.

2046e97

Inspect A, B, C buffer pointers to see if they are already on device. If so, do not allocate buffers and copy.

Add support for [s,d]complex

84804a3
