
Multiplication of tensors with rank > 2 #582

Open
jegork opened this issue Jan 7, 2023 · 6 comments


@jegork

jegork commented Jan 7, 2023

Hello!

Currently, the `*` operator only supports matrix-matrix and matrix-vector multiplication. Are there any plans to add support for batched matrix-matrix multiplication? It would be really useful for things like attention, which I am trying to implement.

Thanks!
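For reference, the rank-2 case already works with `*`; what this issue asks for is the batched, rank-3 version of the same product. A shape sketch with made-up sizes:

```nim
import arraymancer

# Attention scores for a single batch element work today (rank-2 `*`):
let q = ones[float32](16, 64)   # [seq, d]
let k = ones[float32](16, 64)   # [seq, d]
let scores = q * k.transpose    # [seq, seq]

# The requested feature is the batched version, without a manual loop:
# Q: [batch, seq, d] * K.permute(0, 2, 1): [batch, d, seq] -> [batch, seq, seq]
```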

@lucidrains

I'm interested in this as well.

@lucidrains

I will be the first to submit a PR for GPT in Arraymancer, if all the pieces are available to build attention.

@jegork
Author

jegork commented May 10, 2023

@lucidrains I actually requested this because I had started writing a PR to add attention to Arraymancer myself 😅. However, I am a big fan of your work, and it would be great if you had the possibility to add attention.

Regarding this issue, I might have time to implement it; however, I have little knowledge about possible implementations.

@mratsim
Owner

mratsim commented May 12, 2023

Sorry, I was off for a couple of months and didn't check my long list of GitHub notifications.

Batch matrix multiplication is something I've wanted to implement for about 4 years now. My main issue is that some libraries provide it, but OpenBLAS and BLIS do not.

It's easy to add a naive version that for-loops over matrix multiplication, but because all the BLAS libraries use OpenMP, and OpenMP doesn't support nesting properly, you can't utilize the newly exposed level of parallelism at all.

Which brings us to engineering issues. For now Arraymancer doesn't use a custom threadpool, because that is a very involved change and, besides just matrix multiplication, I would need to port some LAPACK functions as well, namely those here: https://github.com/mratsim/Arraymancer/tree/master/src/arraymancer/linear_algebra/helpers

  • syevr (Symmetric Eigenvalue Decomposition via Relatively Robust Representations)
  • geqrf (General QR factorization)
  • gesdd (General Singular value Decomposition by Divide & conquer)
  • getrf (General Pivoted LU factorization)
  • gelsd (General Least Square Solver by Divide & Conquer)
  • gesv (General AX = B solver)

So batch matrix multiplication would be very welcome. But it's probably best to start humbly, with a for loop over normal matrix multiplication (see the sketch below).
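A minimal sketch of that naive fallback, assuming Arraymancer's current rank-2 `*`, `squeeze`/`unsqueeze`, and slice assignment (the `batchMatmul` helper itself is hypothetical, not an existing API):

```nim
import arraymancer

proc batchMatmul[T: SomeFloat](a, b: Tensor[T]): Tensor[T] =
  ## Naive batched matmul: a is [batch, m, k], b is [batch, k, n],
  ## the result is [batch, m, n]. Each 2-D slice falls back to the
  ## existing BLAS-backed `*`, so parallelism stays within each GEMM.
  doAssert a.rank == 3 and b.rank == 3
  doAssert a.shape[0] == b.shape[0] and a.shape[2] == b.shape[1]
  result = newTensor[T](a.shape[0], a.shape[1], b.shape[2])
  for i in 0 ..< a.shape[0]:
    result[i, _, _] = (a[i, _, _].squeeze(0) * b[i, _, _].squeeze(0)).unsqueeze(0)
```

Attention scores would then be something like `batchMatmul(q, k.permute(0, 2, 1))` for q and k of shape [batch, seq, d].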

@hlclemson

I apologize in advance if my question is too basic.

Just out of curiosity, is there any workaround for this issue?

Let's say, for example, I want to do something like L : P = L_ijkl * P_lk (a double dot product). Can I convert this operation to a loop over rank-2 matrices?

@mratsim
Owner

mratsim commented Dec 8, 2023

The einsum operator should work, but it would be non-parallelized and slow.
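A sketch of that contraction with the einsum macro, assuming the explicit-index syntax from Arraymancer's einsum documentation (the shapes are made up):

```nim
import arraymancer

let L = ones[float64](3, 3, 3, 3)  # L_ijkl
let P = ones[float64](3, 3)        # P_lk

# res[i, j] = sum over k and l of L[i, j, k, l] * P[l, k];
# indices that appear only on the right-hand side are contracted.
let res = einsum(L, P):
  res[i, j] = L[i, j, k, l] * P[l, k]
```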

And otherwise you do it like the im2col convolution does:

```nim
for i in 0 ..< batch_size: # TODO: batch matmul
  im2col(input.atAxisIndex(0, i).squeeze(0), kernel_size, padding, stride, input_col)
  # The following must be done without copy: GEMM will directly write in the result tensor
  output = result.atAxisIndex(0, i).reshape(kernel_col.shape[0], input_col.shape[1])
  gemm(1.T, kernel_col, input_col, 0.T, output)
```
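For the double contraction asked about above, a loop isn't strictly needed either: flattening L into a matrix and the transposed P into a vector reduces it to a single matrix-vector product. A sketch, assuming row-major tensors and Arraymancer's `reshape`/`transpose`/`asContiguous` (the copy via `asContiguous` makes the transposed view reshapable):

```nim
import arraymancer

let
  (ni, nj, nk, nl) = (3, 3, 3, 3)
  L = ones[float64](ni, nj, nk, nl)  # L_ijkl
  P = ones[float64](nl, nk)          # P_lk

# res[i, j] = sum_{k,l} L[i, j, k, l] * P[l, k].
# Row-major flattening orders L's trailing (k, l) indices exactly like
# the flattened P^T, so one matrix-vector product does the contraction.
let Lmat = L.reshape(ni * nj, nk * nl)
let pVec = P.transpose.asContiguous.reshape(nk * nl)
let res = (Lmat * pVec).reshape(ni, nj)
```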
