
Multiplication of tensors with rank > 2 #582

Open
jegork opened this issue Jan 7, 2023 · 6 comments


@jegork

jegork commented Jan 7, 2023

Hello!

Currently, the `*` operator only supports matrix-matrix and matrix-vector multiplication. Are there any plans to add support for batched matrix-matrix multiplication? It would be really useful for things like attention, which I am trying to implement.

Thanks!
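For reference, the rank-2 case already works with `*`; what this issue asks for is the batched, rank-3 version of the same product. A shape sketch with made-up sizes:

```nim
import arraymancer

# Attention scores for a single batch element work today (rank-2 `*`):
let q = ones[float32](16, 64)   # [seq, d]
let k = ones[float32](16, 64)   # [seq, d]
let scores = q * k.transpose    # [seq, seq]

# The requested feature is the batched version, without a manual loop:
# Q: [batch, seq, d] * K.permute(0, 2, 1): [batch, d, seq] -> [batch, seq, seq]
```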

@lucidrains

I'm interested in this as well.

@lucidrains

I will be the first to submit a PR for GPT in Arraymancer, if all the pieces are available to build attention.

@jegork
Author

jegork commented May 10, 2023

@lucidrains I actually requested this because I had started writing a PR to add attention to Arraymancer myself 😅. However, I am a big fan of your work, and it would be great if you had the possibility to add attention.

Regarding this issue, I might have time to implement it; however, I have little knowledge about possible implementations.

@mratsim
Owner

mratsim commented May 12, 2023

Sorry, I was off for a couple of months and didn't check my long list of GitHub notifications.

Batch matrix multiplication is something I've wanted to implement for about 4 years now. My main issue is that some libraries provide it, but OpenBLAS and BLIS do not.

It's easy to add a naive version that for-loops over matrix multiplication, but because all the BLAS libraries use OpenMP, and OpenMP doesn't support nesting properly, you can't utilize the newly exposed level of parallelism at all.

Which brings us to engineering issues. For now Arraymancer doesn't use a custom threadpool, because that is a very involved change and, besides just matrix multiplication, I would need to port some LAPACK functions as well, namely those here: https://github.com/mratsim/Arraymancer/tree/master/src/arraymancer/linear_algebra/helpers

  • syevr (Symmetric Eigenvalue Decomposition via Relatively Robust Representations)
  • geqrf (General QR factorization)
  • gesdd (General Singular value Decomposition by Divide & conquer)
  • getrf (General Pivoted LU factorization)
  • gelsd (General Least Square Solver by Divide & Conquer)
  • gesv (General AX = B solver)

So batch matrix multiplication would be very welcome. But it's probably best to start humbly, with a for loop over normal matrix multiplication (see the sketch below).
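A minimal sketch of that naive fallback, assuming Arraymancer's current rank-2 `*`, `squeeze`/`unsqueeze`, and slice assignment (the `batchMatmul` helper itself is hypothetical, not an existing API):

```nim
import arraymancer

proc batchMatmul[T: SomeFloat](a, b: Tensor[T]): Tensor[T] =
  ## Naive batched matmul: a is [batch, m, k], b is [batch, k, n],
  ## the result is [batch, m, n]. Each 2-D slice falls back to the
  ## existing BLAS-backed `*`, so parallelism stays within each GEMM.
  doAssert a.rank == 3 and b.rank == 3
  doAssert a.shape[0] == b.shape[0] and a.shape[2] == b.shape[1]
  result = newTensor[T](a.shape[0], a.shape[1], b.shape[2])
  for i in 0 ..< a.shape[0]:
    result[i, _, _] = (a[i, _, _].squeeze(0) * b[i, _, _].squeeze(0)).unsqueeze(0)
```

Attention scores would then be something like `batchMatmul(q, k.permute(0, 2, 1))` for q and k of shape [batch, seq, d].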

@hlclemson

I apologize in advance if my question is too basic.

Just out of curiosity, is there any workaround for this issue?

Let's say, for example, I want to do something like L : P = L_ijkl * P_lk (a double dot product). Can I convert this operation to a loop over rank-2 matrices?

@mratsim
Owner

mratsim commented Dec 8, 2023

The einsum operator should work, but it would be non-parallelized and slow.
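A sketch of that contraction with the einsum macro, assuming the explicit-index syntax from Arraymancer's einsum documentation (the shapes are made up):

```nim
import arraymancer

let L = ones[float64](3, 3, 3, 3)  # L_ijkl
let P = ones[float64](3, 3)        # P_lk

# res[i, j] = sum over k and l of L[i, j, k, l] * P[l, k];
# indices that appear only on the right-hand side are contracted.
let res = einsum(L, P):
  res[i, j] = L[i, j, k, l] * P[l, k]
```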

And otherwise you do it like the im2col convolution does:

```nim
for i in 0 ..< batch_size: # TODO: batch matmul
  im2col(input.atAxisIndex(0, i).squeeze(0), kernel_size, padding, stride, input_col)
  # The following must be done without copy: GEMM will directly write in the result tensor
  output = result.atAxisIndex(0, i).reshape(kernel_col.shape[0], input_col.shape[1])
  gemm(1.T, kernel_col, input_col, 0.T, output)
```
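For the double contraction asked about above, a loop isn't strictly needed either: flattening L into a matrix and the transposed P into a vector reduces it to a single matrix-vector product. A sketch, assuming row-major tensors and Arraymancer's `reshape`/`transpose`/`asContiguous` (the copy via `asContiguous` makes the transposed view reshapable):

```nim
import arraymancer

let
  (ni, nj, nk, nl) = (3, 3, 3, 3)
  L = ones[float64](ni, nj, nk, nl)  # L_ijkl
  P = ones[float64](nl, nk)          # P_lk

# res[i, j] = sum_{k,l} L[i, j, k, l] * P[l, k].
# Row-major flattening orders L's trailing (k, l) indices exactly like
# the flattened P^T, so one matrix-vector product does the contraction.
let Lmat = L.reshape(ni * nj, nk * nl)
let pVec = P.transpose.asContiguous.reshape(nk * nl)
let res = (Lmat * pVec).reshape(ni, nj)
```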
