Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: linalg: array library interoperability #6

Closed
lucascolley opened this issue Aug 15, 2023 · 1 comment
Closed

ENH: linalg: array library interoperability #6

lucascolley opened this issue Aug 15, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@lucascolley
Copy link
Owner

lucascolley commented Aug 15, 2023

Is your feature request related to a problem? Please describe.

Plans for addressing scipy#18286 in scipy.linalg.

Currently, scipy.linalg only supports the use of NumPy arrays. Work towards interoperability has the potential to bring large performance improvements now and, with adoption of the array API standard, will bring compatibility with any future library which complies with the standard.

linalg is an array API standard extension, which means that supporting it will bring automatic dispatching for all standard functions. array-api-compat has implemented linalg wrappers for numpy, cupy and torch, so that automatic dispatching is available today.

For those functions which use compiled code and are not in the standard, non-CPU device support will be limited to start off with. From the RFC:

When a non-NumPy array type sees compiled code in SciPy (which tends to use the NumPy C API), we have a couple of options:

  1. dispatch back to the other library (PyTorch, CuPy, etc.).
  2. convert to a NumPy array when possible (i.e., on CPU via the buffer protocol, DLPack, or array), use the compiled code in question, then convert back.
  3. don't support this at all

I'll note that (1) is the long-term goal; how to implement this is out of scope for this proposal - for more on that, see the Appendix section. For now we choose to do (2) when possible, and (3) otherwise. Switching from that approach to (1) in the future will be backwards-compatible.

This does not mean that GPU support is off-limits for all of these functions right now. Firstly, if we can write non-standard functions in terms of standard functions, then they will become compatible with all compliant libraries. Secondly, we have the option to call to CuPy and PyTorch for some functions. While this is not the general, library-agnostic solution of (1) - the "long-term goal" - it may be worth doing this to unlock GPU support for some important functions.

Describe the solution you'd like.

A. Conform to the standard extension.

This should be (almost) possible in one PR, similar to scipy#19005 in fft. This involves validating array inputs with array_namespace and using xp.linalg, if it is available, for all standard functions. If it is not available, a conversion to NumPy is attempted. For NumPy arrays, the existing SciPy implementation is used.

For completeness and ease of use, I propose to add the standard functions which are not currently in scipy. Therefore, if a user wants to write array-agnostic code using the standard extension, they can just use SciPy, without going back to the array library for the functions. This will not take many lines of code as it is just dispatching to xp.linalg, or attempting to use functions from np.

I say almost possible since, to fully conform, the standard requires that functions on matrices support “batching” (i.e., the ability to perform the operation over a “stack” of matrices). There is also a discrepancy between the default mode of qr (full/complete vs economical/reduced) in the APIs. This, and potentially other similar issues, may be unsuitable to address here. These issues are really details of the API though, and lots of interoperability will still be achieved, despite not completely conforming immediately.

B. Re-write some functions to be array-agnostic.

Any function which does not use compiled-code can be included here. Furthermore, any functions which can be rewritten in terms of standard functions will be included. There may be overlap here with the work to convert functions from Fortran to other languages (scipy#18566), in the case that anything is converted to pure Python. As a result, this point may become multiple PRs, with future work coming as more compiled code is eliminated.

This (first PR) will look similar to the cluster bits of scipy#18668, and a follow-up PR to come for fft.

C. Call to CuPy and PyTorch.

We could also add calls to CuPy and PyTorch for any functions which they share with SciPy, and to which points A and B do not apply. This would ultimately be superseded by a dispatching mechanism, but there are no active plans for that right now.

Which functions in each PR?

For A, the standard functions which are in SciPy are:
cholesky, det, eigh, eighvalsh, inv, pinv, qr, solve, svd and svdvals.

The standard functions which would be added are:
cross, diagonal, matmul, matrix_norm (can use SciPy's norm), matrix_power, matrix_rank, matrix_transpose, outer, slogdet, tensordot, trace, vecdot, vector_norm (can use SciPy's norm).

These functions to be added all either have implementations in np or np.linalg, or are aliases of other standard functions, so they will immediately work with NumPy arrays.

For B, the Special Matrices functions look doable. There may be others, but I haven't had a thorough look through.

For C, some functions which have been suggested are solve_triangular, lu, lu_solve and lstsq.

Describe alternatives you've considered.

The number of PRs to split this work into is undecided. If there are multiple PRs, the order of them is also undecided.

If this all looks reasonable, I plan work on a PR for A. The main question for this PR would be whether or not to touch non-standard functions. The advantage of this would be to have consistent validation with array_namespace and behaviour with arrays coercible by np.asarray, across the whole submodule. The disadvantage of this would be additional review overhead, where the same functions would be revisited in PRs for B and C. I have no preference, so will do whatever the maintainers prefer. Regardless of when the work occurs, I think that it makes sense to have this consistent validation and behaviour eventually.

Overall, I am not familiar with scipy.linalg yet, so I may be missing a lot! Please feel free to give any corrections or suggestions.

References

@lucascolley lucascolley added the enhancement New feature or request label Aug 15, 2023
@lucascolley
Copy link
Owner Author

moved to scipy#19068.

Repository owner locked and limited conversation to collaborators Aug 20, 2023
Repository owner unlocked this conversation Aug 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant