ENH: Using Array API standard for functions implemented using pure Python and NumPy API #15354

Closed
IvanYashchuk opened this issue Jan 4, 2022 · 10 comments
Labels
  • array types: Items related to array API support and input array validation (see gh-18286)
  • duplicate: Issues that describe the same problem or that are reported multiple times
  • enhancement: A new feature or improvement

Comments

@IvanYashchuk

IvanYashchuk commented Jan 4, 2022

Is your feature request related to a problem? Please describe.

SciPy's roadmap includes a section about GPU and distributed array support. It would be great to start moving in this direction, and the recent NumPy 1.22 release, together with CuPy and the Array API standard, can help with that. The Array API documentation describes the use case for SciPy.

The latest NumPy and CuPy releases include Array API compatible modules (numpy.array_api and cupy.array_api). Some other libraries, like PyTorch, aim to make their main module Array API compatible soon.

Describe the solution you'd like.

Array objects from Array API compatible libraries have a method that returns their namespace (module): __array_namespace__(). That namespace is then used as a replacement for numpy/np. Since the array instances carry the namespace information, the SciPy codebase doesn't need to introduce any additional dependencies.

Current way of using NumPy

```python
import numpy as np

def current_scipy_func(x):
    # use `np` to do computation on `x`
    x_array = np.asarray(x)
    ...
```

should be changed to the following (note that there is no import numpy as np):

```python
def new_scipy_func(x):
    # get_namespace is the helper described below (see NEP 47)
    xp = get_namespace(x)
    # now use `xp` instead of `np`
    x_array = xp.asarray(x)
    ...
```
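As a concrete illustration of the pattern above, here is a runnable sketch applied to a small hypothetical function (rms_normalize is made up for this example and is not part of SciPy). To keep the snippet self-contained, the namespace lookup is inlined as a one-line fallback to NumPy instead of calling get_namespace, and it assumes every array function used has the same name in NumPy and in the standard.

```python
import numpy as np


def rms_normalize(x):
    """Hypothetical example: scale x so its root-mean-square is 1."""
    # Inline stand-in for get_namespace: use the array's own namespace if it
    # advertises one, otherwise fall back to NumPy.
    xp = x.__array_namespace__() if hasattr(x, "__array_namespace__") else np
    x = xp.asarray(x)
    # Every array call goes through `xp`, never through `np`, so the result
    # stays in the caller's array library.
    rms = xp.sqrt(xp.mean(x * x))
    return x / rms
```

A Python list carries no namespace, so rms_normalize([3.0, 4.0]) takes the NumPy fallback and returns a NumPy array; an array from another compatible library would be processed by that library's own functions.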

See NEP 47 for a possible implementation of the get_namespace function: https://numpy.org/neps/nep-0047-array-api-standard.html#appendix-a-possible-get-namespace-implementation.
To preserve the current behavior when regular NumPy arrays are passed, SciPy's get_namespace should return the result of the input array's __array_namespace__() method if that method exists, and the numpy module otherwise.
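A minimal sketch of such a helper, adapted from the idea in the NEP 47 appendix but with the NumPy fallback described above instead of an error for unrecognized inputs. Raising on mixed namespaces is an assumption carried over from NEP 47; a final SciPy helper may behave differently.

```python
import numpy as np


def get_namespace(*xs):
    """Return the single array namespace shared by xs (a sketch, not SciPy's API).

    Inputs that define __array_namespace__() contribute that namespace;
    anything else (e.g. Python lists and scalars) maps to NumPy, which
    preserves today's behavior for plain NumPy users.
    """
    namespaces = {
        x.__array_namespace__() if hasattr(x, "__array_namespace__") else np
        for x in xs
    }
    # Mixing arrays from different libraries (or passing nothing) is ambiguous.
    if len(namespaces) != 1:
        raise ValueError(f"Expected exactly one array namespace, got {namespaces}")
    (xp,) = namespaces
    return xp
```

Collecting namespaces in a set means several NumPy-compatible inputs (arrays, lists, scalars) collapse to a single numpy entry, so only genuinely mixed libraries trigger the error.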

The first candidates for the Array API adoption are:

  • Signal module
    All functions except those that use Cython: max_len_seq, sosfilt, lombscargle, upfirdn, and the peak-finding algorithms.
  • Special module
    All non-ufunc functions, which can be listed with:

```python
import numpy
import scipy.special

non_ufuncs = [f for f in scipy.special.__all__
              if not isinstance(getattr(scipy.special, f), numpy.ufunc)]
```

Describe alternatives you've considered.

No response

Additional context (e.g. screenshots, GIFs)

Purpose and scope of the Array API standard: https://data-apis.org/array-api/latest/purpose_and_scope.html
Blog post about the demo of using Array API with SciPy signal module: https://labs.quansight.org/blog/2021/10/array-libraries-interoperability/
NEP 47 — Adopting the array API standard https://numpy.org/neps/nep-0047-array-api-standard.html
Previous discussion on a related topic: #10204

IvanYashchuk added the enhancement label Jan 4, 2022
@tupui
Member

tupui commented Jan 4, 2022

👍 Sounds like an easy change for us; it's like our seed wrapper. With all the options we have for speeding things up, what would the guidelines be on when to use Pythran, Cython, or this? Or can Pythran and Cython work with other backends?

@tupui
Member

tupui commented Jan 6, 2022

I am wondering something: if NumPy supports other backends, how would such an interface work? Wouldn't np.asarray already be able to handle something coming from CuPy or other libraries?

@AnirudhDagar
Member

I'd like to help with this issue, @IvanYashchuk. When we decide to go ahead, I think it would be good to open a tracking issue (maybe this issue can become that) listing which modules/functions are planned to be Array API compatible and which are already compatible with the Array API in SciPy.

It will also be useful to document the Array API supported modules/functions somewhere in SciPy so that users know what's possible. Maybe we could add more examples showcasing this interoperability?

> How would such an interface work? Wouldn't np.asarray already be able to handle something coming from CuPy or other libraries?

@tupui np.asarray would try to convert something coming from CuPy into a NumPy array (CuPy actually refuses this implicit conversion and raises a TypeError). That's not what we want here. With the array API and the xp.asarray syntax, we'll be able to handle the passed arguments in the array library of choice (NumPy, CuPy, PyTorch, etc.). I'd highly recommend reading the links shared in the issue for further clarity.

@tupui
Member

tupui commented Jan 6, 2022


OK, I was not sure NumPy would do this. I thought that if the backend was set to CuPy, np.asarray would create an array from the input that respects the backend, for instance by moving it to the GPU. That way we would not need to do anything.

@AnirudhDagar
Member

AnirudhDagar commented Jan 6, 2022

Also, it's not only asarray: many more Array API standard functions will need to be refactored from np. to xp. for a whole function to be Array API compatible.
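Porting is also not always a mechanical prefix swap: some functions have different names in the Array API standard than in classic NumPy (for example, np.concatenate is concat, np.arccos is acos, and np.power is pow). The sketch below is a hypothetical porting aid, not part of SciPy, that uses Python's ast module to flag such calls in a source string; the mapping is a small illustrative subset, not a complete list.

```python
import ast

# Illustrative subset (not a complete list) of classic NumPy names whose
# Array API standard spelling differs, so a plain np. -> xp. swap is not enough.
RENAMED = {
    "arccos": "acos",
    "arcsin": "asin",
    "arctan": "atan",
    "arctan2": "atan2",
    "concatenate": "concat",
    "power": "pow",
}


def find_renamed_calls(source, module_alias="np"):
    """Hypothetical helper: return sorted (line, numpy_name, standard_name)
    tuples for attribute accesses like np.arccos in a Python source string."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == module_alias
            and node.attr in RENAMED
        ):
            hits.append((node.lineno, node.attr, RENAMED[node.attr]))
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(hits)
```

Running it on code that calls np.arccos and np.concatenate flags both, while a call such as np.sum, whose name is the same in the standard, is left alone.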

@IvanYashchuk
Author

> 👍 Sounds like an easy change for us; it's like our seed wrapper. With all the options we have for speeding things up, what would the guidelines be on when to use Pythran, Cython, or this? Or can Pythran and Cython work with other backends?

I'd say this is orthogonal to the use of Pythran and Cython. If Cython code relies on the internal memory layout of NumPy arrays, then it certainly will not be portable to other libraries through the Python Array API. Using __array_namespace__() should be recommended over pure Python + NumPy; it is mostly a portability gain (portability across array libraries and the devices they work with). A possible performance boost with GPU libraries is a complementary benefit.

> I am wondering something: if NumPy supports other backends, how would such an interface work? Wouldn't np.asarray already be able to handle something coming from CuPy or other libraries?

A subset of the NumPy API already works with non-NumPy array objects (for example, CuPy and Dask arrays) through the dispatch mechanisms described in NEP 13 (__array_ufunc__), NEP 18 (__array_function__), and NEP 35 (the like= argument). __array_namespace__ and the Array API standard are an evolution of the ideas described in NEP 37.
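To make the NEP 18 mechanism concrete, here is a minimal sketch of a hypothetical duck array (WrappedArray and the implements registry are made up for this illustration, following the registry pattern suggested in NEP 18) that overrides np.sum through __array_function__. Overrides are opted into one NumPy function at a time; unregistered functions return NotImplemented and NumPy raises a TypeError. The Array API approach instead hands SciPy a whole namespace through __array_namespace__.

```python
import numpy as np

# Registry mapping NumPy functions to overrides for the duck array below.
_HANDLED = {}


def implements(np_func):
    """Register an override of `np_func` for WrappedArray."""
    def decorator(func):
        _HANDLED[np_func] = func
        return func
    return decorator


class WrappedArray:
    """Hypothetical duck array wrapping a NumPy array (illustration only)."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NEP 18: NumPy calls this for any dispatch-enabled public function
        # when one of the relevant arguments is a WrappedArray.
        if func not in _HANDLED:
            return NotImplemented  # NumPy then raises TypeError
        if not all(issubclass(t, WrappedArray) for t in types):
            return NotImplemented
        return _HANDLED[func](*args, **kwargs)


@implements(np.sum)
def _sum(a, axis=None):
    # Compute on the wrapped data and re-wrap, so the result keeps its type.
    return WrappedArray(np.sum(a.data, axis=axis))
```

With this, np.sum(WrappedArray([1, 2, 3])) returns a WrappedArray, while np.mean on the same object raises a TypeError because no override was registered for it.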

@IvanYashchuk
Author

> It will also be useful to document the Array API supported modules/functions somewhere in SciPy so that users know what's possible. Maybe we could add more examples showcasing this interoperability?

For sure there's going to be a lot of work needed on the documentation side.

> I think it would be good to open a tracking issue (maybe this issue can become that) listing which modules/functions are planned to be Array API compatible and which are already compatible with the Array API in SciPy.

I'm going to open PRs porting a subset of scipy.signal functionality first and then update the issue description with a porting plan for other functions. I will ping you, Anirudh, when I have the first PR opened.

@AnirudhDagar
Member

Sounds great! Looking forward to working on this.

@tupui
Member

tupui commented Feb 8, 2022

Cross-referencing a similar discussion in scikit-learn: scikit-learn/scikit-learn#22352

@lucascolley
Member

I'm closing this now as superseded by gh-18286. gh-18867 is tracking support for this (both using the standard for pure Python + NumPy functions and attempting to convert to and from NumPy using asarray when compiled code is hit).
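The "convert to and from NumPy when compiled code is hit" strategy can be sketched roughly as follows. call_compiled is a hypothetical helper, not SciPy's implementation; it assumes the compiled routine accepts and returns NumPy arrays and that the caller's namespace can build an array from a NumPy array with asarray. np.asarray only succeeds for inputs NumPy can convert (typically host-memory arrays), so this is a fallback rather than a GPU path.

```python
import numpy as np


def call_compiled(func, x, *args, **kwargs):
    """Hypothetical wrapper: run NumPy-only (e.g. Cython) code on any array."""
    # Remember which library the caller used, falling back to NumPy.
    xp = x.__array_namespace__() if hasattr(x, "__array_namespace__") else np
    # Compiled extensions expect NumPy arrays; this raises for array types
    # that forbid implicit conversion to NumPy.
    x_np = np.asarray(x)
    result = func(x_np, *args, **kwargs)
    # Convert the NumPy result back into the caller's array library.
    return xp.asarray(result)
```

Here any NumPy function can stand in for a compiled routine: call_compiled(np.cumsum, [1, 2, 3]) returns a NumPy array, because a Python list carries no namespace of its own.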

lucascolley closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 11, 2023
lucascolley added the array types and duplicate labels Dec 11, 2023

4 participants