
ENH: linalg: array library interoperability #19068

Open · lucascolley opened this issue Aug 15, 2023 · 23 comments
Labels: array types (items related to array API support and input array validation, see gh-18286), scipy.linalg

@lucascolley (Member) commented Aug 15, 2023

Is your feature request related to a problem? Please describe.

Plans for addressing #18286 in scipy.linalg.

Currently, scipy.linalg supports only NumPy arrays. Work towards interoperability has the potential to bring large performance improvements now and, with adoption of the array API standard, compatibility with any future library that complies with the standard.

linalg is an array API standard extension, which means that supporting it will bring automatic dispatching for all standard functions. array-api-compat has implemented linalg wrappers for numpy, cupy and torch, so that automatic dispatching is available today.
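
To make that concrete, here is a minimal sketch of consumer-side dispatching, assuming only that array-api-compat is installed (everything used below is from the standard or the compat package):

import torch
from array_api_compat import array_namespace

x = torch.eye(3)
xp = array_namespace(x)      # the compat-wrapped torch namespace
u, s, vh = xp.linalg.svd(x)  # dispatches to torch's own SVD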

For those functions which use compiled code and are not in the standard, non-CPU device support will be limited to start off with. From the RFC:

When a non-NumPy array type sees compiled code in SciPy (which tends to use the NumPy C API), we have a couple of options:

  1. dispatch back to the other library (PyTorch, CuPy, etc.).
  2. convert to a NumPy array when possible (i.e., on CPU via the buffer protocol, DLPack, or `__array__`), use the compiled code in question, then convert back.
  3. don't support this at all

I'll note that (1) is the long-term goal; how to implement this is out of scope for this proposal - for more on that, see the Appendix section. For now we choose to do (2) when possible, and (3) otherwise. Switching from that approach to (1) in the future will be backwards-compatible.

This does not mean that GPU support is off-limits for all of these functions right now. Firstly, if we can write non-standard functions in terms of standard functions, then they will become compatible with all compliant libraries. Secondly, we have the option to call to CuPy and PyTorch for some functions. While this is not the general, library-agnostic solution of (1) - the "long-term goal" - it may be worth doing this to unlock GPU support for some important functions.

Describe the solution you'd like.

A. Use the standard extension.

This should be (almost) possible in one PR, similar to #19005 in fft. This involves validating array inputs with array_namespace and using xp.linalg, if it is available, for all standard functions. If it is not available, a conversion to NumPy is attempted. For NumPy arrays, the existing SciPy implementation is used.

B. Re-write some functions to be array-agnostic.

Any function which does not use compiled code can be included here. Furthermore, any function which can be rewritten in terms of standard functions will be included. There may be overlap here with the work to convert functions from Fortran to other languages (#18566), in the case that anything is converted to pure Python. As a result, this point may become multiple PRs, with future work coming as more compiled code is eliminated.

This (first) PR will look similar to the cluster parts of #18668 and to a follow-up PR to come for fft.

C. Call to CuPy and PyTorch.

We could also add calls to CuPy and PyTorch for any functions which they share with SciPy, and to which points A and B do not apply. This would ultimately be superseded by a dispatching mechanism, but there are no active plans for that right now.

Which functions in each PR?

For A, the standard functions which are in SciPy are:
cholesky, det, eigh, eigvalsh, inv, pinv, qr, solve, svd and svdvals.

The standard functions which would be added are:
cross, diagonal, matmul, matrix_norm (can use SciPy's norm), matrix_power, matrix_rank, matrix_transpose, outer, slogdet, tensordot, trace, vecdot, vector_norm (can use SciPy's norm).

These functions to be added all either have implementations in np or np.linalg, or are aliases of other standard functions, so they will immediately work with NumPy arrays.
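For illustration, here is a hedged sketch of how the two norm additions could be expressed with SciPy's existing norm (keyword names and defaults follow the standard; this is not a committed design):

from scipy.linalg import norm

def matrix_norm(x, ord='fro'):
    # the standard's matrix norms reduce over the last two axes
    return norm(x, ord=ord, axis=(-2, -1))

def vector_norm(x, ord=2, axis=None):
    # the standard flattens the input when axis is None
    if axis is None:
        x = x.reshape(-1)
    return norm(x, ord=ord, axis=axis)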

For B, the Special Matrices functions look doable. There may be others, but I haven't had a thorough look through.

For C, some functions which have been suggested are solve_triangular, lu, lu_solve and lstsq.

Describe alternatives you've considered.

The number of PRs to split this work into is undecided. If there are multiple PRs, the order of them is also undecided.

If this all looks reasonable, I plan to work on a PR for A. The main question for this PR would be whether or not to touch non-standard functions. The advantage would be consistent validation with array_namespace, and consistent behaviour with arrays coercible by np.asarray, across the whole submodule. The disadvantage would be additional review overhead, since the same functions would be revisited in PRs for B and C. I have no preference, so I will do whatever the maintainers prefer. Regardless of when the work occurs, I think it makes sense to have this consistent validation and behaviour eventually.

Overall, I am not familiar with scipy.linalg yet, so I may be missing a lot! Please feel free to give any corrections or suggestions.


@ilayn (Member) commented Aug 15, 2023

I am a bit on the fringes for this, but having some familiarity with the linalg library, in my opinion the array API is focusing on the wrong functionality to start with. The main argument is the following:

In this day and age, the algorithms are kinda sorta established. There are not many ways to implement an LU factorization or a Cholesky decomposition. The compiled-code speed varies, but the algorithms are quite static and very rarely does something new come up. For the last decade, the performance gains have come from structure discovery and dispatching to the correct flavor of an algorithm.

In fact, the key to many algorithms is that you quickly run over the array to discover structure and then dispatch to the triangular, hermitian, hessenberg, bidiagonal, etc. flavors. This is why we introduced scipy.linalg.bandwidth, scipy.linalg.ishermitian, and so on: to have a relatively quick way to discover structure. Regardless of the architecture, these are the essentials that almost all linalg should start from. In that regard, I don't find the array API standard's linalg extension to be doing the right standardization. It only standardizes the API we currently have, not what we are trying to get to.
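
To make the structure-discovery pattern concrete, here is a minimal sketch; the helpers are the ones named above, while the dispatch logic itself is illustrative rather than SciPy's actual internals:

from scipy.linalg import bandwidth, ishermitian, solve, solve_triangular

def structured_solve(a, b):
    # cheap scans with early exits, then dispatch to the right flavor
    lower_bw, upper_bw = bandwidth(a)
    if upper_bw == 0:        # only the lower triangle is nonzero
        return solve_triangular(a, b, lower=True)
    if lower_bw == 0:        # only the upper triangle is nonzero
        return solve_triangular(a, b, lower=False)
    if ishermitian(a):       # hermitian/symmetric LAPACK path
        return solve(a, b, assume_a='her')
    return solve(a, b)       # general LU path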

However, on the other hand, these checks have to be as fast as possible to get rid of the additional overhead. In other words, scipy.linalg.solve has to be faster than numpy.linalg.solve (which sends every array to the same algorithm) when there is triangular structure. The time saved by using the triangular structure should not be wasted on checking which array vendor is provided and all other related checks. Otherwise there is no point in checking the structure, and we will be forever slow with no possibility of speeding up, due to this overhead.

We already suffer from this with the check_finite=True keyword in almost all linalg funcs. The slowdown is pretty much 2x, due to NumPy elementwise checks with heavy NumPy overhead, and we are hoping to get rid of it via quick-exit functionality and a shorter path to compiled code, with a rather more aggressive run-over-the-array routine written specifically for SciPy.
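
For reference, what check_finite=True amounts to today is roughly a full elementwise pass over the input before any LAPACK call (a sketch of the cost, not the exact SciPy code path):

import numpy as np

a = np.random.default_rng().random((2000, 2000))
np.asarray_chkfinite(a)  # O(n^2) scan; raises ValueError on any NaN/inf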

Unfortunately we also send things to LAPACK internally, and that adds f2py overhead, eating away all the advantage we gained from the array structure, in order to nanny the array into Fortran-contiguous memory layout. Hardly ever does a user create a Fortran-ordered array, so this pretty much means two array copies: one going down to LAPACK and one receiving the result back from f2py.

Therefore the most important thing we need to see is array API support for these helper functions. Otherwise this is just going to be a case-switch of "if this library, use this function", because these helpers are going to be used extensively inside linalg.

In particular, the check_finite and lower keywords, which are purely LAPACK artifacts, should disappear at some point. We are already considering switching to homemade getrf implementations, just like the Julia and Rust folks already did, because of the reduced performance of OpenBLAS/MKL for approximately n < 100. There is at least 4x performance being left on the table by not attending to this, but we are pretty much at full maintainer capacity right now, so things are going slowly.

To that end we have already started a limited effort to remove as much Fortran code as we can in SciPy in #18566, like you mentioned. And once we are close to having that removed, we hope to switch to more compiled code without obfuscating the code too much.

@tylerjereddy added the array types and scipy.linalg labels Aug 15, 2023

@tylerjereddy (Contributor)

Re: performance, do we not have code paths that traverse DGEMM/matmul and friends that would benefit from GPU-tiled matrix multiplication on huge arrays? Isn't the performance difference orders of magnitude vs. CPU there? I thought that was a big reason that GPU dispatch for matmul empowers sensible machine learning times on device vs. host. I suppose torch et al. probably do something powerful on GPU for matrix multiply with large arrays that we don't get from host OpenBLAS? I don't know how the tradeoff for that particular example compares with the other matters discussed above though.

For completeness and ease of use, I propose to add the standard functions which are not currently in scipy. Therefore, if a user wants to write array-agnostic code using the standard extension, they can just use SciPy ...

This decision strikes me as less obvious; my guess would have been that we'd try to improve the performance of our existing linalg surface area by supporting the array API under the hood, but expanding our API surface area to match the actual array backends is a bit less clear as the right direction to me, perhaps because it feels like confounding being an array API "provider" vs. consumer/supporter.

@rgommers (Member)

Therefore the utmost important things we need to see is these helper function implementations to support array API.

I think that is a good point. I see you moved that conversation to data-apis/array-api#675, so let's keep this part of the conversation there.

To that end we already started some limited effort to remove Fortran code as much as we can in SciPy in #18566, like you mentioned.

Yes, we should be careful here to not interfere with the efforts of converting Fortran code to Python/Cython/etc. I think it's a matter of @lucascolley avoiding to touch those functions. And in case there's overlap, let's help out and hopefully push the Fortran conversion forward faster.

I am a bit on the fringes for this but having some familiarity with linalg library, in my opinion array API is focusing on the wrong functionality to start with.

I think you bring up a number of good points around performance of functions in scipy.linalg and potential avenues to improve that. But it looks to me like that's an orthogonal topic. The goal here is to be able to write/run code with another library like CuPy or PyTorch. scipy.linalg functionality is used heavily, also in other SciPy submodules. Hence, having support for the array API, CuPy, PyTorch & co in it is very useful - without it, any other code that uses linalg functionality is also blocked from using non-NumPy arrays. As Tyler points out, there may be significant performance gains in some cases too, but even without that, it's very useful to do the work that @lucascolley proposes here.

Isn't the performance difference orders of magnitude vs. CPU there? I thought that was a big reason that GPU dispatch for matmul empowers sensible machine learning times on device vs. host.

There is indeed. Even on CPU, some of the merged scikit-learn work showed that there are significant performance gains with PyTorch (simply comparing in envs that install wheels from PyPI), because it parallelizes more and builds against MKL, while we bundle OpenBLAS, which is significantly slower on average.

@lucascolley (Member, Author)

This decision strikes me as less obvious; my guess would have been that we'd try to improve the performance of our existing linalg surface area by supporting the array API under the hood, but expanding our API surface area to match the actual array backends is a bit less clear as the right direction to me, perhaps because it feels like confounding being an array API "provider" vs. consumer/supporter.

This is a good point and I agree that this wouldn't be the obvious choice. I still think that there would be some benefit to providing all of the standard functionality under the SciPy umbrella, but this is certainly not something that the standard suggests that consuming libraries should do.

It depends on the magnitude of the drawbacks. It certainly isn't much work to add these functions. There may be large concerns in terms of maintenance or API-bloat, in which case it totally makes sense to not add these.

@ilayn (Member) commented Aug 16, 2023

I think you bring up a number of good points around performance of functions in scipy.linalg and potential avenues to improve that. But it looks to me like that's an orthogonal topic. The goal here is to be able to write/run code with another library like CuPy or PyTorch. scipy.linalg functionality is used heavily, also in other SciPy submodules. Hence, having support for the array API, CuPy, PyTorch & co in it is very useful - without it, any other code that uses linalg functionality is also blocked from using non-NumPy arrays. As Tyler points out, there may be significant performance gains in some cases too, but even without that, it's very useful to do the work that @lucascolley proposes here.

Indeed, but the part that I probably don't understand is the Array API definitions. Because there are solve, 'LU', and other details. Does this mean that if an array comes in, I basically use compatible_vendor.namespace.solve(arr, b), à la numpy.linalg.solve, or do I use compatible_array.solve(b), à la ndarray.solve()?

I am not even really sure I understand correctly why the Array API has a linalg section.

@ilayn (Member) commented Aug 16, 2023

I suppose torch et al. probably do something powerful on GPU for matrix multiply with large arrays that we don't get from host OpenBLAS?

That is correct. But it gets slightly more complicated for matrix decompositions and so on. SciPy doesn't have many array manipulations or ufuncs that operate over the array; hence matmuls, fast transpose-adds, etc. are probably within the array vendor's functionality catalogue (or should be).

We typically send the array somewhere (compiled-code land) and get the result back. But the linalg section confuses me, in that vendors are also providing linalg libs, so what would SciPy's role be? Hence I fully agree with your remark:

perhaps because it feels like confounding being an array API "provider" vs. consumer/supporter.

@lucascolley (Member, Author) commented Aug 16, 2023

Does this mean that if an array comes in, I basically use compatible_vendor.namespace.solve(arr, b), à la numpy.linalg.solve, or do I use compatible_array.solve(b), à la ndarray.solve()?

The former; they are functions of the namespace (xp.linalg).

For a slightly more concise example than solve, here is what inv could look like, where _inv is the current SciPy implementation:

def inv(a, overwrite_a=False, check_finite=True):
    xp = array_namespace(a)
    if is_numpy(xp):
        # NumPy arrays keep the existing SciPy implementation
        return _inv(a, overwrite_a=overwrite_a, check_finite=check_finite)
    # non-default values of non-standard parameters are rejected otherwise
    if overwrite_a:
        raise ValueError(arg_err_msg("overwrite_a"))
    if not check_finite:
        raise ValueError(arg_err_msg("check_finite"))
    if hasattr(xp, 'linalg'):
        # the library provides the standard linalg extension: dispatch to it
        return xp.linalg.inv(a)
    # fall back: convert to NumPy, compute, convert back
    a = np.asarray(a)
    return xp.asarray(_inv(a))

(Support for parameters which are not in the standard could be looked into, but here we raise if a non-standard parameter is provided with a non-NumPy array.)

@ilayn (Member) commented Aug 16, 2023

I am feeling a bit lost again. This is what we discussed extensively in the uarray proposal too. SciPy already has a linalg library. Why would I mediate another library's array going into another library's linalg function? Whoever is calling can just use xp.linalg.inv(a) directly.

Let's forget about the linalg part; the Array API, as I understand it, gets array standardization right. All vendors agree that arrays should have x, y, z properties and A, B, C methods. That's all cool. Then consumers like SciPy, when they are implementing stuff, can follow the standard if they want to support these vendors through the Array API. So I keep an eye on the standard: does it have transpose? Yes, cool. Does it have something like norm? Yes, cool again. Does it have isfinite? No? OK, then I need to find a way to do that.

and so on.

And independent of where it is coming from, I can push it through scipy.linalg.inv if the implementation allows me to do so. This makes sense to me, and indeed we should do this as much as we can.

But calling other linalg libraries makes no sense to me. This is exactly where we left off in the previous discussion, and it seems like we are getting back to it again.

@rkern (Member) commented Aug 16, 2023

One reason to have scipy.linalg functions use their xp.linalg counterparts when given non-ndarrays is that it allows the rest of scipy to be upgraded to be Array API-compatible while still using scipy.linalg-accelerated implementations in the ndarray case. If they are rewritten to use the xp.linalg API, then they will get the np.linalg implementations instead of the scipy.linalg implementations.

With our official numpy binaries, that might not be so bad since we put good effort into using accelerated linear algebra there, but I suspect there are still places where scipy.linalg shines.

@ilayn (Member) commented Aug 16, 2023

I fully agree, but why is this taking place in SciPy code? Let me put it another way: is every consumer going to jump through the hoops for torch or jax etc., basically all the edge cases, just to implement array API compatible code?

I think this is a message I am not able to convey clearly: the consumer is orchestrating the code path. This doesn't seem like an array API but a separate linalg API. I don't think any other library would actually do this, only SciPy.

This is why I am typically emphasizing the need for a consumer model before scipy gets into this.

@rkern (Member) commented Aug 16, 2023

I agree that it is likely that only scipy (and specifically scipy.linalg and scipy.fft) will do this kind of dance. numpy occupies a unique space in the Array API ecosystem and the Python scientific computing ecosystem more generally, and scipy has always had a unique relationship to numpy. Specifically, scipy.linalg shadows numpy.linalg, and many functions inside of scipy itself use the scipy.linalg versions (and should continue to do so whenever given a true ndarray). But if we want as much of scipy as possible to be xp-agnostic (setting aside scipy.linalg considered by itself), then we either have to implement all of those functions using xp.linalg and give up on the scipy.linalg performance in the most common case (ndarray inputs) OR do a dance in each of those functions to pick between xp.linalg and scipy.linalg OR contain the xp.linalg-or-native dance inside of the scipy.linalg functions. The last choice is probably the best bang for buck.

@ilayn (Member) commented Aug 16, 2023

So if we enumerate them:

  1. implement all of those functions using xp.linalg and give up on the scipy.linalg performance in the most common case
  2. do a dance in each of those functions to pick between xp.linalg and scipy.linalg
  3. contain the xp.linalg-or-native dance inside of the scipy.linalg functions

then the only remaining option is:

  4. don't support the linalg extension of the Array API and only use the main body of it.

Because I really don't see the benefit of it for SciPy (yet!). And frankly, I don't understand why SciPy should be the arbiter of this to start with. I am all in for the Array API, if I understand it correctly.

# xp = array_namespace(a)
arr = xp.asarray([[1, 2, 3], [4, 5, 6]])  # whatever xp is

# the following just works because the array has all the necessary
# methods used in the inv implementation, and SciPy tries its best
# to use only that limited set of functionalities from the Array API
arr_inv = scipy.linalg.inv(arr)

This is clearly a superior standardization that generalizes NumPy's ndarray. But the following works backwards, pushing implementation details onto consumers:

xp = array_namespace(a)
arr = xp.asarray([[1, 2, 3], [4, 5, 6]])  # whatever xp is

# some code that checks if the kwargs are compatible
...

# then the dance we mentioned above
if is_numpy(xp):
    arr_inv = scipy.linalg.inv(arr)
elif hasattr(xp, 'linalg'):
    # it is coming from some xp with linalg: call their linalg
    arr_inv = xp.linalg.inv(arr)
elif gpu_stuff:
    ...  # perform some host-device dance
else:
    raise TypeError("don't know what to do with this input")

Why does SciPy have to host what other libraries do, while other libraries don't need to care about anything SciPy does? In other words, why should SciPy have to adopt this? It gains nothing other than a bit of overhead and quite a few if-else switches, because its main use case is NumPy (overall speaking). Moreover, SciPy binds itself to this switching code and its maintenance, with 100% dependency on other libraries' implementation velocity. Have a look at Tyler's welch implementation in #19001; I consider it pretty convoluted, as per my remarks there. And it also seems like we haven't settled on the basic functionality either (as one piece of anecdotal evidence: data-apis/array-api#627).

Probably I am not able to see it from your perspective regarding the unique positions, because this standard's goal is to remove that uniqueness (if I understand correctly); hence I would appreciate a bit more motivation for doing this within SciPy. My current understanding is that the linalg extension of this API over-reaches into implementers and consumers, which should not be the case. Gaining great performance just by shifting to GPU is great, but if we are going to orchestrate the code so that it goes to another library, then it is much easier to just use that library and its linalg extension directly.

We get very much the same behavior if we don't change anything and leave the xp.linalg namespace unused. My first code example would be wonderful to have: SciPy just consumes the xp array and its methods, tries really hard not to step outside those core functionalities, and in return the array knows its own implementation properties.

Actually, I am already looking at ways to write array API conforming linalg. That's maximal bang for the same buck: we support all array inputs, and we don't care whether some library implements linalg or not.

@rkern (Member) commented Aug 16, 2023

My understanding is that the proposal being made in this issue (the one I labeled as the best bang for buck) is exactly the one that enables the first code example that you mark out as superior. The dance is hidden inside of scipy.linalg.inv() and not exposed to consumers. The proposal here is to avoid having consumers do that dance.

@ilayn (Member) commented Aug 17, 2023

In case I didn't emphasize it, let me clarify: I consider SciPy to be a consumer, and thus we, too, should not do this dance. I guess that's the crucial point. In the terminology of the other Array API issues, consumers are the array consumers, not the end users. Or at least that's how I understood it. So there should be no dance anywhere.

My question is why SciPy is the arbiter of this dance at all if it is a consumer.

@ilayn (Member) commented Aug 17, 2023

To ask it differently: who can implement the linalg extension of the Array API without this dance?

@rkern (Member) commented Aug 17, 2023

scipy.linalg is not an implementation of xp.linalg. Likely no implementation of xp.linalg will do The Dance. For other xp array objects, there is basically only one preferred linear algebra implementation, and that will be implemented as xp.linalg. xp.linalg.* functions won't be given non-xp array objects (because one gets the xp namespace object from the array object itself).

Because of numpy and scipy's long history together, we have 2 linalg libraries for ndarray objects, one shadowing the other. We (at least the rest of scipy) use the scipy.linalg version a lot because it's generally built with the faster versions of the LAPACK calls or a faster LAPACK in general. Our use of scipy.linalg functions is a blocker to making much of the rest of scipy xp-agnostic.

To highlight a point of agreement, at least between us two, I agree that we should not be doing backend-specific checks like the ones you noted in #19001, so I'd strike the elif gpu stuff: clause in your example. For me, that's my line as to whether or not it's worth making one of our functions xp-agnostic. If it's not part of the standard, or one implementation of the standard is broken, then we don't support it. That would indeed tightly couple us to arcane details of the implementation state of a whole host of other libraries, and that's too much of a burden, IMO. The Array API standard should be doing the cross-backend work.

The proposal does not involve backend-specific checks other than checking for true ndarrays. It constrains The Dance (sans the elif gpu stuff: clause) to just those functions in scipy.linalg that have standardized xp.linalg equivalents. No other part of scipy need do The Dance; they can just call scipy.linalg.inv() freely and rest assured that it will still be xp-agnostic AND get the best performance for ndarray inputs too. The Dance is not an all-pervasive pattern for proper use or implementation of xp.linalg. It's a response to a unique opportunity afforded by the existence of the faster scipy.linalg implementations for ndarray inputs.

Now, the ideal scenario would be that the np.linalg functions, which are the basis of xp.linalg in the ndarray implementation of the Array API, would just be fast enough that xp-agnostic code could use xp.linalg explicitly and unconditionally. Maybe they are good enough; we should check! Or maybe we should invest some effort into making them so. But for me, implementing The Dance inside of a few scipy.linalg functions is a cheap and effective way to unblock getting much of the rest of scipy xp-agnostic without sacrificing performance.

@ilayn (Member) commented Aug 18, 2023

Much clearer, thank you for the clarification. I think I completely agree, though there are certain details I just don't have enough information about. But they are not very relevant for this discussion.

So I am, kinda sorta, thinking about the matfuncs when array APIs are discussed. Here linalg uses itself quite a lot; say it goes roughly like the following. I am just making things up, so this is not necessarily a real function body, but many are like this:

def sqrtm(A):
    # using np and xp by their namespaces to cut it short
    if np:
        q, z = scipy.linalg.qz(A)
    elif xp:
        # possibly different API
        q, z, _ = xp.linalg.qz(A, some_kwarg=True)
    else:
        raise ValueError("Don't know how to handle A")

    # then do a bunch of stuff with q and z,
    # maybe call some cythonized code
    q_2 = some_cythonized_func(q)
    ...

    # now say we need to call eigh
    if np:
        L, V = scipy.linalg.eigh(A, subset_by_index=[2, 5])
    elif xp:
        # possibly different API
        L, V, _ = xp.linalg.eigh(A, some_kwarg=True)
        # sorting is left open in the spec:
        # https://data-apis.org/array-api/latest/extensions/generated/array_api.linalg.eigh.html
        L = xp.sort(L, descending=True)
    else:
        raise ValueError("Don't know how to handle A")

    # more code, possibly with more scipy.linalg/xp.linalg cases
    ...
    return M

Alternatively, we can collect things under big sub-functions:

def sqrtm(A):
    if np:
        # SciPy's original implementation
        q, z = scipy.linalg._sqrtm(A)
    elif xp:
        # the array API compatible implementation
        q, z, _ = xp.linalg.qz(A, some_kwarg=True)
        ...
        L, V, _ = xp.linalg.eigh(A, some_kwarg=True)
        L = xp.sort(L, descending=True)
        ...
    else:
        raise ValueError("Don't know how to handle A")

    return M

This is what I can't wrap my head around yet. It seems to me that SciPy is selected as the common consumer for every vendor (well, like you said: let's say we convinced folks about the GPU stuff; Matt, at least, was impressed with the GPU results, so that seems to be the killer feature). And we would need to take care of the array API quite comprehensively.

Maybe we can move an array API namespace inside linalg with matching signatures? Say scipy.linalg.array_api.<func>, where the func signature is adjusted to the array API linalg extension standard. But that seems to open other cans of worms, so I'm not sure how to proceed.

@rkern (Member) commented Aug 18, 2023

I do think that a separate namespace of xp.linalg-compliant shim functions that do The Dance might be a reasonable option. I was going to suggest it as well. In this scenario, we leave all of the scipy.linalg functions alone; they only do ndarray calculations on the CPU just as normal. But the shim functions will check for true ndarrays and call the scipy.linalg version in that case and otherwise defer to xp.linalg. That namespace might also be a place for xp-agnostic implementations of functions that are in scipy.linalg but not in the xp.linalg standard, much like the xp-agnostic implementation of cov() that's needed to unblock higher-level functions.
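
A minimal sketch of such a shim, reusing the array_namespace/is_numpy helpers from the inv example earlier in this thread (return-type and signature differences between scipy.linalg and the standard are glossed over):

import scipy.linalg

def svd(x):
    # xp.linalg-compliant entry point that does The Dance internally
    xp = array_namespace(x)
    if is_numpy(xp):
        # true ndarrays get the faster scipy.linalg implementation
        return scipy.linalg.svd(x)
    # everything else defers to its own library's linalg extension
    return xp.linalg.svd(x)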

Among other reasons, keeping the xp.linalg-compliant shims elsewhere would avoid some of the confusion over the signature differences between the xp.linalg standard and the implementations in scipy.linalg. If we tried to make the scipy.linalg versions xp-agnostic, it would only be in the cases where the user didn't pass arguments that made use of scipy.linalg-specific features. That will be complicated to document and add a good amount of mental complexity to someone trying to write or maintain xp-agnostic code.

I think my guiding principle for thinking about scipy.linalg and scipy.fft with respect to the array standard is that we should not care very much about the core computational routines being xp-agnostic themselves (simple helper routines like fftshift() are another story). Their main purpose is to expose the performance and options of the compiled C/C++/Fortran code, and that can't be xp-agnostic. The main thing to think about is how to unblock the xp-agnosticism of other parts of scipy that happen to use scipy.linalg and scipy.fft. So I think both options are valid ways to do that, each with some drawbacks and benefits.

I will also note that my desire to remove things like if 'cupy' in xp.__name__ clauses is not a hard line if there's appetite for handling such cases by other devs (as #19001 proves). I acknowledge that for at least some amount of time, such switches may be necessary to actually achieve practical amounts of non-ndarray support in the near term. And if we just let their absence block us, it might be the case that we never surface the gaps in the Array API or individual implementations that need to be found for them to be fixed. But I definitely don't want them sprinkled all around scipy.signal and the rest. If they were constrained to shim/utility functions in scipy._lib.array_api or other such namespaces, I'd be more likely to tolerate them. I'd still probably try to minimize them and be more willing to accept that if the xp implementation doesn't support some things well enough, then they'll just hit a "convert-to-ndarray" codepath or fail (I'm thinking of the as_strided() fallback codepath in #19001, here).

@rgommers (Member)

I do think that a separate namespace of xp.linalg-compliant shim functions that do The Dance might be a reasonable option. I was going to suggest it as well. In this scenario, we leave all of the scipy.linalg functions alone; they only do ndarray calculations on the CPU just as normal.

The former may be worth considering I think, especially if it helps make a step in the direction of the long-standing issue of diverging behavior of scipy.linalg and numpy.linalg (e.g., for qr). However, I'm not sure about that part yet - it may just be extra API surface without fixing the existing divergence. Maybe it can be done but kept private, only for use by other SciPy submodules that may need it? I'd like to comment on the last sentence above though - also related to this comment:

I think my guiding principle for thinking about scipy.linalg and scipy.fft with respect to the array standard is that we should not care very much about the core computational routines being xp-agnostic themselves (simple helper routines like fftshift() are another story). Their main purpose is to expose the performance and options of the compiled C/C++/Fortran code, and that can't be xp-agnostic.

I don't really agree here. The key element of the proposed design (see gh-18286) is this sentence: The basic design principle I propose aiming for in all of SciPy for array libraries is: container type in == container type out. The reasoning here seems to not take that into account. linalg and fft are far from the only components in SciPy that contain a lot of compiled code, and I don't see why they should be the two special-case modules in the big picture (that there's array API extensions that overlap with scipy.fft and scipy.linalg functionality is a possible extra opportunity, that's all).

The design has several goals/consequences:

  1. Correctness (e.g., no silently wrong answers for masked arrays, unit arrays, or other such inputs, or pandas dataframe/series input that may behave unpredictably)
  2. Composability (even without performance differences, it is a big gain to be able to use SciPy APIs in an array type-agnostic way from downstream libraries and end user code).
  3. Performance (GPU execution, etc.)

The tentative argument here against changing scipy.linalg functions seems to only consider (3), Performance. I think some of this, in particular around (1), Correctness, is pretty well covered in "Problems & Opportunities", so I'd much prefer to avoid discussing a deviation from that design for the linalg module only. Unless there's a really compelling reason of course - but I don't see it yet.

For (2), Composability, let me also point out a scikit-learn example: https://github.com/scikit-learn/scikit-learn/blob/80c39a4966ee88150d7e5849ab21a8e17bb1fbc8/sklearn/decomposition/_pca.py#L545-L555. They'd like scipy.linalg.svd for numpy arrays, and the xp version otherwise. That if-else dance works best if it lives inside scipy.linalg.svd rather than in scikit-learn and other users of scipy.linalg.svd.

I will also note that my desire to remove things like if 'cupy' in xp.__name__ clauses [...]

Agreed with the spirit of this part - we should minimize if-statements like this as much as we can. There'll be some cases where it's a tradeoff though, and a few lines more code for 10x or more performance difference (as in the as_strided case) should be okay. But we can't have that all over the place indeed.

@rkern (Member) commented Aug 20, 2023

Fair. I came into the discussion late, and the point of discussion at the time was whether or not the xp.linalg versions got used inside of the scipy.linalg equivalents, which was essentially a question about (3). On the assumption that we want "container type in == container type out" for all of our functions, then yeah, it would be a good idea for scipy.linalg.* to call the relevant xp.linalg.* when it's applicable and available.

That said, for the rest of scipy that uses scipy.linalg, it is going to be a little hard to know when calling scipy.linalg.* not only preserves the container type but also keeps the computation xp-native and avoids the conversion based on the options passed in. I think it might still behoove us to have xp.linalg API-compatible versions that use scipy.linalg implementations when given ndarrays and xp.linalg implementations otherwise. Then other parts of scipy that are really concerned about keeping everything xp-native can use that and be sure that we're not going through a "convert to ndarray" codepath. I'm happy with it being scipy-internal, though it might be useful for other projects that already have scipy as a dependency, so maybe data-apis/array-api-compat would be a good place for it, too.

@rgommers (Member)

Over the past few weeks a lot of improvements in test infra were implemented, and @lucascolley's PR (gh-19005) for support in fft was merged, as was @mdhaber's PR (gh-19023) which made a start on special with 14 of the most popular functions. So I think it's time to carefully move ahead on linalg.

linalg is a large module, so it'd be best to not have one giant PR for everything. Tackling first the functions that are also in the standard would be useful I think - @lucascolley gave a useful categorization (A-C) in the issue description. And the private "guaranteed to not convert to numpy.ndarray" as discussed in the comments above is a fourth category.

@lucascolley I know there's a few fft follow-ups to do still, but I think it'd be useful to see a PR with what you have for linalg so far, after updating it for any "best practices" from the fft/cluster/special work if needed.

I'm happy with it being scipy-internal, though it might be useful for other projects that already have scipy as a dependency, so maybe data-apis/array-api-compat would be a good place for it, too.

I do believe that there may be a case for a standalone library for this kind of thing. array-api-compat is scoped pretty tightly as a compat layer for APIs that are either in the latest standard but not yet implemented in libraries, or for things that have been discussed/accepted for inclusion in the standard and are pending inclusion. I think both SciPy and scikit-learn are slowly going to accumulate private versions of array-agnostic functions that are built on top of the standard rather than included in it (cov is a good example here, see gh-19051) and those could fairly easily be made reusable. I think it's a bit early for a separate package right now, but it may materialize over the next 6-12 months.

@lucascolley (Member, Author) commented Sep 19, 2023

Thanks Robert, Ilhan, Ralf and Tyler for the discussion here! It looks like we've already answered some important questions, not least from my end.

In tandem with opening my WIP for linalg, I would like to briefly justify the scope of my PR here. Unfortunately, I will not have time to convert all of linalg this summer (my internship lasts until the end of September), so I would also like to set out a rough plan of how this PR will fit into the bigger picture of full module conversion.

my guess would have been that we'd try to improve the performance of our existing linalg surface area by supporting the array API under the hood, but expanding our API surface area to match the actual array backends is a bit less clear as the right direction to me

To clarify on this point, I am no longer considering any API additions.


I am currently writing a blog post which will describe the different changes required to convert a submodule. Testing aside, the changes are:

  1. For pure Python + NumPy functions, rewrite everything to be array-agnostic. This was the case with the Fast Hankel Transforms in fft, and much of cluster.

  2. When compiled code is hit, use np.asarray before it and xp.asarray after it. This will raise exceptions for GPU arrays, to avoid silent device transfers. This was the case for the Discrete Sine and Cosine Transforms in fft (a minimal sketch follows this list).

  3. For submodules which have a Standard Extension Module, check whether it exists in the namespace before falling back to (2). if hasattr(xp, 'linalg'):. NumPy arrays, or all arrays when the experimental flag is not set, will continue through to the current SciPy implementation instead. This was the case for the basic functions in fft.
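
Here is that minimal sketch of pattern (2), as promised above (the compiled routine _compiled_kernel is a hypothetical stand-in for whatever the function actually calls):

import numpy as np
from scipy._lib._array_api import array_namespace

def dct_like(x):
    xp = array_namespace(x)  # validate input and record its namespace
    y = np.asarray(x)        # raises for GPU arrays: no silent device transfer
    y = _compiled_kernel(y)  # hypothetical existing compiled routine
    return xp.asarray(y)     # hand back the caller's array type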

My PR will tackle the functions covered by (3). Future PRs will tackle (1) and (2).

After or alongside this work, separate PRs could implement calling out to specific libraries like CuPy and PyTorch where (2) would otherwise be hit. This is what Matt has done for some functions in special in gh-19023. It is a temporary solution in the absence of a general dispatching mechanism, but arguably just as important as the array-agnostic work in the short term.
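
As a hedged illustration of what such a call-out could look like for one of the functions suggested under C (the namespace check mirrors the if 'cupy' in xp.__name__ style discussed above; the remaining branches are elided):

from scipy._lib._array_api import array_namespace

def solve_triangular_xp(a, b, lower=False):
    xp = array_namespace(a)
    if 'cupy' in xp.__name__:
        # delegate to CuPy's own SciPy-compatible implementation
        from cupyx.scipy.linalg import solve_triangular as cp_solve_triangular
        return cp_solve_triangular(a, b, lower=lower)
    ...  # NumPy path and the conversion fallback of (2) would go here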

the private "guaranteed to not convert to numpy.ndarray" as discussed in the comments above

This is another type of work which can happen alongside what is above. Since this is just SciPy internal, I don't think that we need to figure this out fully before starting work on the other parts.


I hope that this helps to clarify things! I would recommend saving any specific implementation questions for my PR, but feel free to ask anything about the vision for conversion which I have laid out here.

@lucascolley (Member, Author)

I am currently writing a blog post which will describe the different changes required to convert a submodule.

This can be found at https://labs.quansight.org/blog/scipy-array-api.

@lucascolley self-assigned this Nov 15, 2023