Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: fft: support array API standard #19005

Merged
merged 151 commits into from
Sep 19, 2023
Merged

Conversation

lucascolley
Copy link
Member

@lucascolley lucascolley commented Aug 2, 2023

For context, this work is part of my internship at Quansight Labs, which runs until the end of September.

Reference issue

RFC for array API support in SciPy: #18286
Tracker for array API support in SciPy: #18867
Merged PR adding machinery and covering scipy.cluster: #18668

What does this implement/fix?

This PR extends the array API support of #18668 to also cover scipy.fft.

fft is an array API standard extension module and has matching APIs in both CuPy and PyTorch.

For more context on the standard and the goal of SciPy adoption, please see the RFC.

For explanation of the machinery of which this PR makes use, please see the cluster PR.

Changes:

  • The uarray dispatch structure of _basic.py has been moved to a new file, _basic_uarray.py, which is used for NumPy arrays and array-likes. All input arrays are now validated by array_namespace, so all valid input arrays either support the array API standard or are converted to NumPy arrays. The functions of _basic.py fall into two categories in their handling of non-NumPy arrays:
  1. Those which are part of the standard first raise exceptions if unsupported parameters are provided. They then check whether the array's namespace has an fft module. If it does, then that is used. If not, then a conversion to NumPy is attempted in order to use uarray. If successful, the result is converted back to the same namespace as the input array.
  2. For those which are not part of the standard, a conversion to NumPy is attempted in order to use uarray. If successful, the result is converted back to the same namespace as the input array. This means that GPU support is restricted to functions which are part of the standard for now, as otherwise they are hitting compiled code made for NumPy arrays. This design decision is explained in the RFC:
RFC excerpt (dropdown)

When a non-NumPy array type sees compiled code in SciPy (which tends to use the NumPy C API), we have a couple of options:

  1. dispatch back to the other library (PyTorch, CuPy, etc.).

  2. convert to a NumPy array when possible (i.e., on CPU via the buffer protocol, DLPack, or array), use the compiled code in question, then convert back.

  3. don't support this at all

I'll note that (1) is the long-term goal; how to implement this is out of scope for this proposal - for more on that, see the Appendix section. For now we choose to do (2) when possible, and (3) otherwise. Switching from that approach to (1) in the future will be backwards-compatible.

  • The same design has been applied to _realtransforms.py and _realtransforms_uarray.py, save for the fact that there are no standard functions in these files.
  • fftfreq, rfftfreq, fftshift and ifftshift were previously just imported from numpy. They now have implementations in _helper.py, and follow the pattern for standard functions.
  • The NumPy backend of _fftlog.py has been moved to a new file, _fftlog_np.py. Its uarray dispatch structure remains in the file _fftlog_multimethods.py. The functions in _fftlog.py now follow the pattern for non-standard functions.
  • Tests for fftfreq etc. have been copied over from NumPy to test_helper.py.
  • Tests have been modified to be array API compatible where possible.
  • TestNamespaces has been added to test_basic.py and test_helper.py to check that output arrays are of the same namespace as input arrays.
  • test_non_standard_params has been added to test_basic.py to check that exceptions are raised for unsupported parameters.
  • A new helper has been added to conftest.py: set_assert_allclose(xp), which returns an assert_allclose equivalent for the namespace xp (currently supports cupy and torch), to allow testing on GPU.
  • Two new helpers have been added to _array_api.py: is_numpy(xp) which checks whether xp.__name__ is "scipy._lib.array_api_compat.array_api_compat.numpy", and _assert_matching_namespace by @tylerjereddy .

Benchmark:

Here is a basic benchmark, which shows that the task of smoothing an image (taking a 2-D image, using scipy.fftn, zero-ing out the highest frequency components, and using scipy.ifftn) using a CuPy array is ~15x faster than using a NumPy array. This is with an AMD Ryzen 5 2600 and an NVIDIA GeForce GTX 1060 6GB:

Benchmark (dropdown)
In [1]: import scipy

In [2]: import numpy as np

In [3]: face = scipy.datasets.face()

In [4]: %%timeit
   ...: f = scipy.fft.fftn(face)
   ...: fshift = scipy.fft.fftshift(f)
   ...: original = np.copy(fshift)
   ...: fshift[354:414, 482:532] = 0
   ...: f_ishift = scipy.fft.ifftshift(original - fshift)
   ...: x = scipy.fft.ifftn(f_ishift)
   ...:
   ...:
292 ms ± 20 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: import cupy as cp

In [6]: cpface = cp.asarray(face)

In [7]: from cupyx.profiler import benchmark

In [8]: def myfunc():
   ...:     f = scipy.fft.fftn(cpface)
   ...:     fshift = scipy.fft.fftshift(f)
   ...:     original = cp.copy(fshift)
   ...:     fshift[354:414, 482:532] = 0
   ...:     f_ishift = scipy.fft.ifftshift(original - fshift)
   ...:     x = scipy.fft.ifftn(f_ishift)
   ...:

In [9]: print(benchmark(myfunc, n_repeat=7))
myfunc              :    CPU:  1184.414 us   +/- 129.675 (min:   983.994 / max:  1332.594) us     GPU-0: 13652.553 us   +/- 53.933 (min: 13529.088 / max: 13699.072) us

Additional information

Difficulties:

  • TestFFTFreq and TestRFFTFreq fail for PyTorch GPU due to a bug in PyTorch, which has now been fixed on main: torch.asarray does not respect set_default_device pytorch/pytorch#106773.
  • test_basic.py::test_multiprocess currently only tests numpy since it hangs for any array API library (even numpy.array_api). Generally, multithreading tests (including those using the unsupported workers keyword in test_multithreading.py) are the same, with test_basic.py::TestFFTThreadSafe only being skipped on GPU.

Implementation Details:

  • Any tests of non-standard functions (which involve attempted conversion to NumPy upon reaching compiled code) are skipped on GPU backends.
  • The signatures for fftfreq and rfftfreq have changed to (n, d=1.0, *, xp=None, device=None). The addition of device is to match the signature in the standard. The addition of xp is to specify the namespace for the output array. This is required since these are "array-generating functions" which output an array without taking one as input. Since array-api-compat and numpy.array_api are yet to implement fft, these functions and their tests are currently worked-around or skipped.
    Example of workaround for numpy. These implementations - alongside suitable documentation changes - will close fftfreq uses numpy module as example #13887.
  • test_basic.py::test_fft_with_order only tests numpy since orders are not supported in the array API. Likewise, test_identity_1d_overwrite and test_identity_nd_overwrite in test_real_transforms.py only test numpy since the overwrite_x keyword is not supported for other libraries.
  • Some of the tests in test_basic.py::TestFFT1D and test_helper.py::TestFFTShift are failing on torch backend due to differences in the function signatures, but they will pass once array-api-compat has implemented torch.fft. Currently @skip_if_array_api_backend('torch') is used.

Many thanks to @tupui, @izaid and @rgommers who have given really helpful guidance in the making of this PR!

References:

lucascolley and others added 30 commits July 18, 2023 17:09
adds test utility for checking that namespaces match

Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com>
@rgommers
Copy link
Member

The CuPy uarray backend example is fixed now, and the stricter mock_backend.py looks good to me.

@peterbell10 I plan to do final check and then merging this soon. If you'd like to have a look at this, that would be very welcome. See http://scipy.github.io/devdocs/dev/api-dev/array_api.html for details on how to enable the array API support and test against PyTorch et al. (e.g. export SCIPY_ARRAY_API=1 && python dev.py test -s fft -b pytorch).

Copy link
Member

@peterbell10 peterbell10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small questions/comments. Only the _init_nd_shape_and_axes issue I would consider major.

scipy/fft/_basic_backend.py Outdated Show resolved Hide resolved
scipy/fft/_basic_backend.py Show resolved Hide resolved
scipy/fft/_basic_backend.py Show resolved Hide resolved
scipy/fft/_fftlog_backend.py Outdated Show resolved Hide resolved
xp = array_namespace(x)
x = np.asarray(x)
shape, axes = _helper._init_nd_shape_and_axes(x, shape, axes)
return xp.asarray(shape), xp.asarray(axes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns a list, it doesn't belong in an array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, how would you recommend changing this? I was following the standard pattern for attempting to convert to np and back to xp when hitting compiled code, but sounds like we want something a bit different here.

Do the docs need to be changed? Under Returns we have shape : array and axes : array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this private helper function need changes at all? It's used like so:

shape, axes = _init_nd_shape_and_axes(x, ...)

so it should be fine to keep the return types as they were? shape and axes can be tuples at least; lists as they are now isn't guaranteed to work with most functions that have shape=. So if something is choking on a list, maybe try to return tuples instead.

The docs don't need to change as far as I can tell.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterbell10 @rgommers is this the desired change? aed9f82

Edit: also removed the array_namespace call in dda7057

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite. I can see where the confusion was coming from though, because the docs of the function were incorrect to begin with - those said the return values were arrays. Instead, shape was a list and axes was a range object. Both aren't ideal either - shape should be a tuple and axes a list.

In figuring that out, I have most of a fix ready, so let me see if I can push that here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in commit 14cd806. and I also pushed a commit to include fft in the array API CI job.

scipy/fft/_helper.py Outdated Show resolved Hide resolved
scipy/fft/_helper.py Show resolved Hide resolved


# torch.fft not yet implemented by array-api-compat
@skip_if_array_api_backend('torch')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a testing infra issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, this one is a bit complicated. torch's n-D transforms use the dim keyword instead of the axes keyword as specified by the standard. These tests were failing during development because we were passing the argument as a keyword, however they are not failing anymore because we have switched to passing it as a positional arg. This comment indicates that array-api-compat will handle this aliasing when it implements torch.fft. (Side note: maybe this can be built into torch soon? I'm not sure whether torch is planning to comply with the fft standard extension too.)

@tylerjereddy, on a second look, I don't think that we should have moved from keyword args to positional args when we combined _execute_1D and _execute_nD, since the standard specifies them as keyword only (e.g. https://data-apis.org/array-api/2022.12/extensions/generated/array_api.fft.fftn.html#array_api.fft.fftn). Should we revert to the separate _execute*s to keep these kwargs?

If we do revert, then these skips should stay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we should have moved from keyword args to positional args

Can't you still get away with it via **kwargs instead of restoring the code duplication? Also, part of my original point was that the tests still passed--if this is really important a test should perhaps fail if we don't conform to it.

To be clear though, I was never convinced that SciPy itself should be a provider of the standard, just that it would support it internally to allow swapping backends. I think I made a similar comment for linalg discussion re: us being an algorithms library vs. array API provider library. I can understand why one might want to re: mirroring NumPy swaps perhaps, or if it reduces confusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear though, I was never convinced that SciPy itself should be a provider of the standard, just that it would support it internally to allow swapping backends.

Yes I agree, but this is just about calling providers. If we want to be able to support any library which complies with the standard, then we need the kwargs. We don't need to accept kwargs only, but we do need to be able to call libraries which do.

Can't you still get away with it via **kwargs instead of restoring the code duplication?

Ah possibly, I'll try that 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so PyTorch allows dim arguments to be called with axis but not axes. It is a bit annoying that NumPy makes this distinction, especially as it isn't inconsistent. e.g. np.squeeze(x, axis=(1, 2)).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerjereddy is this what you had in mind? 5f68e8c

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can leave this to a follow-up if needed, but to me it looks like the cure is worse than the disease here. The use of **kwargs and multiple try-except clauses to avoid a bit of code duplicate between _execute and _execute_nD feels a bit off. @tylerjereddy WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new solution is probably still less lines, but I'm happy for you and Lucas to proceed with a revert-to-original if you prefer the original case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong preference from me. If I had to say, I'd go for the separate functions to avoid the try-excepts. Your call @rgommers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original is preferred - easier to read and more robust. @lucascolley can you please take this along in a follow-up PR?

scipy/fft/tests/test_fftlog.py Show resolved Hide resolved
scipy/fft/tests/test_helper.py Outdated Show resolved Hide resolved
Copy link
Member Author

@lucascolley lucascolley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Peter, thanks for the review! Hopefully my comments can explain most of the points raised here. They've helped to catch a potential bug so thanks a lot.

I'm not entirely sure what changes are needed for the _init_nd_shape_and_axes part, so if you could explain a little more that would be greatly appreciated!

scipy/fft/_basic_backend.py Show resolved Hide resolved
scipy/fft/_fftlog_backend.py Outdated Show resolved Hide resolved
scipy/fft/_helper.py Show resolved Hide resolved
scipy/fft/tests/test_fftlog.py Show resolved Hide resolved


# torch.fft not yet implemented by array-api-compat
@skip_if_array_api_backend('torch')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, this one is a bit complicated. torch's n-D transforms use the dim keyword instead of the axes keyword as specified by the standard. These tests were failing during development because we were passing the argument as a keyword, however they are not failing anymore because we have switched to passing it as a positional arg. This comment indicates that array-api-compat will handle this aliasing when it implements torch.fft. (Side note: maybe this can be built into torch soon? I'm not sure whether torch is planning to comply with the fft standard extension too.)

@tylerjereddy, on a second look, I don't think that we should have moved from keyword args to positional args when we combined _execute_1D and _execute_nD, since the standard specifies them as keyword only (e.g. https://data-apis.org/array-api/2022.12/extensions/generated/array_api.fft.fftn.html#array_api.fft.fftn). Should we revert to the separate _execute*s to keep these kwargs?

If we do revert, then these skips should stay.

scipy/fft/_basic_backend.py Show resolved Hide resolved
xp = array_namespace(x)
x = np.asarray(x)
shape, axes = _helper._init_nd_shape_and_axes(x, shape, axes)
return xp.asarray(shape), xp.asarray(axes)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, how would you recommend changing this? I was following the standard pattern for attempting to convert to np and back to xp when hitting compiled code, but sounds like we want something a bit different here.

Do the docs need to be changed? Under Returns we have shape : array and axes : array.

scipy/fft/_basic_backend.py Outdated Show resolved Hide resolved
Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, all remaining issues were resolved, it seems to work great with all backends, and this PR also contains a few test improvements for fft with numpy arrays now. All reviewers' seem happy, so let's give this a go.

Great work Lucas, thanks a lot! And thanks to all reviewers as well!

@rgommers rgommers merged commit b786038 into scipy:main Sep 19, 2023
22 checks passed
@izaid
Copy link
Contributor

izaid commented Sep 19, 2023

Well done @lucascolley! Really great to see this merged.

@tupui
Copy link
Member

tupui commented Sep 19, 2023

Awesome work @lucascolley 🙌🚀 congratulations on getting this in!

joanrue added a commit to joanrue/dask that referenced this pull request Dec 4, 2023
scipy.fftpack is now legacy as it doesn't follow numpy.fft API and they suggest using scipy.fft scipy/scipy#19005
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
array types Items related to array API support and input array validation (see gh-18286) scipy.fft scipy._lib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fftfreq uses numpy module as example
7 participants