Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: stats.moment: add array API support #20292

Merged
merged 19 commits into from
Apr 19, 2024
Merged

Conversation

mdhaber
Copy link
Contributor

@mdhaber mdhaber commented Mar 20, 2024

Reference issue

Towards gh-18867

What does this implement/fix?

Adds array API support to stats.moment.

@mdhaber mdhaber added enhancement A new feature or improvement array types Items related to array API support and input array validation (see gh-18286) labels Mar 20, 2024
scipy/stats/_stats_py.py Outdated Show resolved Hide resolved
@lucascolley
Copy link
Member

Is there something I need to do to get this to run with alternative backends in CI?

Add a new pytest call in the array API CI workflow. -s stats might be overkill for now, maybe you can make it a little more fine grained with -t.

Copy link
Member

@lucascolley lucascolley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few comments for now

scipy/_lib/_util.py Outdated Show resolved Hide resolved
scipy/_lib/tests/test__util.py Outdated Show resolved Hide resolved
scipy/_lib/_util.py Outdated Show resolved Hide resolved
scipy/stats/_axis_nan_policy.py Outdated Show resolved Hide resolved
scipy/stats/_stats_py.py Outdated Show resolved Hide resolved
scipy/stats/tests/test_axis_nan_policy.py Outdated Show resolved Hide resolved
@mdhaber
Copy link
Contributor Author

mdhaber commented Mar 28, 2024

@lucascolley Thanks for the quick reviews - I'm glad you're doing that! Just know that you will find a lot of things if you get to it before I am able to review my own code on GitHub. Seeing just the diff highlights these things, which I might not be so careful about when drafting the code.

Re: importing SCIPY_ARRAY_API - I think I have a legitimate need for skipping tests whenever SCIPY_ARRAY_API == 1: the tests involve masked and/or object arrays, but array_namespace raises errors when it sees these. Therefore, tests involving masked and/or object arrays typically should be skipped when SCIPY_ARRAY_API == 1 because they will fail regardless of backend, right?

I think this will be common enough that it does make sense to have a custom mark like the original skip_if_array_api. Otherwise, as you see in the CI logs, I have to pass a reason when using pytest.mark.skipif, and that is going to get really repetitive when the reason is always the same. Does it make sense to have a real skip_if_array_mark after all?

scipy/_lib/_util.py Outdated Show resolved Hide resolved
scipy/_lib/_util.py Outdated Show resolved Hide resolved
@lucascolley
Copy link
Member

lucascolley commented Mar 28, 2024

I think this will be common enough that it does make sense to have a custom mark like the original skip_if_array_api. Otherwise, as you see in the CI logs, I have to pass a reason when using pytest.mark.skipif, and that is going to get really repetitive when the reason is always the same. Does it make sense to have a real skip_if_array_mark after all?

@rgommers you originally wanted this marker removed, but I think it is the best way to deal with tests that involve masked/object/numpy.matrix arrays. Thoughts?

@rgommers
Copy link
Member

@rgommers you originally wanted this marker removed, but I think it is the best way to deal with tests that involve masked/object/numpy.matrix arrays. Thoughts?

I'm thinking about what will happen if and when we make the behavior proposed in gh-18286 the default. If something is simply skipped now, it may silently stop being tested or be blocking for changing the default behavior. I see that I already anticipated the issue with masked array support in stats.mstats in gh-18286, see the entry for numpy.ma.MaskedArray in the first table there. That suggests:

  • stats.mstats functions should return masked arrays when they receive masked arrays as inputs
  • stats functions should reject masked arrays
  • object arrays are never accepted, start raising TypeError
  • numpy.matrix instances are never accepted, start raising TypeError

For the current behavior, yes it seems like there should be a way to skip tests that don't comply with the above. Ideally in a way that makes it easy to move over to the new defaults without anything going wrong silently.

I have to pass a reason when using pytest.mark.skipif, and that is going to get really repetitive when the reason is always the same

skip_if_array_api isn't really a reason, so I'm slightly worried about using that. The array API code path already raises a hard error, so how are we going to add deprecations for behavior that you'd want to skip with skip_if_array_api? It seems to me like we'd need separate code paths inside the functions for "doesn't work with array-api defaults", and that it'd be useful to have more specific skip markers like accepts_object_arrays, accepts_npmatrix. Does that make sense here?

@mdhaber
Copy link
Contributor Author

mdhaber commented Mar 28, 2024

The array API code path already raises a hard error, so how are we going to add deprecations for behavior that you'd want to skip with skip_if_array_api?

I think I'm not seeing the problem. Suppose there were a decorator or mark skip_perfect_name_here that skips tests when SCIPY_ARRAY_API=1. This would be applied to tests that pass array API incompatible input to the function being tested. That is, it would be applied to tests that would otherwise fail because array_namespace raises an error (whether that be due to receiving an object array, a NumPy matrix, a masked array, or something else).

Immediately, when the function is called with array API incompatible input:

  • the function runs normally when SCIPY_ARRAY_API is not specified1, so the test continues to do its job.
  • the function raises an error when SCIPY_ARRAY_API=1, but the test is skipped.

Sometime before the behavior proposed in gh-18286 becomes the default, functions that currently accept array API invalid input need to begin emitting deprecation warnings when they receive array API invalid input. This can be a separate effort from adding the array API functionality, and it can be done after adding array API functionality. To be honest, array_namespace could begin to emit a deprecation warning when it receives array API incompatible input and SCIPY_ARRAY_API is not set2.

At this point, when the function is called with array API incompatible input:

  • the function emits a deprecation warning when SCIPY_ARRAY_API is not specified, causing tests marked with skip_perfect_name_here to fail. We can either manually update the tests at that time to look for (and not fail due to) the deprecation warning, or maybe skip_perfect_name_here can be updated to do that automatically.
  • the function raises an error when SCIPY_ARRAY_API=1, but it will still be skipped.

When we decide to pull the trigger and make the behavior proposed in gh-18286 (or something similar) the default and only behavior, we remove the decorator. Tests to which the decorator has been applied will begin to fail, but they are no longer relevant, so we can remove them. If there is something in these tests that is still relevant, we move those parts into separate tests. Maybe that should be done when the decorator is applied as I've already done here with the _contains_nan tests.

I don't necessarily mean for this to be a real proposal, but hopefully it's explicit enough about a potential path forward so we can identify problems with having a skip_perfect_name_here decorator or mark.

Footnotes

  1. IIRC SCIPY_ARRAY_API=0 doesn't work right now

  2. One potential pitfal would be that it would also emit a deprecation warning for functions that do not currently work with array API invalid input, suggesting that the do but won't in the future. I don't think this is a problem provided that the warning message is worded appropriately.

@lucascolley
Copy link
Member

I think what Matt has written makes sense.

To be honest, array_namespace could begin to emit the deprecation warning when it receives array API incompatible input and SCIPY_ARRAY_API is not set.

+1, I think we can word the warnings appropriately to avoid the concern in your footnote.

scipy/_lib/tests/test__util.py Show resolved Hide resolved
scipy/stats/_axis_nan_policy.py Show resolved Hide resolved
scipy/stats/_stats_py.py Show resolved Hide resolved
scipy/stats/_stats_py.py Show resolved Hide resolved
Comment on lines +1045 to +1046
if xp.isdtype(a.dtype, 'integral'):
a = xp.asarray(a, dtype=xp.float64)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When skew, kurtosis, and any other functions that rely on _moment are updated, this can be removed, since it's already in the public function.

@rgommers
Copy link
Member

I don't necessarily mean for this to be a real proposal, but hopefully it's explicit enough about a potential path forward so we can identify problems with having a skip_perfect_name_here decorator or mark.

Thanks Matt for thinking it through. I agree, that should work as a general strategy.

The proposed name should work, but some hints as to why would be helpful, at least in the first usage in a set of functions. If we don't want specific decorators, then something like:

@skip_if_array_api   # uses masked arrays

it's so much easier than reading the code to deduce what the reason for a skip is

@mdhaber
Copy link
Contributor Author

mdhaber commented Mar 29, 2024

Thanks @rgommers @lucascolley. Does the reason need to be more specific than that the tested function is passed an array API invalid input, or do all the different types of invalid inputs used need to be specified?

If the former, would skip_array_api_invalid_arg without comment work, if this plan is documented in the contributor guide?

And just to double check, should this be a decorator that applies a mark so that there is potential for the decorator to do more in the future (e.g. look for deprecation warnings, if that's possible)?

@rgommers
Copy link
Member

If the former, would skip_array_api_invalid_arg without comment work, if this plan is documented in the contributor guide?

That sounds good to me. In most cases it'll be clear enough with the test code right below. In case it's non-obvious (e.g. function generates object array internally), an extra comment as to why may be helpful.

And just to double check, should this be a decorator that applies a mark so that there is potential for the decorator to do more in the future (e.g. look for deprecation warnings, if that's possible)?

I think so - that's the most efficient way of doing it anyway, right?

@mdhaber
Copy link
Contributor Author

mdhaber commented Mar 29, 2024

Definitely. The only question is whether we would want to be more fine grained when looking for deprecation warnings. Some tests will have multiple calls to the function, and the decorator would not distinguish between the deprecation warning being emitted once or multiple times. But I think it would be safe enough to assume that all of them emit the warning if one of them does, especially if the warning is actually emitted by something central like array_namespace.

@rgommers
Copy link
Member

Agreed, that seems safe enough to assume.

inexact = (xp.isdtype(a.dtype, "real floating")
or xp.isdtype(a.dtype, "complex floating"))
if inexact:
# The summation method avoids creating another (potentially huge) array
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, it can introduce NaNs where non existed before (see gh-20386). I propose getting rid of it and always use the other method: contains_nan = xp.isnan(xp.sum(a)).

@mdhaber
Copy link
Contributor Author

mdhaber commented Apr 8, 2024

@lucascolley besides incorporating a fix for gh-20386, is there anything else you'd like to see here?

Copy link
Member

@lucascolley lucascolley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost there!

scipy/conftest.py Outdated Show resolved Hide resolved
scipy/conftest.py Outdated Show resolved Hide resolved
scipy/stats/_stats_py.py Outdated Show resolved Hide resolved
scipy/stats/_stats_py.py Outdated Show resolved Hide resolved
doc/source/dev/api-dev/array_api.rst Outdated Show resolved Hide resolved
Comment on lines +194 to +195
error_type = TypeError if SCIPY_ARRAY_API else ValueError
assert_raises(error_type, stats.shapiro, np.array([[], [2]], dtype=object))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think this is the right pattern to use


from scipy._lib._array_api import xp_assert_equal
from scipy._lib._array_api import xp_assert_equal, is_numpy, copy as xp_copy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to changing the names of the helpers themselves to xp_copy and xp_size if that would be easier than having these aliases everywhere. size is just exposed from array_api_compat (no wrapper), and copy was written by Andrew.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not push for it across the board, but it doesn't sound like a bad Idea.

scipy/_lib/tests/test__util.py Show resolved Hide resolved
scipy/stats/tests/test_stats.py Outdated Show resolved Hide resolved
scipy/stats/tests/test_stats.py Outdated Show resolved Hide resolved
@mdhaber
Copy link
Contributor Author

mdhaber commented Apr 10, 2024

OK, I think the last commit addresses the comments. I will change #20292 (review) in a separate PR once this is merged, as that issue really is orthogonal.

Copy link
Member

@lucascolley lucascolley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good from my POV, thanks Matt!

@lucascolley lucascolley added this to the 1.14.0 milestone Apr 10, 2024
@mdhaber
Copy link
Contributor Author

mdhaber commented Apr 10, 2024

Thanks very much @lucascolley!
@tupui @AtsushiSakai do you have a moment to skim from a stats perspective?

Copy link
Member

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great 👍 The implementation looks correct, same for the tests and the changes for Array API helpful and Lucas checked that part thoroughly 🙌

Feel free to go ahead @lucascolley

@lucascolley
Copy link
Member

thanks @tupui , @mdhaber !

@lucascolley lucascolley merged commit 546e1e5 into scipy:main Apr 19, 2024
29 checks passed
@mdhaber
Copy link
Contributor Author

mdhaber commented Apr 19, 2024

Thanks @lucascolley @rgommers @tupui!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
array types Items related to array API support and input array validation (see gh-18286) enhancement A new feature or improvement scipy._lib scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants