ENH: stats.moment: add array API support #20292

mdhaber · 2024-03-20T01:13:10Z

Reference issue

Towards gh-18867

What does this implement/fix?

Adds array API support to stats.moment.

scipy/stats/_stats_py.py

[skip ci]

lucascolley · 2024-03-20T09:50:31Z

Is there something I need to do to get this to run with alternative backends in CI?

Add a new pytest call in the array API CI workflow. -s stats might be overkill for now, maybe you can make it a little more fine grained with -t.

lucascolley

just a few comments for now

scipy/_lib/_util.py

scipy/_lib/tests/test__util.py

scipy/_lib/_util.py

scipy/stats/_axis_nan_policy.py

scipy/stats/_stats_py.py

scipy/stats/tests/test_axis_nan_policy.py

mdhaber · 2024-03-28T02:28:49Z

@lucascolley Thanks for the quick reviews - I'm glad you're doing that! Just know that you will find a lot of things if you get to it before I am able to review my own code on GitHub. Seeing just the diff highlights these things, which I might not be so careful about when drafting the code.

Re: importing SCIPY_ARRAY_API - I think I have a legitimate need for skipping tests whenever SCIPY_ARRAY_API == 1: the tests involve masked and/or object arrays, but array_namespace raises errors when it sees these. Therefore, tests involving masked and/or object arrays typically should be skipped when SCIPY_ARRAY_API == 1 because they will fail regardless of backend, right?

I think this will be common enough that it does make sense to have a custom mark like the original skip_if_array_api. Otherwise, as you see in the CI logs, I have to pass a reason when using pytest.mark.skipif, and that is going to get really repetitive when the reason is always the same. Does it make sense to have a real skip_if_array_mark after all?

scipy/_lib/_util.py

scipy/_lib/tests/test__util.py

…mpy arrays

scipy/_lib/_util.py

lucascolley · 2024-03-28T09:15:14Z

I think this will be common enough that it does make sense to have a custom mark like the original skip_if_array_api. Otherwise, as you see in the CI logs, I have to pass a reason when using pytest.mark.skipif, and that is going to get really repetitive when the reason is always the same. Does it make sense to have a real skip_if_array_mark after all?

@rgommers you originally wanted this marker removed, but I think it is the best way to deal with tests that involve masked/object/numpy.matrix arrays. Thoughts?

rgommers · 2024-03-28T09:50:42Z

@rgommers you originally wanted this marker removed, but I think it is the best way to deal with tests that involve masked/object/numpy.matrix arrays. Thoughts?

I'm thinking about what will happen if and when we make the behavior proposed in gh-18286 the default. If something is simply skipped now, it may silently stop being tested or be blocking for changing the default behavior. I see that I already anticipated the issue with masked array support in stats.mstats in gh-18286, see the entry for numpy.ma.MaskedArray in the first table there. That suggests:

stats.mstats functions should return masked arrays when they receive masked arrays as inputs
stats functions should reject masked arrays
object arrays are never accepted, start raising TypeError
numpy.matrix instances are never accepted, start raising TypeError

For the current behavior, yes it seems like there should be a way to skip tests that don't comply with the above. Ideally in a way that makes it easy to move over to the new defaults without anything going wrong silently.

I have to pass a reason when using pytest.mark.skipif, and that is going to get really repetitive when the reason is always the same

skip_if_array_api isn't really a reason, so I'm slightly worried about using that. The array API code path already raises a hard error, so how are we going to add deprecations for behavior that you'd want to skip with skip_if_array_api? It seems to me like we'd need separate code paths inside the functions for "doesn't work with array-api defaults", and that it'd be useful to have more specific skip markers like accepts_object_arrays, accepts_npmatrix. Does that make sense here?

mdhaber · 2024-03-28T19:35:06Z

The array API code path already raises a hard error, so how are we going to add deprecations for behavior that you'd want to skip with skip_if_array_api?

I think I'm not seeing the problem. Suppose there were a decorator or mark skip_perfect_name_here that skips tests when SCIPY_ARRAY_API=1. This would be applied to tests that pass array API incompatible input to the function being tested. That is, it would be applied to tests that would otherwise fail because array_namespace raises an error (whether that be due to receiving an object array, a NumPy matrix, a masked array, or something else).

Immediately, when the function is called with array API incompatible input:

the function runs normally when SCIPY_ARRAY_API is not specified¹, so the test continues to do its job.
the function raises an error when SCIPY_ARRAY_API=1, but the test is skipped.

Sometime before the behavior proposed in gh-18286 becomes the default, functions that currently accept array API invalid input need to begin emitting deprecation warnings when they receive array API invalid input. This can be a separate effort from adding the array API functionality, and it can be done after adding array API functionality. To be honest, array_namespace could begin to emit a deprecation warning when it receives array API incompatible input and SCIPY_ARRAY_API is not set².

At this point, when the function is called with array API incompatible input:

the function emits a deprecation warning when SCIPY_ARRAY_API is not specified, causing tests marked with skip_perfect_name_here to fail. We can either manually update the tests at that time to look for (and not fail due to) the deprecation warning, or maybe skip_perfect_name_here can be updated to do that automatically.
the function raises an error when SCIPY_ARRAY_API=1, but it will still be skipped.

When we decide to pull the trigger and make the behavior proposed in gh-18286 (or something similar) the default and only behavior, we remove the decorator. Tests to which the decorator has been applied will begin to fail, but they are no longer relevant, so we can remove them. If there is something in these tests that is still relevant, we move those parts into separate tests. Maybe that should be done when the decorator is applied as I've already done here with the _contains_nan tests.

I don't necessarily mean for this to be a real proposal, but hopefully it's explicit enough about a potential path forward so we can identify problems with having a skip_perfect_name_here decorator or mark.

IIRC SCIPY_ARRAY_API=0 doesn't work right now ↩
One potential pitfal would be that it would also emit a deprecation warning for functions that do not currently work with array API invalid input, suggesting that the do but won't in the future. I don't think this is a problem provided that the warning message is worded appropriately. ↩

lucascolley · 2024-03-28T19:46:00Z

I think what Matt has written makes sense.

To be honest, array_namespace could begin to emit the deprecation warning when it receives array API incompatible input and SCIPY_ARRAY_API is not set.

+1, I think we can word the warnings appropriately to avoid the concern in your footnote.

[skip cirrus] [skip circle]

scipy/_lib/tests/test__util.py

scipy/stats/_axis_nan_policy.py

scipy/stats/_stats_py.py

mdhaber · 2024-03-29T04:53:04Z

scipy/stats/_stats_py.py

+    if xp.isdtype(a.dtype, 'integral'):
+        a = xp.asarray(a, dtype=xp.float64)


When skew, kurtosis, and any other functions that rely on _moment are updated, this can be removed, since it's already in the public function.

rgommers · 2024-03-29T10:13:09Z

I don't necessarily mean for this to be a real proposal, but hopefully it's explicit enough about a potential path forward so we can identify problems with having a skip_perfect_name_here decorator or mark.

Thanks Matt for thinking it through. I agree, that should work as a general strategy.

The proposed name should work, but some hints as to why would be helpful, at least in the first usage in a set of functions. If we don't want specific decorators, then something like:

@skip_if_array_api   # uses masked arrays

it's so much easier than reading the code to deduce what the reason for a skip is

mdhaber · 2024-03-29T15:41:18Z

Thanks @rgommers @lucascolley. Does the reason need to be more specific than that the tested function is passed an array API invalid input, or do all the different types of invalid inputs used need to be specified?

If the former, would skip_array_api_invalid_arg without comment work, if this plan is documented in the contributor guide?

And just to double check, should this be a decorator that applies a mark so that there is potential for the decorator to do more in the future (e.g. look for deprecation warnings, if that's possible)?

rgommers · 2024-03-29T15:44:24Z

If the former, would skip_array_api_invalid_arg without comment work, if this plan is documented in the contributor guide?

That sounds good to me. In most cases it'll be clear enough with the test code right below. In case it's non-obvious (e.g. function generates object array internally), an extra comment as to why may be helpful.

And just to double check, should this be a decorator that applies a mark so that there is potential for the decorator to do more in the future (e.g. look for deprecation warnings, if that's possible)?

I think so - that's the most efficient way of doing it anyway, right?

mdhaber · 2024-03-29T15:58:29Z

Definitely. The only question is whether we would want to be more fine grained when looking for deprecation warnings. Some tests will have multiple calls to the function, and the decorator would not distinguish between the deprecation warning being emitted once or multiple times. But I think it would be safe enough to assume that all of them emit the warning if one of them does, especially if the warning is actually emitted by something central like array_namespace.

rgommers · 2024-03-29T19:35:54Z

Agreed, that seems safe enough to assume.

mdhaber · 2024-04-03T19:59:20Z

scipy/_lib/_util.py

+    inexact = (xp.isdtype(a.dtype, "real floating")
+               or xp.isdtype(a.dtype, "complex floating"))
+    if inexact:
+        # The summation method avoids creating another (potentially huge) array


However, it can introduce NaNs where non existed before (see gh-20386). I propose getting rid of it and always use the other method: contains_nan = xp.isnan(xp.sum(a)).

mdhaber · 2024-04-08T11:36:10Z

@lucascolley besides incorporating a fix for gh-20386, is there anything else you'd like to see here?

lucascolley

almost there!

scipy/conftest.py

scipy/stats/_stats_py.py

doc/source/dev/api-dev/array_api.rst

lucascolley · 2024-04-08T11:49:53Z

scipy/stats/tests/test_morestats.py

+        error_type = TypeError if SCIPY_ARRAY_API else ValueError
+        assert_raises(error_type, stats.shapiro, np.array([[], [2]], dtype=object))


👍 I think this is the right pattern to use

lucascolley · 2024-04-08T11:51:29Z

scipy/_lib/tests/test__util.py


-from scipy._lib._array_api import xp_assert_equal
+from scipy._lib._array_api import xp_assert_equal, is_numpy, copy as xp_copy


I'm not opposed to changing the names of the helpers themselves to xp_copy and xp_size if that would be easier than having these aliases everywhere. size is just exposed from array_api_compat (no wrapper), and copy was written by Andrew.

I would not push for it across the board, but it doesn't sound like a bad Idea.

scipy/_lib/tests/test__util.py

scipy/stats/tests/test_stats.py

mdhaber · 2024-04-10T15:51:07Z

OK, I think the last commit addresses the comments. I will change #20292 (review) in a separate PR once this is merged, as that issue really is orthogonal.

lucascolley

looks good from my POV, thanks Matt!

mdhaber · 2024-04-10T18:09:34Z

Thanks very much @lucascolley!
@tupui @AtsushiSakai do you have a moment to skim from a stats perspective?

tupui

This looks great 👍 The implementation looks correct, same for the tests and the changes for Array API helpful and Lucas checked that part thoroughly 🙌

Feel free to go ahead @lucascolley

lucascolley · 2024-04-19T19:48:12Z

thanks @tupui , @mdhaber !

mdhaber · 2024-04-19T20:18:06Z

Thanks @lucascolley @rgommers @tupui!

ENH: stats.moment: add array API support

516232b

github-actions bot added scipy.stats scipy._lib labels Mar 20, 2024

mdhaber added enhancement A new feature or improvement array types Items related to array API support and input array validation (see gh-18286) labels Mar 20, 2024

mdhaber commented Mar 20, 2024

View reviewed changes

scipy/stats/_stats_py.py Outdated Show resolved Hide resolved

MAINT: stats._moment: fix bugs

d1724bf

[skip ci]

lucascolley reviewed Mar 20, 2024

View reviewed changes

scipy/_lib/_util.py Outdated Show resolved Hide resolved

scipy/_lib/tests/test__util.py Outdated Show resolved Hide resolved

mdhaber added 6 commits March 26, 2024 23:22

Merge remote-tracking branch 'upstream/main' into moment_array_api

0464823

MAINT: stats._moment: improve array API compatibility

b0a5825

TST: stats._moment: run tests on CI, skip failing tests

c423d42

MAINT:stats.moment: rename variable 'moment' to 'order' throughout

cc2fece

ENH: stats.moment: add array API support

1b884af

MAINT: stats._axis_nan_policy: skip when using non-NumPy backend

478eccf

lucascolley reviewed Mar 28, 2024

View reviewed changes

scipy/_lib/_util.py Outdated Show resolved Hide resolved

scipy/stats/_axis_nan_policy.py Outdated Show resolved Hide resolved

scipy/stats/_stats_py.py Outdated Show resolved Hide resolved

scipy/stats/tests/test_axis_nan_policy.py Outdated Show resolved Hide resolved

TST: stats.moment: begin to convert existing tests to array API

0beadb3

mdhaber commented Mar 28, 2024

View reviewed changes

scipy/_lib/_util.py Outdated Show resolved Hide resolved

mdhaber commented Mar 28, 2024

View reviewed changes

scipy/_lib/tests/test__util.py Outdated Show resolved Hide resolved

mdhaber added 2 commits March 27, 2024 19:56

TST: scipy._lib._contains_nan: fixup

48d8b0f

MAINT: stats._axis_nan_policy: fixup decorator early exit with non-nu…

358a4ad

…mpy arrays

mdhaber commented Mar 28, 2024

View reviewed changes

scipy/_lib/_util.py Outdated Show resolved Hide resolved

mdhaber added 2 commits March 28, 2024 14:38

TST: stats.moment: make tests array_api_compatible

8e7fc9d

TST: stats: adjust some skipped tests

b657fcb

[skip cirrus] [skip circle]

mdhaber commented Mar 29, 2024

View reviewed changes

mdhaber requested review from larsoner and andyfaff as code owners March 29, 2024 04:56

mdhaber added 4 commits March 29, 2024 13:51

Merge remote-tracking branch 'upstream/main' into moment_array_api

4a325b8

TST: define skip_array_api_invalid_arg decorator

f7c1d2d

DOC: document skip_array_api_invalid_arg decorator

52cd5cd

MAINT: stats.moment: minor fixups

00dc26c

mdhaber mentioned this pull request Apr 3, 2024

BUG: scipy.stats.kstest returns NaN starting in scipy 1.12 #20386

Closed

mdhaber commented Apr 3, 2024

View reviewed changes

lucascolley requested changes Apr 8, 2024

View reviewed changes

mdhaber mentioned this pull request Apr 9, 2024

BUG: SCIPY_ARRAY_API=0 results in TypeError in conftest.py #20429

Closed

MAINT: stats.moment: adustments per review

0a1870e

lucascolley approved these changes Apr 10, 2024

View reviewed changes

lucascolley added this to the 1.14.0 milestone Apr 10, 2024

Merge branch 'main' into moment_array_api

d8baff0

mdhaber mentioned this pull request Apr 14, 2024

MAINT: stats._contains_nan: fix bug when -inf and inf are in sample #20467

Merged

mdhaber requested a review from AtsushiSakai April 19, 2024 01:32

tupui approved these changes Apr 19, 2024

View reviewed changes

lucascolley merged commit 546e1e5 into scipy:main Apr 19, 2024
29 checks passed

This was referenced Apr 20, 2024

ENH: stats.skew: add array-API support #20541

Merged

ENH: stats: add array API-support #20544

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: stats.moment: add array API support #20292

ENH: stats.moment: add array API support #20292

mdhaber commented Mar 20, 2024 •

edited

lucascolley commented Mar 20, 2024

lucascolley left a comment

mdhaber commented Mar 28, 2024 •

edited

lucascolley commented Mar 28, 2024 •

edited

rgommers commented Mar 28, 2024

mdhaber commented Mar 28, 2024 •

edited

lucascolley commented Mar 28, 2024

mdhaber Mar 29, 2024

rgommers commented Mar 29, 2024

mdhaber commented Mar 29, 2024 •

edited

rgommers commented Mar 29, 2024

mdhaber commented Mar 29, 2024 •

edited

rgommers commented Mar 29, 2024

mdhaber Apr 3, 2024

mdhaber commented Apr 8, 2024

lucascolley left a comment

lucascolley Apr 8, 2024

lucascolley Apr 8, 2024

mdhaber Apr 8, 2024

mdhaber commented Apr 10, 2024 •

edited

lucascolley left a comment

mdhaber commented Apr 10, 2024 •

edited

tupui left a comment

lucascolley commented Apr 19, 2024

mdhaber commented Apr 19, 2024

		if xp.isdtype(a.dtype, 'integral'):
		a = xp.asarray(a, dtype=xp.float64)

		error_type = TypeError if SCIPY_ARRAY_API else ValueError
		assert_raises(error_type, stats.shapiro, np.array([[], [2]], dtype=object))


		from scipy._lib._array_api import xp_assert_equal
		from scipy._lib._array_api import xp_assert_equal, is_numpy, copy as xp_copy

ENH: stats.moment: add array API support #20292

ENH: stats.moment: add array API support #20292

Conversation

mdhaber commented Mar 20, 2024 • edited

Reference issue

What does this implement/fix?

lucascolley commented Mar 20, 2024

lucascolley left a comment

Choose a reason for hiding this comment

mdhaber commented Mar 28, 2024 • edited

lucascolley commented Mar 28, 2024 • edited

rgommers commented Mar 28, 2024

mdhaber commented Mar 28, 2024 • edited

Footnotes

lucascolley commented Mar 28, 2024

mdhaber Mar 29, 2024

Choose a reason for hiding this comment

rgommers commented Mar 29, 2024

mdhaber commented Mar 29, 2024 • edited

rgommers commented Mar 29, 2024

mdhaber commented Mar 29, 2024 • edited

rgommers commented Mar 29, 2024

mdhaber Apr 3, 2024

Choose a reason for hiding this comment

mdhaber commented Apr 8, 2024

lucascolley left a comment

Choose a reason for hiding this comment

lucascolley Apr 8, 2024

Choose a reason for hiding this comment

lucascolley Apr 8, 2024

Choose a reason for hiding this comment

mdhaber Apr 8, 2024

Choose a reason for hiding this comment

mdhaber commented Apr 10, 2024 • edited

lucascolley left a comment

Choose a reason for hiding this comment

mdhaber commented Apr 10, 2024 • edited

tupui left a comment

Choose a reason for hiding this comment

lucascolley commented Apr 19, 2024

mdhaber commented Apr 19, 2024

mdhaber commented Mar 20, 2024 •

edited

mdhaber commented Mar 28, 2024 •

edited

lucascolley commented Mar 28, 2024 •

edited

mdhaber commented Mar 28, 2024 •

edited

mdhaber commented Mar 29, 2024 •

edited

mdhaber commented Mar 29, 2024 •

edited

mdhaber commented Apr 10, 2024 •

edited

mdhaber commented Apr 10, 2024 •

edited