ENH: `stats._xp_mean`, an array API compatible `mean` with `weights` and `nan_policy` #20743

mdhaber · 2024-05-18T20:03:14Z

Reference issue

Toward gh-20544

What does this implement/fix?

This PR adds _xp_mean, an array API compatible function that combines the features of np.mean, np.average, and np.nanmean in an interface that fits with scipy.stats. It will be needed to make functions like pmean, hmean, and gmean array API compatible.
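To show what combining np.mean, np.average, and np.nanmean in one interface means in practice, here is a minimal sketch. It is not the PR's _xp_mean. The name `weighted_mean`, its signature, and the choice to implement nan_policy='omit' by zeroing out NaN observations are assumptions made for illustration. Only functions that also exist in the array API standard are called on `xp`, and NumPy is used as the default namespace.

```python
import numpy as np


def weighted_mean(x, /, *, axis=None, weights=None, nan_policy="propagate", xp=np):
    """Hypothetical sketch of a mean with optional `weights` and `nan_policy`.

    Only functions that also exist in the array API standard are called on
    `xp`, so another conforming namespace could, in principle, be passed in.
    """
    if nan_policy not in {"propagate", "omit", "raise"}:
        raise ValueError(f"Unrecognized nan_policy: {nan_policy!r}")

    x = xp.asarray(x, dtype=xp.float64)
    weights = (xp.ones_like(x) if weights is None
               else xp.asarray(weights, dtype=xp.float64))
    x, weights = xp.broadcast_arrays(x, weights)

    nans = xp.isnan(x) | xp.isnan(weights)
    if nan_policy == "raise" and xp.any(nans):
        raise ValueError("The input contains NaN.")
    if nan_policy == "omit":
        # Give NaN observations zero weight *and* zero value; zeroing only the
        # weight would still leave 0 * nan = nan in the numerator.
        x = xp.where(nans, xp.zeros_like(x), x)
        weights = xp.where(nans, xp.zeros_like(weights), weights)
    # With 'propagate', any NaN simply flows through the sums below.
    return xp.sum(weights * x, axis=axis) / xp.sum(weights, axis=axis)


x = np.array([1.0, np.nan, 3.0])
print(weighted_mean(x))                               # nan, like np.mean
print(weighted_mean(x, nan_policy="omit"))            # 2.0, like np.nanmean
print(weighted_mean([1, 2, 3], weights=[1, 1, 2]))    # 2.25, like np.average
```

The three calls at the end correspond to the three NumPy functions being combined: no weights and 'propagate' behaves like np.mean, `weights` behaves like np.average, and 'omit' behaves like np.nanmean.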

Additional information

~~Potential reviewers: would you be willing to write some unit tests with hypothesis? For such a fundamental function, it's particularly important that it works flawlessly!~~

If it doesn't sound too crazy, I'd suggest that this function, along with similar var and std functions, be added to the public scipy.stats API, because they provide functionality that the array API standard lacks (e.g. weights, which has been explicitly rejected, and nan_policy, which has not been standardized and may not follow SciPy's convention). Even considering NumPy alone, it would be useful to have a single function with all the functionality of mean, average, and nanmean in an interface consistent with the rest of scipy.stats.

Not pursuing these things right now. Let's just get this in so we can finish the other mean functions.

scipy/_lib/_array_api.py

scipy/_lib/tests/test_array_api.py


mdhaber · 2024-05-18T23:33:21Z

scipy/stats/tests/test_axis_nan_policy.py

+    (xp_mean_1samp, tuple(), dict(), 1, 1, False, lambda x: (x,)),
+    (xp_mean_2samp, tuple(), dict(), 2, 1, True, lambda x: (x,)),


Most scipy.stats functions use the _axis_nan_policy decorator to implement nan_policy, keepdims, and tuple axis. I've implemented all of these features natively for better performance (e.g. with nan_policy='omit', the decorator would otherwise loop over each slice), and the function still passes all the tests in test_axis_nan_policy.py, which are quite stringent. So even if you don't want to write tests with hypothesis, I'm fairly comfortable with this.
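The performance point about nan_policy='omit' can be illustrated with a small comparison. This is a sketch, not SciPy code, and both function names are hypothetical. The first function handles each 1-D slice separately, roughly what a generic wrapper must do when the wrapped function cannot omit NaNs itself; the second masks NaNs once and reduces the whole array in a single vectorized call.

```python
import numpy as np


def omit_mean_looping(x, axis=-1):
    """Per-slice approach: drop the NaNs of each 1-D slice, then average it."""
    x = np.moveaxis(np.asarray(x, dtype=float), axis, -1)
    out = np.empty(x.shape[:-1])
    # One Python-level iteration per slice along `axis`.
    for index in np.ndindex(out.shape):
        row = x[index]
        out[index] = np.mean(row[~np.isnan(row)])
    return out


def omit_mean_vectorized(x, axis=-1):
    """Native approach: mask NaNs once and reduce the whole array at once."""
    x = np.asarray(x, dtype=float)
    mask = ~np.isnan(x)
    total = np.sum(np.where(mask, x, 0.0), axis=axis)
    count = np.sum(mask, axis=axis)
    # Slices that are entirely NaN give 0/0 = nan, matching the looping version.
    return total / count
```

Both return the same values; the difference is that the looping version's cost grows with the number of slices in Python, while the vectorized version does a fixed number of array operations.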

mdhaber · 2024-05-18T23:43:33Z

scipy/_lib/_array_api.py

+
+    if weights is not None and x.shape != weights.shape:
+        try:
+            x, weights = xp.broadcast_arrays(x, weights)


A few thoughts about broadcasting:

Technically, x = [1, 2, 3] is broadcastable with weights = [2], and it can be interpreted as giving all observations a weight of 2.

Technically, x = [1] is broadcastable with weights = [1, 2, 3]: here x is broadcast to the shape of weights rather than the (more natural) other way around.

Technically, x = [] is broadcastable with weights = [1]: weights is broadcast to shape (0,), and the weighted mean is NaN.

It's clearly simpler to just accept these sorts of things, but since they're not useful, one could argue that we shouldn't. I'd propose that we just accept them, but if there are strong opinions about not accepting them, LMK.
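The three cases can be reproduced with NumPy's broadcasting rules. The helper `broadcast_weighted_mean` is hypothetical; it only mirrors the idea of the `xp.broadcast_arrays(x, weights)` step in the diff, using NumPy in place of a generic `xp` namespace, and returns the broadcast shape together with the weighted mean.

```python
import numpy as np


def broadcast_weighted_mean(x, weights):
    """Broadcast x and weights together, then take the weighted mean."""
    x, weights = np.broadcast_arrays(np.asarray(x, dtype=float),
                                     np.asarray(weights, dtype=float))
    with np.errstate(invalid="ignore"):  # silence the 0/0 warning for empty input
        return x.shape, float(np.sum(weights * x) / np.sum(weights))


print(broadcast_weighted_mean([1, 2, 3], [2]))  # ((3,), 2.0): every weight is 2
print(broadcast_weighted_mean([1], [1, 2, 3]))  # ((3,), 1.0): x is stretched instead
print(broadcast_weighted_mean([], [1]))         # ((0,), nan): weights become empty

# Shapes that are genuinely incompatible still fail, which the `try` must handle.
try:
    broadcast_weighted_mean([1, 2, 3], [1, 2])
except ValueError as err:
    print("not broadcastable:", err)
```

Accepting these cases costs nothing extra, since they all fall out of ordinary broadcasting; rejecting them would require explicit shape checks before the call.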

rgommers · 2024-05-19T08:46:25Z

scipy/_lib/_array_api.py

@@ -475,3 +476,155 @@ def xp_sign(x, xp=None):
    sign = xp.where(x < 0, -one, sign)
    sign = xp.where(x == 0, 0*one, sign)
    return sign
+
+
+def xp_add_reduced_axes(res, axis, initial_shape, *, xp=None):


Could you add a note on why this is needed? Is it temporary, why can't xp.add be used, etc.?

Type annotations and consistency with other functions in this file would be useful too (at least if you expect this function to stay around for a while).

res should preferably be positional-only.

mdhaber · 2024-05-20

Perhaps a better name would have been xp_replace_reduced_axes or xp_keepdims: it adds back axes that have been reduced away. However, when I respond to the other comments, I'll just move the logic back into xp_mean, since I'm not sure it will be used elsewhere; it can be factored out again as needed. Although the comment wasn't about xp_mean, I can make the first argument of xp_mean positional-only.
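A minimal sketch of what such a helper does: reinsert the length-1 axes that a reduction removed, so that the result has the shape `keepdims=True` would have produced. The name `restore_reduced_axes` and its handling of `axis` (None, an int, or a tuple of ints, with negative values allowed) are assumptions, and NumPy stands in for `xp`. Comparing against NumPy's own `keepdims=True` is just a convenient check.

```python
import numpy as np


def restore_reduced_axes(res, /, axis, initial_shape):
    """Hypothetical helper: reinsert length-1 axes removed by a reduction.

    `res` is positional-only, following the review suggestion. Out-of-range
    axes are not validated in this sketch.
    """
    ndim = len(initial_shape)
    if axis is None:
        axis = tuple(range(ndim))      # every axis was reduced
    elif isinstance(axis, int):
        axis = (axis,)
    axis = {ax % ndim for ax in axis}  # normalize negative axes
    new_shape = tuple(1 if i in axis else n for i, n in enumerate(initial_shape))
    # Reducing only removes axes, so element order is unchanged and a reshape suffices.
    return np.reshape(res, new_shape)


x = np.arange(24.0).reshape(2, 3, 4)
res = np.mean(x, axis=(0, 2))                         # shape (3,)
print(restore_reduced_axes(res, (0, 2), x.shape).shape)  # (1, 3, 1)
```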

fancidev · 2024-05-19T22:11:39Z

Why was weights explicitly rejected for the Array API? Would you by chance have a link or something for the discussion back then?

lucascolley · 2024-05-19T22:38:36Z

Why was weights explicitly rejected for the Array API? Would you by chance have a link or something for the discussion back then?

data-apis/array-api#366

fancidev · 2024-05-19T22:59:23Z

Thanks for the link @lucascolley .

To align with the naming convention of hmean, pmean, and gmean, would it be more appropriate to call the function amean (a for arithmetic)?


ENH: add xp_mean for mean with weights and nan_policy

13d919c

mdhaber added the scipy.stats, enhancement, and array types labels May 18, 2024

github-actions bot added the scipy._lib label May 18, 2024

mdhaber commented May 18, 2024


mdhaber added 3 commits May 18, 2024 17:23

Apply suggestions from code review

f547a6b

[skip ci]

MAINT: xp_mean: remaining revisions

dbe161d

TST: xp_mean: strengthen tests

7352500

mdhaber marked this pull request as ready for review May 18, 2024 23:28

mdhaber commented May 18, 2024


rgommers reviewed May 19, 2024


mdhaber mentioned this pull request May 19, 2024

ENH: stats: add array API-support #20544


lucascolley changed the title ~~ENH: xp_mean: an array-API compatible mean with weights and nan_policy~~ ENH: xp_mean, an array API compatible mean with weights and nan_policy Jun 2, 2024

mdhaber added 5 commits June 3, 2024 23:49

Merge remote-tracking branch 'upstream/main' into xp_mean

76dc796

MAINT: xp_mean: move keepdims logic into functions; make first-arg positional-only

c4dc66c

TST: xp_mean: fix failing test about too small warning

52d7d11

MAINT: xp_mean: use _broadcast_arrays instead of xp.broadcast_arrays

714694c

MAINT: stats._xp_mean: move _xp_mean

f942854

mdhaber changed the title ~~ENH: xp_mean, an array API compatible mean with weights and nan_policy~~ ENH: stats._xp_mean, an array API compatible mean with weights and nan_policy Jun 9, 2024

mdhaber added 2 commits June 9, 2024 15:16

Merge remote-tracking branch 'upstream/main' into xp_mean

4750599

MAINT: stats._xp_mean: match _axis_nan_policy behavior

eb529f6

mdhaber requested review from lucascolley and j-bowhay June 9, 2024 23:27
