Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: stats.wilcoxon: fix failure with multidimensional x with NaN and slice length > 50 #20592

Merged
merged 1 commit into from May 16, 2024

Conversation

mdhaber
Copy link
Contributor

@mdhaber mdhaber commented Apr 27, 2024

Reference issue

Closes gh-20591

What does this implement/fix?

This fixes a bug in which under specific conditions (x.ndim > 1, x.shape[axis] > 50, np.isnan(x).any(), nan_policy='propagate', and method='auto', for instance) wilcoxon could fail and raise an unexpected error.

@mdhaber mdhaber added scipy.stats maintenance Items related to regular maintenance tasks labels Apr 27, 2024
@mdhaber
Copy link
Contributor Author

mdhaber commented May 6, 2024

@tirthasheshpatel This is a pretty short fix; do you have a moment to take a look before we branch 1.13.1?

@mdhaber mdhaber added this to the 1.13.1 milestone May 6, 2024
@mdhaber mdhaber added the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label May 6, 2024
Copy link
Member

@tirthasheshpatel tirthasheshpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the fix!

Just to confirm: Passing each slice individually would return z depending on which method it branches to but the behavior is unchanged for vectorized (multi-dimensional) inputs, right?

@mdhaber
Copy link
Contributor Author

mdhaber commented May 15, 2024

Passing each slice individually would return z depending on which method it branches to...

Er... no, or that's what happened in 1.13. This fixes that.

import numpy as np
import scipy
from scipy import stats
x = np.arange(1, 100)
scipy.__version__  # 1.12.0
# branches to `approx`, but does not return `zstatistic`
hasattr(stats.wilcoxon(x, method='auto'), 'zstatistic')  # False

In SciPy 1.13.0/main:

import numpy as np
import scipy
from scipy import stats
x = np.arange(1, 100)
scipy.__version__  # 1.13.0
# branches to `approx` and returns `zstatistic` - this is the problem
hasattr(stats.wilcoxon(x, method='auto'), 'zstatistic')  # **True**

Because method='auto' is allowed to add zstatistic to the result object, NaNs cause problems. For example, with nan_policy='omit' in 1.13:

rng = np.random.default_rng(25893459825282452)
x = rng.random((2, 55))
x[1, :10] = np.nan
# Slice is length 55 > 50, so method='approx' is used, and `zstatistic` is added
hasattr(stats.wilcoxon(x[0], method='auto', nan_policy='omit'), 'zstatistic')  # True
# Slice after removing NaNs is length 45 < 50, so method='exact' is used, and `zstatistic` is not added
hasattr(stats.wilcoxon(x[1], method='auto', nan_policy='omit'), 'zstatistic')  # False
# Error because one slice has three return values and the other has only two
stats.wilcoxon(x, axis=1, method='auto', nan_policy='omit')  # error

By ensuring that zstatistic is only added when method='approx' is specified explicitly, we ensure that either every slice has zstatistic or every slice does not have zstatistic, avoiding this problem.

@tirthasheshpatel
Copy link
Member

Oh, OK. That makes sense. Thanks for the clarification!

Tests look good too, so merging! Thanks!

@tirthasheshpatel tirthasheshpatel merged commit 224dea5 into scipy:main May 16, 2024
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-candidate This fix should be ported by a maintainer to previous SciPy versions. maintenance Items related to regular maintenance tasks scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: scipy.stats.wilcoxon in 1.13 fails on 2D array with nan -- 1D works
2 participants