Test nightly wheel build with NumPy 2.0 #7288
Conversation
matplotlib pulls in contourpy and contourpy pins numpy<2. So for now, remove matplotlib as a testing dependency and assert that NumPy 2.0 is used during the tests for the nightly wheels.
... a little silly of me ^^.
perhaps this is passed to the test environment as well?
Unfortunately we relied on NumPy's lookfor which is promptly removed in NumPy 2.0. Technically, if we don't want to make a breaking release we need to deprecate the function in a deprecation cycle for which we need to vendor NumPy's lookfor. Though, skimage.lookfor is an interactive function, so we might get away with not vendoring NumPy's lookfor, and just making it return a deprecation warning...
instead of deprecated `sctype2char`. Also np.core is flat out deprecated. Seems weird that I can just remove obj2sctype in _convert while it still keeps working. See also numpy/numpy#25580.
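For context, the replacement for the removed `np.sctype2char` is available via the dtype object itself; a quick sketch of the equivalent (my own illustration, not the exact change in this PR):

```python
import numpy as np

# np.sctype2char is gone in NumPy 2.0; np.dtype(...).char exposes the
# same single-character type code and works on NumPy 1 and 2 alike.
assert np.dtype(np.float64).char == "d"
assert np.dtype(np.uint8).char == "B"
assert np.dtype(np.complex128).char == "D"
```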
import sys

from .._shared.utils import deprecate_func
from .._vendored.numpy_lookfor import lookfor as _lookfor
Highlighting this. Unfortunately we relied on NumPy's lookfor which is promptly removed in NumPy 2.0. Technically, if we don't want to make a breaking release we need to deprecate the function in a deprecation cycle for which we need to vendor NumPy's lookfor.
Though, skimage.lookfor is an interactive function, so we might get away with not vendoring NumPy's lookfor, and just making it return a deprecation warning...
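The warning-only variant discussed here could be sketched roughly like this (hypothetical names and message, not the actual skimage code, which vendors NumPy's implementation):

```python
import warnings

def lookfor(what):
    """Keyword search over docstrings (deprecated stub).

    Hypothetical sketch: emit a DeprecationWarning instead of vendoring
    NumPy's removed np.lookfor implementation.
    """
    warnings.warn(
        "`lookfor` is deprecated because NumPy 2.0 removed `np.lookfor`; "
        "use the online documentation search instead.",
        category=DeprecationWarning,
        stacklevel=2,
    )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lookfor("gaussian")
print(caught[0].category.__name__)  # DeprecationWarning
```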
If it's annoying, I guess we could also vendor it back. Does anyone still use lookfor? I feel that, as an interactive function, it is probably used by only a few old-school power users (I never used it), thus the feeling that we can probably get away with removing it (and no actual program can fail).
Is there an alternative for "find a function with x in the docstring"? I guess you'd need the sphinx docs, but those aren't always there on the airplane.
Fair, I am not sure if some IDE has search features, which would be nice. It still seems odd in NumPy to me and probably very rarely used, but if you prefer keeping it for now, I am fine with that too.
I think most IDEs only take the function / method / class name into account when matching for a search string. Or you can do a full search that includes more than just docstrings. @seberg do you have an opinion on how much maintenance work this has been?
Of all discussed options, I prefer NumPy to just keep it around, as that's the simplest option from our side. However, I'd totally get why you might want to remove lookfor. In that case I'd vote for the scientific python package option or removing it altogether. I intend vendoring to be a temporary solution only.
I am not sure it has been maintenance work, so if there is a PR to restore it I won't mind merging it.
To me it was mostly about namespace clutter, as I don't think typical users use it. IIRC it also returns some nonsense results (i.e. it misses things that are much better covered in the HTML docs, while finding docs that should just have been deleted).
Hmm, it doesn't give me a lot of confidence that it's returning nonsense results. Still think it's better placed in a small stand-alone tool, so I probably won't make a PR adding it back to NumPy. ;)
Realizing @Carreau may also have thought about this problem.
I don't have any particular solution.
I plan to have full documentation search with papyri at some point; we can even do some indexing. But it's not ready yet.
Then let's keep it until the status quo changes.
As of NumPy 2.0, can_cast no longer supports Python scalars. I think `np.result_type` [1] still takes values into account when determining the resulting type because `np.min_scalar_type` is used under the hood. So we should be able to get away with comparing whether the resulting type is the same as the target type `arr.dtype`. [1] https://numpy.org/devdocs/reference/generated/numpy.result_type.html
`np.finfo(dtype).eps` will return a scalar with the same `dtype`.
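A quick illustration of that property (my own check, not from the PR):

```python
import numpy as np

eps32 = np.finfo(np.float32).eps
eps64 = np.finfo(np.float64).eps
# The machine epsilon comes back as a scalar of the queried dtype:
assert eps32.dtype == np.float32
assert eps64.dtype == np.float64
assert eps32 > eps64  # float32 is coarser than float64
```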
if alpha is None:
    alpha = alpha_max

if not np.can_cast(alpha, arr.dtype):
Highlighting this bit. As of NumPy 2.0, `np.can_cast` no longer supports Python scalars. I think `np.result_type` still takes values into account when determining the resulting type because `np.min_scalar_type` is used under the hood. So we should be able to get away with comparing whether the resulting type is the same as the target type `arr.dtype`.
@seberg, taking the liberty to Cc you here. 🙏 I'm sure I haven't grokked NEP 50 in its entirety yet.
No, that isn't correct; `result_type` of course does not take the value into account.
It ignores the values (same as if they were 0) for Python complex/float/ints. For out-of-bound integers that would mean that the `np.full` call below should fail if the value is out of range, though. `result_type` I guess still rejects floats if the array is integer.
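A quick check of the promotion behavior described above (my own example; the result differs between NumPy 1 and 2, hence no hard-coded output):

```python
import numpy as np

arr = np.zeros(3, dtype=np.uint8)
# Under NEP 50 (NumPy 2) the *value* of a Python int is ignored, so even an
# out-of-range 300 promotes to the array dtype; NumPy 1 instead consults
# min_scalar_type(300) and promotes to uint16.
print(np.result_type(arr, 300))
# A Python float against an integer array still forces a floating result.
print(np.result_type(arr, 0.5))
```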
Oh, you seem to be right, thanks for the feedback! I think I was led astray by this bit from the docstring:

Otherwise, min_scalar_type is called on each scalar, and the resulting data types are all combined with promote_types to produce the return value.

Concerning `np.full`, do I misunderstand you? It seems like `np.full` happily overflows out-of-bound integers. E.g.

np.full(1, -1, dtype=np.uint8)
# returns array([255], dtype=uint8)
np.full(1, 300, dtype=np.uint8)
# returns array([44], dtype=uint8)

Is there an alternative solution in NumPy for what `np.can_cast` formerly did here? I guess I could just check if the cast is equal to the original value (54a901f). I imagine the performance impact will be low as I expect alpha to be a scalar usually.
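The check-after-cast idea could look roughly like this for scalars (`value_fits_dtype` is a hypothetical helper name for illustration, not skimage API and not necessarily the exact code in 54a901f):

```python
import numpy as np

def value_fits_dtype(value, dtype):
    """Return True if a scalar `value` survives a cast to `dtype` unchanged."""
    cast = np.asarray(value).astype(dtype)
    return bool(cast == value)

assert value_fits_dtype(255, np.uint8)
assert not value_fits_dtype(300, np.uint8)   # wraps around to 44
assert not value_fits_dtype(-1, np.uint8)    # wraps around to 255
assert not value_fits_dtype(0.5, np.uint8)   # truncated to 0
```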
Also curious about `np.min_scalar_type(np.iinfo(np.int64).max)` returning `dtype('uint64')`. I would have expected `np.int64`, but I guess the promotion path reaches `np.uint64` first?
Argh, you are right, `full` is weird. `np.array(value, dtype=dtype)` would work, or assignment, but `full` unfortunately does `np.asarray(value)` and then assigns.
I wonder if it is worth refactoring
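The asymmetry between `np.full` and `np.array` can be checked directly (a sketch; the `np.array` branch raises only on NumPy 2, while NumPy 1.24+ merely warned during the deprecation period):

```python
import numpy as np

# np.full goes through np.asarray(value) first (giving an int64 array)
# and then assigns, so the out-of-range value wraps silently:
assert np.full(1, 300, dtype=np.uint8)[0] == 44

# np.array with an explicit dtype validates the Python int instead:
try:
    np.array(300, dtype=np.uint8)
    print("no error (NumPy 1 behavior)")
except OverflowError:
    print("OverflowError (NumPy 2 behavior)")
```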
Fixing this problem for NumPy 1 & 2 has been a bit tricky, but I think the current solution should catch and deal with every numeric dtype that a user may use for `tolerance`.
skimage/morphology/_flood_fill.py

    tolerance = abs(tolerance)
    # Account for over- & underflow problems with seed_value ± tolerance
    # in a way that works with NumPy 1 & 2
    min_value, max_value = numeric_dtype_min_max(seed_value.dtype)
    with np.errstate(over="raise", under="raise"):
        try:
            low_tol = max(min_value, seed_value - tolerance)
        except (OverflowError, FloatingPointError):
            low_tol = min_value
        try:
            high_tol = min(max_value, seed_value + tolerance)
        except (OverflowError, FloatingPointError):
            high_tol = max_value
Some explanation: in NumPy 2.0 we can no longer rely on

high_tol = min(max_value, seed_value + tolerance)
low_tol = max(min_value, seed_value - tolerance)

to correctly bound to the min and max of the dtype. seed_value may be a np.uint8, in which case adding e.g. 379 to it will result in an OverflowError.

with np.errstate(over="raise", under="raise") is necessary because in NumPy 2 cases like

low_tol = max(0, np.uint8(2) - 3)

now underflow to np.uint8(255), because the 3 will be cast to np.uint8 before the subtraction. In NumPy 1, the right side would result in np.int64(-1).
Though, I may be wrong. I find it very tricky to think of and test for all edge cases...
Yes, but you will get a RuntimeWarning, which should make our test suite error out.
Easiest in this case may be `max(0, int(x) - 3)`.
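Demonstrating the suggested fix, which behaves the same on NumPy 1 and 2 (my own minimal example):

```python
import numpy as np

seed_value = np.uint8(2)
tolerance = 3

# np.uint8(2) - 3 underflows under NEP 50 (NumPy 2) and yields int64(-1)
# on NumPy 1; going through a Python int avoids both surprises:
low_tol = max(0, int(seed_value) - tolerance)
assert low_tol == 0

high_tol = min(255, int(seed_value) + tolerance)
assert high_tol == 5
```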
Casting to the appropriate Python scalar seems indeed like the easiest fix. 👍
Forgot to do this in a previous commit.
This option should have no bearing on the precision with which results are compared by `assert_array_almost_equal`; the parameter decimal=10 should be used for that purpose. This option also had unwanted side effects, because it changed the global display option globally, leading to problems in later doctest comparisons. Note that changing the comparisons in this test to use `assert_array_almost_equal(..., decimal=10)` fails the tests.
Previously the returned `unique_inverse` array was always flattened, which seems to be no longer the case for NumPy 2. However, keep the output flattened as before. This supersedes commit 8e693ab which provided a fix that didn't work on NumPy<2.
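The compatibility fix can be sketched like this (my own example; on NumPy 2.0 the inverse follows the input's shape, while NumPy 1 returned it flattened, so normalizing with `reshape(-1)` works on both):

```python
import numpy as np

a = np.array([[1, 2], [2, 3]])
uniques, inverse = np.unique(a, return_inverse=True)
# Normalize to the historical flattened shape on both NumPy 1 and 2:
inverse = inverse.reshape(-1)
assert inverse.shape == (4,)
# The inverse still reconstructs the (raveled) input:
assert (uniques[inverse] == a.ravel()).all()
```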
For a correct dtype comparison, we must specify the "endianness" as well. Comparing with `img.dtype.type == np.uint16` would also work. This updates commit 2e2deb9 which provided a fix that didn't work on NumPy<2.
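A quick illustration of the two comparison styles (my own example):

```python
import numpy as np

little = np.zeros(3, dtype="<u2")  # little-endian uint16
big = np.zeros(3, dtype=">u2")     # big-endian uint16

# dtype equality is byte-order aware ...
assert little.dtype != big.dtype
# ... while comparing the scalar type ignores endianness:
assert little.dtype.type is np.uint16
assert big.dtype.type is np.uint16
```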
Can someone tell me why

scikit-image/skimage/util/dtype.py, line 33 in 903824f

is used instead of
This function emits its own warning if the given alpha isn't compatible with the given image dtype. So ignore the one raised by NumPy.
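The suppression pattern might look like this (a generic sketch; the actual function and warning text in skimage differ):

```python
import warnings
import numpy as np

with warnings.catch_warnings():
    # Silence NumPy's "overflow encountered" RuntimeWarning here; the
    # surrounding function emits its own, more specific warning instead.
    warnings.simplefilter("ignore", RuntimeWarning)
    result = np.float32(3.4e38) * np.float32(10)  # overflows to inf

assert np.isinf(result)
```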
My guess is it'd be fine to switch to int32. Maybe we can also update that confusing comment above
I'm not sure what the original intention of this test was but Stéfan suggests that asserting np.uint16 is enough and I tend to concur.
I don't really follow. In #3043 (comment) it is explained that

But it seems that this is not true for NumPy 2 on AMD64 and Windows, according to the tests. Furthermore, I don't get why the "underlying C types" are of relevance here. Additionally, while
in _convert. These should be more explicit and should be more reliable in ensuring that for example int32 is available on all platforms.
Looks like that serves our purpose perfectly.
While these might be useful to define in one place to reuse in tests, keep the PR focused for now.
This reverts commit 6e39a2e.
@@ -26,16 +26,6 @@ Use the ``quick search`` field in the navigation bar of the online
documentation to find mentions of keywords (segmentation,
rescaling, denoising, etc.) in the documentation.

API Discovery
Why is this taken out, if we still support it?
Right now `lookfor` is deprecated and its removal is scheduled for 0.26. While that is so, I think it makes sense to remove this section. It's also in the TODO.txt.
To be honest, I forgot that it was marked as deprecated while the discussion in #7288 (comment) happened. I don't mind reverting the deprecation and this in a PR if you would like to see it stay...
LGTM. If we run into any issues, we can fix them in a follow-up PR.
I think we should merge this and act on any further concerns as they arise. I did make a few comments, but let's discuss those for potential follow-up.
@@ -137,6 +137,9 @@ def try_all_threshold(image, figsize=(8, 5), verbose=True):

    Examples
    --------
    .. testsetup::
Any way to get these out of the docstrings and into a configuration file, e.g., if we used doctest++?
doctest-plus doesn't have an option to do so in a configuration file, I think.
See #7289 (comment) for background on the current pattern. Like @soupault, I actually like the explicitness of this, and didn't like that the other option using `__doctest_skip__` hides that doctests are skipped.
Note that these directives don't show up in our rendered HTML docstrings.
What do you think?
You could add something like

    needs_matplotlib = [
        "draw/draw.py",
        "filters/thresholding.py",
        ...
    ]

    collect_ignore = []
    try:
        import matplotlib
    except ImportError:
        collect_ignore += needs_matplotlib

to skimage/conftest.py.
It would be nice not to have the docstrings littered with statements that make little sense to new readers (for whom these are most useful).
Hmm, I see the value in that. `.. testsetup::` might appear often enough via `help(some_func)` and IPython's `?` that it's a relevant concern.
I'd support an alternative solution if it makes it visible that a test is skipped. Otherwise it might get really confusing for contributors. Imagine them not having installed an optional dependency and wondering why the CI is showing red for something they can't reproduce, because the doctest is just silently skipped by some `collect_ignore` somewhere they have no idea about.
I can't get the solution suggested by @jarrodmillman to work reliably; it seems like doctests are still collected. Furthermore, it and `collect_ignore_glob` can only ignore entire files or directories.
We've been using it for NetworkX reliably for several years. Can you describe how it is unreliable?
I don't think it has been confusing for contributors. We have had to explain how to skip tests to some contributors, but I suspect that will be true of any solution.
It does ignore entire files or directories. That hasn't been an issue for NetworkX.
Can you describe how it is unreliable?

Trying that out locally just doesn't seem to work. E.g. if I add "restoration/j_invariant.py" to

scikit-image/skimage/conftest.py, lines 5 to 6 in 9107d52:

    collect_ignore = [
        "io/_plugins",

and run

    spin test -- -v --doctest-plus skimage/restoration/j_invariant.py

it's still picked up. I'm guessing it's because I am somehow using the wrong pattern? I couldn't figure out one that works to skip the module.
Also, in the case of skimage/restoration/j_invariant.py only the doctest of denoise_invariant needs to be skipped if pywt is not available. Skipping the entire module would skip two doctests unnecessarily.
If we can make skipping in doctestplus more visible, then I think we can satisfy everyone. I'm currently looking into it in scientific-python/pytest-doctestplus#246...
@@ -40,7 +40,9 @@ def test_overrange_tolerance_float():
    image *= max_value

    expected = np.ones_like(image)
    output = flood_fill(image, (0, 1), 1.0, tolerance=max_value * 10)
    with np.errstate(over="ignore"):
        tolerance = max_value * 10
Hrm, is this safe?
Good catch, I used the wrong approach to "fix" this: previously the result was a Python scalar, while now it results in an `np.float32(inf)`, which is not the same as before. I'll address this in a follow-up PR. 👍
array([[-0.2178588368,  0.4192819131, -0.0343074756],
       [-0.0717941428,  0.0451643229,  0.0216072614],
       [ 0.2480621133, -0.4294781423,  0.0221019139]])
>>> tform_matrix.params  # doctest: +FLOAT_CMP
Again, can't this be a global doctest option?
There's the option in setup.cfg; not fond of adding a setup.cfg again just for this.
I think the +FLOAT_CMP is actually not necessary. I added it before I discovered that another test was changing the precision for floating-point comparisons, so I'll remove it in a follow-up PR.
Though, in cases like this I would still prefer the more explicit local option, unless we have a reason not to use the default precision globally?
A global config should be sufficient, but fine with having exceptions defined this way.
For completeness, saimn pointed out in scientific-python/pytest-doctestplus#185 that pyproject.toml is supported since scientific-python/pytest-doctestplus#222.
Thank you, @lagru, that was a big job!
Description
Concerns #7282. matplotlib pulls in contourpy and contourpy pins numpy<2. So for now, remove matplotlib as a testing dependency and assert that NumPy 2.0 is used during the tests for the nightly wheels.
I'm currently also piling on a few fixes that we will eventually need to be compatible with NumPy 2.0.
Modules passing locally for NumPy 1 & 2 (skipping some dependencies):
skimage._shared
skimage.color
skimage.data
skimage.draw
skimage.exposure
skimage.feature
skimage.filters
skimage.future
skimage.graph
skimage.io
skimage.measure
skimage.metrics
skimage.morphology
skimage.registration
skimage.restoration
skimage.segmentation
skimage.transform
skimage.util
References:
Checklist
./doc/examples for new features

Release note
Summarize the introduced changes in the code block below in one or a few sentences. The
summary will be included in the next release notes automatically: