Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add out-of-tree Pyodide builds in CI for scikit-image #7350

Open
wants to merge 36 commits into
base: main
Choose a base branch
from

Conversation

agriyakhetarpal
Copy link

@agriyakhetarpal agriyakhetarpal commented Mar 18, 2024

Description

This pull request is meant to supersede #7265 based on @lagru's suggestion, and it adds a CI job to build and test scikit-image in a Pyodide virtual environment through WASM wheels. Following this, it would be possible to include JupyterLite notebooks in the documentation in order to run scikit-image's code snippets which come with docstring-based examples.

It is to be highlighted that this was completed last month for NumPy: (numpy/numpy#25894) and for PyWavelets – an optional dependency for scikit-image: (PyWavelets/pywt#701), all thanks to @rgommers for the help.

I was working on this last month, and I have rebased my branch to account for the latest changes upstream.

cc @stefanv from discussions on the previous PR.

Checklist

Release note

We use changelist to
compile each pull request into an item of the release notes. Please refer to
the instructions
and past release notes
for guidance and examples.

...

lagru and others added 10 commits March 18, 2024 18:22
Locally, this version seems to successfully create a wheel when running
"pyodide build" opposed to using "python -m build ...".
Based on numpy/numpy#25894. This comment updates versions for Pyodide, Emscripten, and improves some names for job steps. Multiple reusable actions have also been updated in accordance with the latest versions available.

Co-Authored-By: Lars Grüter <20140352+lagru@users.noreply.github.com>
This commit loads and re-saves all of the `.npy` files to `.npz` NumPy compressed file format and saves them to the 'data' field in the Npz objects.

The test cases where they have been used have been updated to use this attribute, and the SHA-256 checksums in the registry for each of the files have been updated in accordance with this change.
@agriyakhetarpal
Copy link
Author

agriyakhetarpal commented Mar 18, 2024

For context: a lot of the .npy files were replaced with the compressed .npz file format in 0dc7431 – please see pyodide/pyodide#4541 for more information. This fix has not been released yet, but I am happy to drop the commit once we do have a release.

Some of the changes will make the tests with additional data being downloaded fail with pooch, I assume. The failure from the Emscripten job retains the s_cmp unresolved symbol error – my next step here will be to investigate a different testing environment using Node.js instead of a regular Pyodide-based Python interepreter (similar to what is being used by scikit-learn at the time of writing).

@lagru lagru added the 🤖 type: Infrastructure CI, packaging, tools and automation label Mar 19, 2024
@agriyakhetarpal
Copy link
Author

Some progress: adding a JS script to run the pytest test suite has made it work, and all the tests proceed to run till 100% completion. I will now work to resolve the failures and errors faced in the test cases.

@agriyakhetarpal agriyakhetarpal force-pushed the add-emscripten-builds-out-of-tree branch from ac6d92d to 12f0bd4 Compare March 19, 2024 12:16
@agriyakhetarpal agriyakhetarpal force-pushed the add-emscripten-builds-out-of-tree branch 2 times, most recently from 970d0a3 to da1c1ed Compare March 19, 2024 18:54
@agriyakhetarpal
Copy link
Author

More progress: I have finally got a green check (🎉) for the Emscripten job by skipping and fixing tests as necessary: https://github.com/scikit-image/scikit-image/actions/runs/8348607683/job/22850925405?pr=7350.

I will now proceed to clean up this PR and fix the rest of the tests. It might require fixing the .npy to .npz conversion, i.e., reverting the changes made in 0dc7431 because it breaks the pooch tests, but I will investigate that if for now I can do this with a patch (I had tried to do the same thing for PyWavelets earlier but the necessary patches for it were a bit complicated to apply – but I have no qualms in trying that again). This PR is almost there and should be ready for review soon!

Try setting Agg backend for Matplotlib testing

Don't mark Matplotlib tests, correct set backend
@agriyakhetarpal agriyakhetarpal force-pushed the add-emscripten-builds-out-of-tree branch from da1c1ed to 00fefee Compare March 20, 2024 14:09
@agriyakhetarpal agriyakhetarpal force-pushed the add-emscripten-builds-out-of-tree branch from 00fefee to 24598b7 Compare March 20, 2024 14:18
@agriyakhetarpal
Copy link
Author

agriyakhetarpal commented Mar 20, 2024

Looks like all tests are now passing! This is ready for review now, @lagru and @rgommers. I am happy to rewrite history here if needed, or have this be squash-merged – either way works for me.

As soon as this is merged, I can follow up with a PR that publishes these wheels weekly (based on the current cadence) to the Anaconda.org repository or any other PyPI-like indices1. Doing that should be much more trivial in comparison to this PR :) I can also add a corresponding job in the test_nightlies_on_main.yml workflow if that is required, but I don't think we have nightly WASM wheels for a lot of repositories right now, so this can wait for a while – I would prefer to do that in a separate PR anyway!

Note: the commit 0dc7431 that performed the .npy to .npz conversion has been reverted, which should make this PR a bit cleaner. I have applied a patch to the Emscripten toolchain files to fix the bug (duly attributed to the Pyodide core developers).

Footnotes

  1. it might make sense to make this a reusable workflow, since that way it can be called from a file (or I can modify the new workflow to include the upload step here itself).

@agriyakhetarpal agriyakhetarpal changed the title [WIP]: Add out-of-tree Pyodide builds in CI for scikit-image Add out-of-tree Pyodide builds in CI for scikit-image Mar 20, 2024
@agriyakhetarpal agriyakhetarpal marked this pull request as ready for review March 20, 2024 16:00
Copy link
Author

@agriyakhetarpal agriyakhetarpal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments to facilitate an easier review, below:

skimage/_shared/testing.py Outdated Show resolved Hide resolved
Comment on lines +71 to +73
from matplotlib import use

use('Agg')
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with the MPLBACKEND environment variable set inside the workflow, it didn't work and the tests for Matplotlib were failing – I would suggest keeping this change here (and in other places).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't have as much context as @lagru here, but for what it's worth: in SciPy we set the backend to Agg for testing, and it's my understanding that that in general is a robust strategy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really follow why use('Agg') fixes things and MPLBACKEND="Agg" doesn't. E.g. for be525a4, pytest's error log already indicates that Agg is used.

This is before you try to use Agg programmatically

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand it enough either (perhaps Node.js is not properly letting environment variables pass to the Python interpreter). But sure, let me try to revert this change in a separate commit, again, at the end of addressing the review and we can see if things break in an isolated state. Either way, keeping just one of these options would be better.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done a bit of reading about this as someone who does not have much JavaScript experience, and I think what is happening here is that Pyodide runs Python code within the browser environment (or in a different environment if a browser is not present), isolated from the Node.js process that spawns the Pyodide interpreter – therefore, it doesn't directly inherit environment variables from Node.js.
It could be tricky to debug further but I'm happy to do so, so I'll remove the MPLBACKEND environment variable from the workflow file for now, given that the programmatic approach with matplotlib.use will be beneficial for not only these Pyodide-specific tests, but for the general test suite on regular supported platforms too. Edit: removed in 174f3e7.

skimage/io/tests/test_io.py Show resolved Hide resolved
tools/emscripten/patches/0001-fix-npy-file-access.patch Outdated Show resolved Hide resolved
@lagru lagru self-requested a review March 20, 2024 19:01
Copy link

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite good to me. The CI job config looks good, as do the logs. Applying the needed patch for .npy files is a good approach.

The changes to test files all seem reasonable; the import threading one is the only one that may require a change, but you already commented on that so the maintainers can decide on that one.

Comment on lines +71 to +73
from matplotlib import use

use('Agg')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't have as much context as @lagru here, but for what it's worth: in SciPy we set the backend to Agg for testing, and it's my understanding that that in general is a robust strategy.

See pyodide/pyodide#4654 for discussion. This commit resolves a build error due
to a release of `build` (v1.2.0) which had broken the
`pyodide build` command for all repositories.
@agriyakhetarpal
Copy link
Author

Pyodide had a recent patch release to fix failures on the pyodide build command, I updated it in d02105b and merged the latest changes from the main branch (haven't rebased so as not to spoil the review above, but that can always be done later as needed). The Emscripten toolchain version has not changed with this release, so the ABI remains stable.

@lagru
Copy link
Member

lagru commented Apr 11, 2024

Sorry for not keeping up with this @agriyakhetarpal. 🙏 I plan to review this by the weekend.

@agriyakhetarpal
Copy link
Author

Thanks, @lagru! No worries at all. I was merging main a few times over recently to ensure that the jobs are not failing with upstream changes. Plus, it would only ease the review, I imagine.

skimage/_shared/testing.py Outdated Show resolved Hide resolved
.github/workflows/emscripten.yml Show resolved Hide resolved
skimage/io/tests/test_io.py Show resolved Hide resolved
tools/emscripten/patches/0001-fix-npy-file-access.patch Outdated Show resolved Hide resolved
Copy link
Member

@lagru lagru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @agriyakhetarpal! I'm impressed by the workarounds you figured out by yourself without knowing our library intimately. :D

skimage/feature/_basic_features.py Outdated Show resolved Hide resolved
Comment on lines +71 to +73
from matplotlib import use

use('Agg')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really follow why use('Agg') fixes things and MPLBACKEND="Agg" doesn't. E.g. for be525a4, pytest's error log already indicates that Agg is used.

This is before you try to use Agg programmatically

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖 type: Infrastructure CI, packaging, tools and automation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants