Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dipy==1.9.0 is causing our macOS GitHub Actions runners to stall on unrelated torch tests #3125

Open
6 tasks
joshuacwnewton opened this issue Mar 15, 2024 · 3 comments

Comments

@joshuacwnewton
Copy link
Contributor

Description

This is a bit of a weird one. In a routine maintenance PR where we're upgrading the versions of Python packages installed into our virtual environment, we noticed that some tests were timing out for macOS runners. After a litany of debugging, we narrowed it down to dipy==1.9.0. (Holding all other packages at their same versions, dipy==1.8.0 passes but dipy==1.9.0 times out.)

The tests that stall appear to be ones related to torch-based deep leaning pipelines. For example, we have a deep learning-based registration model called algo='dl' that uses the packages voxelmorph/neurite/pystrum.

pipdeptree output for voxelmorph
voxelmorph==0.2
├── h5py [required: Any, installed: 3.10.0]
│   └── numpy [required: >=1.17.3, installed: 1.26.4]
├── neurite [required: >=0.2, installed: 0.2]
│   ├── matplotlib [required: Any, installed: 3.8.3]
│   │   ├── contourpy [required: >=1.0.1, installed: 1.2.0]
│   │   │   └── numpy [required: >=1.20,<2.0, installed: 1.26.4]
│   │   ├── cycler [required: >=0.10, installed: 0.12.1]
│   │   ├── fonttools [required: >=4.22.0, installed: 4.49.0]
│   │   ├── importlib-resources [required: >=3.2.0, installed: 6.3.0]
│   │   │   └── zipp [required: >=3.1.0, installed: 3.18.1]
│   │   ├── kiwisolver [required: >=1.3.1, installed: 1.4.5]
│   │   ├── numpy [required: >=1.21,<2, installed: 1.26.4]
│   │   ├── packaging [required: >=20.0, installed: 24.0]
│   │   ├── pillow [required: >=8, installed: 10.2.0]
│   │   ├── pyparsing [required: >=2.3.1, installed: 3.1.2]
│   │   └── python-dateutil [required: >=2.7, installed: 2.9.0.post0]
│   │       └── six [required: >=1.5, installed: 1.16.0]
│   ├── nibabel [required: Any, installed: 5.2.1]
│   │   ├── numpy [required: >=1.20, installed: 1.26.4]
│   │   └── packaging [required: >=17, installed: 24.0]
│   ├── numpy [required: >=1.17, installed: 1.26.4]
│   ├── packaging [required: Any, installed: 24.0]
│   ├── pystrum [required: >=0.2, installed: 0.4]
│   │   ├── matplotlib [required: Any, installed: 3.8.3]
│   │   │   ├── contourpy [required: >=1.0.1, installed: 1.2.0]
│   │   │   │   └── numpy [required: >=1.20,<2.0, installed: 1.26.4]
│   │   │   ├── cycler [required: >=0.10, installed: 0.12.1]
│   │   │   ├── fonttools [required: >=4.22.0, installed: 4.49.0]
│   │   │   ├── importlib-resources [required: >=3.2.0, installed: 6.3.0]
│   │   │   │   └── zipp [required: >=3.1.0, installed: 3.18.1]
│   │   │   ├── kiwisolver [required: >=1.3.1, installed: 1.4.5]
│   │   │   ├── numpy [required: >=1.21,<2, installed: 1.26.4]
│   │   │   ├── packaging [required: >=20.0, installed: 24.0]
│   │   │   ├── pillow [required: >=8, installed: 10.2.0]
│   │   │   ├── pyparsing [required: >=2.3.1, installed: 3.1.2]
│   │   │   └── python-dateutil [required: >=2.7, installed: 2.9.0.post0]
│   │   │       └── six [required: >=1.5, installed: 1.16.0]
│   │   ├── numpy [required: Any, installed: 1.26.4]
│   │   ├── scipy [required: Any, installed: 1.12.0]
│   │   │   └── numpy [required: >=1.22.4,<1.29.0, installed: 1.26.4]
│   │   └── six [required: Any, installed: 1.16.0]
│   ├── scikit-learn [required: Any, installed: 1.4.1.post1]
│   │   ├── joblib [required: >=1.2.0, installed: 1.3.2]
│   │   ├── numpy [required: >=1.19.5,<2.0, installed: 1.26.4]
│   │   ├── scipy [required: >=1.6.0, installed: 1.12.0]
│   │   │   └── numpy [required: >=1.22.4,<1.29.0, installed: 1.26.4]
│   │   └── threadpoolctl [required: >=2.0.0, installed: 3.3.0]
│   ├── scipy [required: Any, installed: 1.12.0]
│   │   └── numpy [required: >=1.22.4,<1.29.0, installed: 1.26.4]
│   ├── six [required: Any, installed: 1.16.0]
│   └── tqdm [required: Any, installed: 4.66.2]
├── nibabel [required: Any, installed: 5.2.1]
│   ├── numpy [required: >=1.20, installed: 1.26.4]
│   └── packaging [required: >=17, installed: 24.0]
├── numpy [required: Any, installed: 1.26.4]
├── packaging [required: Any, installed: 24.0]
├── scikit-image [required: Any, installed: 0.22.0]
│   ├── imageio [required: >=2.27, installed: 2.34.0]
│   │   ├── numpy [required: Any, installed: 1.26.4]
│   │   └── pillow [required: >=8.3.2, installed: 10.2.0]
│   ├── lazy-loader [required: >=0.3, installed: 0.3]
│   ├── networkx [required: >=2.8, installed: 3.2.1]
│   ├── numpy [required: >=1.22, installed: 1.26.4]
│   ├── packaging [required: >=21, installed: 24.0]
│   ├── pillow [required: >=9.0.1, installed: 10.2.0]
│   ├── scipy [required: >=1.8, installed: 1.12.0]
│   │   └── numpy [required: >=1.22.4,<1.29.0, installed: 1.26.4]
│   └── tifffile [required: >=2022.8.12, installed: 2024.2.12]
│       └── numpy [required: Any, installed: 1.26.4]
└── scipy [required: Any, installed: 1.12.0]
    └── numpy [required: >=1.22.4,<1.29.0, installed: 1.26.4]

However, I've found that our usage of dipy is entirely independent of this test. In fact, if I uninstall dipy locally and comment out our dipy imports, the test is still able to be run just fine without error on my local Ubuntu machine. So, I'm at a bit of a loss as to why the dipy upgrade would cause this test (as well as other non-voxelmorph tests) to time out, and only for our macos GHA runners.

I'd be happy to do more debugging + narrow things down to a more reproducible example. But, before I do that, I wanted to report this issue upstream, just in case my general description of the issue rings a bell for a relevant change that dipy may have made between 1.8.0 and 1.9.0.

(Skimming https://github.com/dipy/dipy/releases/tag/1.9.0, the only macOS-related change I see is #3061, but I'm not entirely sure if that's related.)

Way to reproduce

[If reporting a bug, please include the following important information:]

  • Code example
  • Relevant images (if any)
  • Operating system and version (run python -c "import platform; print(platform.platform())")
  • Python version (run python -c "import sys; print('Python', sys.version)")
  • dipy version (run python -c "import dipy; print(dipy.__version__)")
  • dependency version (numpy, scipy, nibabel, h5py, cvxpy, fury)
    • import numpy; print("NumPy", numpy.version)
    • import scipy; print("SciPy", scipy.version)
    • import nibabel; print("Nibabel", nibabel.version)
    • import h5py; print("H5py", h5py.version)
    • import cvxpy; print("Cvxpy", cvxpy.version)
    • import fury; print("fury", fury.version)

Note

To be added by @joshuacwnewton.

@skoudoro
Copy link
Member

Hi @joshuacwnewton,

I confirm, The difference between 1.8.0 and 1.9.0 on macOS is OPENMP:
since 1.9.0, we build our package with openmp flag which was not the case before.

Not sure what is the issue. So maybe you are encountering concurrency, or openmp conflict, or something like that with pytorch. I will experiment next week

@joshuacwnewton
Copy link
Contributor Author

Thank you for the confirmation! I'll try some experimentation on our end, too. :)

joshuacwnewton added a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this issue Mar 18, 2024
dipy==1.9.0 is built with OpenMP enabled (dipy/dipy#3125).

This is very similar to the issues we had with onnxruntime 1.5 and 1.6, and installing `libomp`
in our CI fixed this issue back then, too.

If this works, then I will report this upstream to the dipy devs, then we will pin `dipy!=1.9.0`,
since we probably don't want to make OpenMP a requirement for our users.
@joshuacwnewton
Copy link
Contributor Author

joshuacwnewton commented Mar 18, 2024

The OpenMP mention reminded me of a similar issue we had with an upstream change to onnxruntime:

tl;dr: onnxruntime required OpenMP to be installed for versions 1.5.x and 1.6.x. They then removed this requirement in 1.7.0 due to feedback:

Going off of what we did previously to fix this (install openmp in CI), I tried this on our test PR to see if it would prevent the runner from timing out. But, libomp is already installed:

Run brew install libomp
Warning: libomp 17.0.6 is already installed and up-to-date.

I will keep investigating, but my hunch is that our package will just have to specific dipy!=1.9.0 in the short term.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants