CI/BLD: use scipy-openblas wheel when building #20585
base: main
Conversation
pyproject.toml (outdated)
@@ -139,7 +138,7 @@ test-command = "bash {project}/tools/wheels/cibw_test_command.sh {project}"
[tool.cibuildwheel.linux]
manylinux-x86_64-image = "manylinux2014"
manylinux-aarch64-image = "manylinux2014"
-before-build = "bash {project}/tools/wheels/cibw_before_build_linux.sh {project}"
+before-build = "bash {project}/tools/wheels/cibw_before_build.sh {project}"

[tool.cibuildwheel.macos]
before-build = "bash {project}/tools/wheels/cibw_before_build_macos.sh {project}"
This line will override any config specified in [tool.cibuildwheel] (i.e. the new file you add won't get called). Ditto the specific Windows file.

For me it makes sense to separate configuration into separate cibw_before_build_platform files; a single file can get convoluted.
Perhaps there could be a common file, plus individual files invoked on this line, i.e.
before-build = "bash {project}/tools/wheels/cibw_before_build_common.sh; bash {project}/tools/wheels/cibw_before_build_macos.sh {project}"
> This line will override any config specified in [tool.cibuildwheel] (i.e. the new file you add won't get called).

At this time, I only want to override the Linux wheel building to prove the wheels work. If it works, we can think about how to generalize it in another PR/other commits.
Ok. The changes you make here are also overridden in wheels.yml
Oy. I will try removing the redundant line from wheels.yml
> For me it makes sense to separate configuration into separate ...

OK, adopted in the latest commit.
Having a duplicate is sometimes useful.
Hmm. On the other hand, it is a bit obscure and makes replicating the wheel build locally more difficult...
Maybe we could document the ability to override the pyproject.toml values with local environment variables, and local specialized builds would do that.
Sure. The main reason to do so is e.g. employ specific package versions, etc.
I like @mattip's idea. The duplication makes the code harder to understand, and I'd really like it to be as clean as possible. So comments explaining "how do I tweak this if needed" are nicer than more bash code.
Force-pushed from 4737ad9 to 8214296 (compare)
This need for LD_LIBRARY_PATH is quite annoying. @ev-br pinged me about an issue with this as well yesterday. I haven't been able to reproduce it myself, but my conclusion was that Libs: -L{libdir} -lscipy_openblas in the generated .pc file is not reliable, since it depends on the loader's library search path configuration as well as on whether site-packages is under sysroot or not. It should be replaced with /abspath/to/scipy_openblas.so to ensure the built extension modules will always find scipy_openblas.so at runtime.
All of the boilerplate of copying shared libraries around should be reducible to:
$ python -c "import scipy_openblas32; print(scipy_openblas32.get_pkg_config())" > scipy-openblas.pc
$ export PKG_CONFIG_PATH=$PWD
Does that sound right to you @mattip?
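A minimal sketch of what such an absolute-path .pc file could look like (all paths and the version here are hypothetical; the real content comes from scipy_openblas32.get_pkg_config()):

```python
from pathlib import Path

# Hypothetical install location; the real one is inside the installed
# scipy_openblas32 wheel in site-packages.
libdir = Path("/tmp/site-packages/scipy_openblas32/lib")

# Build a .pc file whose Libs line is the absolute path to the shared
# library instead of "-L{libdir} -lscipy_openblas", so extensions linked
# against it do not rely on LD_LIBRARY_PATH at runtime.
pc_text = "\n".join([
    "Name: scipy-openblas",
    "Description: OpenBLAS for SciPy wheel builds",
    "Version: 0.3",  # placeholder version
    f"Cflags: -I{libdir.parent / 'include'}",
    f"Libs: {libdir / 'libscipy_openblas.so'}",
])
Path("scipy-openblas.pc").write_text(pc_text)
print(pc_text.splitlines()[-1])
```

With PKG_CONFIG_PATH pointing at the directory holding this file, pkg-config would then hand the build an absolute library path rather than a -L/-l pair.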
For LD_LIBRARY_PATH, my band-aid fix was to copy the .so file from build/ to build-install: https://github.com/ev-br/openblas-bench/blob/main/.github/workflows/codspeed.yml#L42 I am still puzzled why the two libs have different linkage.
LD_LIBRARY_PATH is needed at runtime, not build time. The linker arguments in the pkg-config file are for build time, not runtime. In a proper wheel, delocate/auditwheel are used, so LD_LIBRARY_PATH is no longer needed. But in the CI workflows, we need to either copy the dependencies into somewhere like /usr/local/lib, which is what the current scripts do, or use LD_LIBRARY_PATH.
I know it's a runtime thing; I am saying that it should not be needed. It is bad for a build to succeed and then fail at runtime.
We have a goal of wanting it to work without vendoring. Also, folks are trying to use the …

I don't think it's hard. It should only require passing an absolute path rather than …
Makes sense. I can try this next week.
That seems to work on Linux. How should this work on macOS? Linking with the absolute path does not preserve the path for runtime, nor does using …
This is what conda-forge compilers use on macOS, so I expect it to work:
It would be surprising if it works when given via …
The … The missing part is that I need to do this in order to "burn" the …
I am still trying to figure out how to set the …
The …

or with: …

To test that, you can fix it locally like this: …

After that, the install name is correct: …

and adding … You can also try checking install names of, for example, the … tl;dr: rebuilding …
I updated the wheels, so those two lines are all that is needed. The error which indicates something is not working right with the …
pyproject.toml (outdated)
RUNNER_OS="Linux"
# /project will be the $PWD equivalent inside the docker used to build the wheel
PKG_CONFIG_PATH="/project/.openblas"
LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/project/.openblas/lib"
This can go now, right? The PKG_CONFIG_PATH is maybe still needed (?), but the rest of all this should not be.
fixed
echo PKG_CONFIG_PATH $PKG_CONFIG_PATH
PKG_CONFIG_PATH=$PROJECT_DIR/.openblas
rm -rf $PKG_CONFIG_PATH
mkdir -p $PKG_CONFIG_PATH
The rm -rf and mkdir at least should no longer be needed. Also, maybe change PKG_CONFIG_PATH to simply $PWD, so the separate .openblas directory isn't needed? A single .pc file can live anywhere; no need for a subdir.
adopted
Great, much nicer! I'll try to have a look at the …

The macOS job failed, detection didn't work:

…

but I assume that's because this is now a pared-down PR focused on Linux?
Yes, once the speediest build job to use a scipy-openblas wheel works (which I think happens to be the musllinux one), I will clean this all up for the other build jobs and wheel builds.
If it helps to debug the problem, I can clean up the macOS one so it too fails to apply the prefix?
The reason the symbol prefix is not being honored is that we currently only generate the required BLAS/LAPACK wrappers for Accelerate builds, to handle their special symbol suffixes. Since scipy-openblas has a special symbol prefix, we can fix this by enabling wrapper generation for those builds as well (the commit I added). The wrappers should only be generated when there is a symbol prefix or suffix, because they result in infinite recursion when the wrapper symbols are the same as the BLAS/LAPACK symbols (as they are when there is no prefix/suffix).

Could we automatically detect whether a BLAS/LAPACK library has prefixes/suffixes instead of enabling wrapper generation on a case-by-case basis?
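The name-mapping idea behind the wrappers can be sketched as follows. This is illustrative only: SciPy's build generates real wrapper functions with full signatures, and the tiny symbol list and macro-style remap here are stand-ins, not the actual mechanism.

```python
# Illustrative sketch: map each standard BLAS/LAPACK symbol to the
# prefixed symbol exported by scipy-openblas. The prefix and the symbol
# subset below are assumptions for demonstration.
PREFIX = "scipy_"
SYMBOLS = ["dgemm_", "dgesv_", "ddot_"]  # tiny illustrative subset


def remap_header(prefix, symbols):
    # One #define per symbol forwards the unprefixed name to the
    # prefixed export. Generating these only when a prefix/suffix
    # exists avoids the infinite recursion that identical wrapper and
    # backend names would cause.
    lines = ["#define {0} {1}{0}".format(sym, prefix) for sym in symbols]
    return "\n".join(lines)


print(remap_header(PREFIX, SYMBOLS))
```

A real generator additionally has to emit proper wrapper functions for cases (like Accelerate's suffixes) where a plain rename is not enough.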
Ah of course, good catch @thalassemia.
Not fully automatically, but we can specify a set of "these are the potential prefixes/suffixes to check". E.g.: https://github.com/numpy/meson/blob/4e370ca8ab73c07f7b84abe8a4b937caace050a4/mesonbuild/dependencies/blas_lapack.py#L398-L409 For now, doing it manually is the way to go, but we can add a standard set to check in Meson. I think it mainly matters for OpenBLAS (and most ILP64 builds, of course).
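A hedged sketch of that detection idea, probing candidate symbol names with ctypes (the prefix/suffix lists are illustrative, and the process's own global symbols stand in for a real BLAS library, which a real check would dlopen instead):

```python
import ctypes


def find_symbol(lib, base, prefixes=("", "scipy_"), suffixes=("", "64_")):
    """Return the first prefix/suffix variant of `base` the library exports."""
    for pre in prefixes:
        for suf in suffixes:
            name = pre + base + suf
            # Attribute lookup on a CDLL does a dlsym-style lookup and
            # raises AttributeError when the symbol is absent.
            if hasattr(lib, name):
                return name
    return None


# Stand-in library: the process's own global symbols (libc etc.).
lib = ctypes.CDLL(None)
print(find_symbol(lib, "printf"))   # libc exports it unprefixed
print(find_symbol(lib, "dgemm_"))   # no BLAS loaded here, so None
```

The returned variant (or None) tells the build system whether wrapper generation is needed and which prefix/suffix to apply.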
Force-pushed from c9b77bb to 30c9033 (compare)
[skip cirle] [skip cirrus]
Locally, running the 32-bit Linux build in Docker passes the tests. I wonder what went wrong in the CI.
The 32-bit tests get stuck in …
I can reproduce the 32-bit stall in a local Docker image. Stepping through with a debugger, execution stalls after this line: … (line 561 in 0f87966).

Stepping into the NumPy code, I identified numpy.linalg._umath_linalg.svd_m_s as the stalled function. When I get some time later, I'll try bumping the NumPy version and/or installing …
Two of the CI failures are, I think, not related:

"Linux tests / 32-bit, fast, py3.10/npMin, dev.py" hangs somewhere in NumPy, which is being built from source on every PR run since NumPy does not release wheels for 32-bit Linux. Before this PR, it would pick up the OpenBLAS installed from the openblas_support script. Now the build is not using any accelerator, which may be the reason for the hang.
I tried to update the linux-32-test to use NumPy 1.26.5 and the scipy-openblas wheels, but ran up against numpy/numpy#26437, so I used 2.0.0rc2 instead. Locally I see test failures; let's see what CI does.
Maybe 32-bit SciPy is incompatible with NumPy 2.0.0? This line (scipy/scipy/stats/_stats_py.py, line 1085 in 7febfdd) … dtype=int32, which fails (finfo(int) raises an error). Earlier there is this (scipy/scipy/stats/_stats_py.py, lines 1050 to 1054 in 7febfdd) … sctypes["int"], which is [<class 'numpy.int8'>, <class 'numpy.int16'>, <class 'numpy.intc'>, <class 'numpy.int64'>]. Note int32 is missing. I will raise an issue on NumPy.
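A small sketch of the finfo failure mode described above (assumes NumPy is installed; the exact error message varies by version):

```python
import numpy as np

# np.finfo describes floating-point (inexact) types only; passing an
# integer dtype raises ValueError, which is the failure hit on the
# 32-bit build above.
try:
    np.finfo(np.int32)
    raised = False
except ValueError:
    raised = True
print("finfo(int32) raised:", raised)

# np.iinfo is the integer counterpart a caller would need instead.
print("int32 max:", np.iinfo(np.int32).max)
```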
[skip cirrus] [skip circle]
Brilliant, thanks. Indeed, "Linux tests / 32-bit" is now passing. The remaining two CI failures are not related to this PR. However, there is #20718 about build times being slower when using the …
The 45 failures on Windows that are timing-related do look like an issue with this PR (or this …).

Comparing with a recent run on …
Replaces #20074. Uses the scipy-openblas32 wheel for building SciPy.

The first commit changes Linux wheel builds to use the scipy-openblas32 wheel.
Reference issue
What does this implement/fix?
Additional information
cc @rgommers