Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't require a full cython import to find PTX file #15785

Open
wants to merge 3 commits into
base: branch-24.06
Choose a base branch
from

Conversation

brandon-b-miller
Copy link
Contributor

We shouldn't need to do a full import of the cudf cython for the purposes of computing a relative path. It leads to awkward import errors when things go wrong that are misleading as to what cuDF is trying to actually do. Here I've deliberately removed the arrow lib to cause an error.

>>> import cudf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/raid/brmiller/mambaforge/envs/cudf_dev/lib/python3.11/site-packages/cudf/__init__.py", line 9, in <module>
    _setup_numba()
  File "/raid/brmiller/mambaforge/envs/cudf_dev/lib/python3.11/site-packages/cudf/utils/_numba.py", line 124, in _setup_numba
    _get_cc_60_ptx_file()
  File "/raid/brmiller/mambaforge/envs/cudf_dev/lib/python3.11/site-packages/cudf/utils/_numba.py", line 16, in _get_cc_60_ptx_file
    from cudf._lib import strings_udf
  File "/raid/brmiller/mambaforge/envs/cudf_dev/lib/python3.11/site-packages/cudf/_lib/__init__.py", line 4, in <module>
    from . import (
ImportError: libarrow.so.1600: cannot open shared object file: No such file or directory

It also convolutes the purpose and effects of _setup_numba, which should aspire to configure numba independently of cuDF, and limits our ability to customize when and how we're importing the cython in cudf's main __init__.

With this PR, the error will now be:

>>> import cudf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/raid/brmiller/mambaforge/envs/cudf_dev/lib/python3.11/site-packages/cudf/__init__.py", line 19, in <module>
    from cudf import api, core, datasets, testing
  File "/raid/brmiller/mambaforge/envs/cudf_dev/lib/python3.11/site-packages/cudf/api/__init__.py", line 3, in <module>
    from cudf.api import extensions, types
  File "/raid/brmiller/mambaforge/envs/cudf_dev/lib/python3.11/site-packages/cudf/api/types.py", line 20, in <module>
    from cudf.core.dtypes import (  # noqa: F401
  File "/raid/brmiller/mambaforge/envs/cudf_dev/lib/python3.11/site-packages/cudf/core/dtypes.py", line 13, in <module>
    import pyarrow as pa
  File "/raid/brmiller/mambaforge/envs/cudf_dev/lib/python3.11/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
ImportError: libarrow.so.1600: cannot open shared object file: No such file or directory

Which feels a bit more natural and easy to diagnose as something being wrong with arrow.

@brandon-b-miller brandon-b-miller added bug Something isn't working cuDF (Python) Affects Python cuDF API. non-breaking Non-breaking change labels May 20, 2024
@brandon-b-miller brandon-b-miller self-assigned this May 20, 2024
@brandon-b-miller brandon-b-miller requested a review from a team as a code owner May 20, 2024 13:39
Copy link
Contributor

@vyasr vyasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this won't work for editable installs. The py files and the compiled extension modules are not side-by-side for the mode of editable installation we are using. The right way to make this work is to use importlib.resources to get all of the file paths without loading the module, but that will require some work upstream in scikit-build-core to support that.

@brandon-b-miller
Copy link
Contributor Author

Unfortunately this won't work for editable installs. The py files and the compiled extension modules are not side-by-side for the mode of editable installation we are using. The right way to make this work is to use importlib.resources to get all of the file paths without loading the module, but that will require some work upstream in scikit-build-core to support that.

How are you currently performing an editable install? The PTX file locations aren't relative to any extension module, they should just be relative to the py files themselves.

@vyasr
Copy link
Contributor

vyasr commented May 20, 2024

I'm doing an editable installation with pip install -e.... Try it out and see where files end up. I'm happy to show it to you on my machine if you'd like or if you have trouble reproducing the behavior I'm mentioning.

The problem is that the PTX files are generated at build time and installed via CMake. When scikit-build-core does an editable install, anything that gets installed by CMake is placed into a directory in site-packages, while any pure Python files are read directly from the source directory (as would be expected with an editable install). Observe:

(rapids) coder _ ~/cudf $ pwd
/home/coder/cudf
(rapids) coder _ ~/cudf $ find . -name "*.ptx"
./cpp/build/release/CMakeFiles/3.28.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx
./cpp/build/conda/cuda-12.2/release/CMakeFiles/3.29.2/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx
./cpp/build/conda/cuda-12.2/release/CMakeFiles/3.29.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx
./python/cudf/build/cp310-cp310-linux_x86_64/CMakeFiles/3.29.2/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx
./python/cudf/build/cp310-cp310-linux_x86_64/CMakeFiles/3.29.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx
./python/cudf/build/cp310-cp310-linux_x86_64/udf_cpp/CMakeFiles/shim_60.dir/shim.ptx
./python/cudf/build/cp310-cp310-linux_x86_64/udf_cpp/CMakeFiles/shim_80.dir/shim.ptx
./python/cudf/build/cp310-cp310-linux_x86_64/udf_cpp/CMakeFiles/shim_75.dir/shim.ptx
./python/cudf/build/cp310-cp310-linux_x86_64/udf_cpp/CMakeFiles/shim_70.dir/shim.ptx
./python/cudf/build/cp310-cp310-linux_x86_64/udf_cpp/CMakeFiles/shim_90.dir/shim.ptx
./python/cudf/build/cp310-cp310-linux_x86_64/udf_cpp/CMakeFiles/shim_86.dir/shim.ptx
(rapids) coder _ ~/cudf $ find $(python -c "import site; print(site.getsitepackages()[0])")/cudf -name "*.ptx"
/home/coder/.conda/envs/rapids/lib/python3.10/site-packages/cudf/core/udf/shim_80.ptx
/home/coder/.conda/envs/rapids/lib/python3.10/site-packages/cudf/core/udf/shim_86.ptx
/home/coder/.conda/envs/rapids/lib/python3.10/site-packages/cudf/core/udf/shim_60.ptx
/home/coder/.conda/envs/rapids/lib/python3.10/site-packages/cudf/core/udf/shim_70.ptx
/home/coder/.conda/envs/rapids/lib/python3.10/site-packages/cudf/core/udf/shim_75.ptx
/home/coder/.conda/envs/rapids/lib/python3.10/site-packages/cudf/core/udf/shim_90.ptx

@brandon-b-miller
Copy link
Contributor Author

I see - if the cython extension modules are the only thing we can guarantee the location of in an editable install, I'd like to propose we ship some kind of empty cython module along with _lib (_path perhaps?) for this purpose - one that does not import the full _lib namespace - what do you think @vyasr ? If that sounds like a reasonable path forward, I can adapt this PR for that goal, else, I'll create an issue where we can figure things out in more detail and come back to this later.

@vyasr
Copy link
Contributor

vyasr commented May 21, 2024

I guess it depends, what exactly is the problem that you're trying to solve? Is there a way to delay the set up of the PTX files until after you know cudf has been imported? That way you wouldn't have any need to import cudf prematurely. We could add a stub library, but I feel like that's going to be more trouble than it's worth. If we did move forward with that, you'd want to put a pyx file in core/udf with a function get_ptx_files or something so that it can always be colocated with the PTX files.

I've asked in scikit-build about how best to do this since I do think that's where work needs to be done to properly enable support.

@brandon-b-miller
Copy link
Contributor Author

We're only reading from a PTX file because the version of CUDA used to compile it is needed to determine if MVC is necessary. This is the case where our package is compiled with CUDA V.X and the user has driver version V.Y, where X > Y. Unfortunately I think that means that we can't import cudf first, since later on in the cuDF import process we import numba.cuda after which we can't patch the linker. So, this process must happen before the rest of cuDF initializes. I'm of the opinion that we should aspire to fix this, mostly because I think in the current state it might be confusing to users exactly what piece of the initialization process is going wrong.

@vyasr
Copy link
Contributor

vyasr commented May 21, 2024

While we're waiting to hear back re:scikit-build, maybe a try-except could do the trick as a patch too? I'm not sure that I see a way for the empty lib approach to work because just doing import cudf._lib.empty_lib will trigger a full import of everything in cudf AFAIK. We import too many things in __init__.py files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cuDF (Python) Affects Python cuDF API. non-breaking Non-breaking change
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

None yet

2 participants