
Tracking issue: stub generation #420

Closed
wjakob opened this issue Feb 14, 2024 · 105 comments · Fixed by #496

Comments

@wjakob
Owner

wjakob commented Feb 14, 2024

Dear all (cc @cansik @torokati44 @qnzhou @tmsrise @rzhikharevich-wmt @njroussel @Speierers),

I am interested in providing a stub generation mechanism as part of nanobind. This is a tracking issue to brainstorm solutions.

Context: @cansik's nanobind-stubgen package is the only solution at the moment and works well in many cases. My goal is to overcome limitations discussed in discussion #163:

  1. Enabling a better "out-of-the-box" experience by integrating stub generation into the CMake build system.

  2. Stub generation currently involves complicated parsing, which is fragile and not always well-defined. Nanobind has this information in a more structured form and could provide it.

  3. Stubs serve two purposes, and stub generation should cater to both needs:

  • To get autocomplete in VS Code and similar tools, which requires extracting function signatures and docstrings. I am mainly interested in this use case.

  • Type checkers like MyPy. I haven't used them before and know very little about them (hence this issue to exchange experience). It seems to me that stubs only need to contain typed signatures but no docstrings. But nanobind often generates type annotations that MyPy isn't happy with, so some sort of postprocessing may be needed.

Here is what I have in mind, before actually having done anything. There may be roadblocks I haven't considered.

  1. The CMake build system gets a new command nanobind_add_stubs. This will register a command that is run at install time. Basically we need the whole package to be importable, and doing that in a non-installed build might be tricky.
nanobind_add_stubs(
  PATH ${CMAKE_INSTALL_PREFIX}
  PACKAGE nanobind_example
  DEPENDS nanobind_example_ext
)

When the user installs the extension to ${CMAKE_INSTALL_PREFIX}, this will run a Python file (shipped as part of the nanobind distribution) that imports the package and then generates nanobind_example/__init__.pyi.

Here, I am already getting confused because of unfamiliarity with stub generation. I've seen that packages sometimes contain multiple .pyi files. How does one decide where to put what? Can .pyi files import each other? What would be the best way to expose this in the nanobind_add_stubs() function?

  2. I also wanted to modify nanobind's function class (nb_func) so that it exposes information in a more structured way, a bit like __signature__ from inspect.signature. But __signature__ is too limited because it (like Python) has no concept of overload chains.

Therefore, I am thinking of adding a function __nb_signature__ that returns a list of pairs of strings [("signature", "docstring"), ...] that the stub generator can turn into something like this:

from typing import overload

@overload
def func(x: int):
    """docstring 1"""

@overload
def func(x:str):
    """docstring 2"""
  3. Some type signatures in nanobind aren't parseable in Python. There are a few things that I think could go wrong:
  • What if a C++ type hasn't been mapped yet when the extension is imported? In that case, the nanobind docstring includes the raw type (something like std::__1::vector<Item *>). The stubs could then omit that overload entirely, insert some generic placeholder (object?), or put the type name into a string. Thoughts?
  • The representation of default arguments (via __repr__) might not make sense as a Python expression. This seems like an unsolvable problem because nanobind simply does not know the Python expression to re-create an object. One option would be to try to eval() the expression in the stub generator and omit it or replace it by some kind of placeholder if an issue is found. Not sure -- thoughts?
  • Some nanobind type features don't have equivalents in typing.*. An example are the nd-array types annotations which are AFAIK too complex to be handled by anything currently existing. I'm thinking that it could be useful if the stub generator command nanobind_add_stubs(..) could be called with a user-provided Python file that implements some kind of post-process on the type signatures.
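The eval() idea from the second bullet might be sketched as follows (an assumption about how the check could work, not nanobind's actual implementation; the deliberately empty namespace rejects anything that is not a self-contained literal expression):

```python
def default_expr(repr_str):
    """Keep a default value's __repr__ in the stub only if it parses and
    evaluates as a self-contained Python expression; otherwise fall back
    to '...' as a placeholder."""
    try:
        # An empty namespace rejects anything referencing C++-side
        # objects (e.g. '<Item object at 0x7f...>') or unknown names.
        eval(repr_str, {"__builtins__": {}}, {})
        return repr_str
    except Exception:
        return "..."

print(default_expr("(1, 2.5, 'abc')"))        # a valid expression, kept
print(default_expr("<Item object at 0x7f>"))  # not parseable, replaced
```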

I'm curious about your thoughts on this! Thanks!

@jdumas

jdumas commented Feb 14, 2024

Great to see some discussion on first-party support for stub generation in Nanobind.

I would like to add an important use-case for stub generation: documentation. As an example, we've generated this reference documentation for Lagrange, which uses Nanobind to create Python bindings (still very much WIP). I tried to use Sphinx Autodoc, but it doesn't support native C extensions (sphinx-doc/sphinx#7630). So instead I am using this Sphinx AutoAPI extension, which works by parsing stub files generated by Nanobind.

@wjakob
Owner Author

wjakob commented Feb 15, 2024

@jdumas You might be interested in seeing the approach for an upcoming version of the Dr.Jit project that is based on nanobind: https://drjit.readthedocs.io/en/nanobind_v2/reference.html. I've so far managed without stubs -- this is the associated Sphinx file with plenty of uses of autofunction, autoclass, etc.: https://github.com/mitsuba-renderer/drjit/blob/nanobind_v2/docs/reference.rst?plain=1

@wjakob
Owner Author

wjakob commented Feb 15, 2024

I started PR #421 to track progress towards landing this feature. The first commit (d71a990) adds a __nb_signature__ property to nanobind functions that exposes metadata about function overloads and default arguments in a more structured manner.

@njroussel
Contributor

njroussel commented Feb 15, 2024

1.

Here, I am already getting confused because of unfamiliarity with stub generation. I've seen that packages sometimes contain multiple .pyi files. How does one decide where to put what? Can .pyi files import each other?

Yes, somewhat.
Some autocomplete systems know how to deal with .pyi files importing other .pyi files. As far as I can remember, this isn't properly documented by any PEP. In fact, if anything, PEP 561 is pretty explicit about *.pyi files not requiring any "runtime" behavior. However, in practice, we do have such imports in https://github.com/mitsuba-renderer/drjit and they are picked up by Pyright (the underlying tool in VS Code and other IDEs).

What would be the best way to expose this in the nanobind_add_stubs() function?

I believe that splitting *.pyi files is only necessary when there are multiple submodules in the package. Finer-grained splitting is most likely only marginally beneficial to the autocomplete system (higher parallelism when parsing). Exposing this control to the user would require them to provide a mapping from each stub file name to a list of the symbol names they wish to have in that stub file. I really don't see an elegant way of handling this in CMake.


3

For the default arguments, there could be an extension to nb::arg_v that additionally takes a string which should be the Python expression to construct the default value. This is obviously extra work on the user, but as you said, there is no way of knowing the equivalent Python expression. I'm not against still having a default implementation that attempts to use eval(), but I think there should be an option for users to set it explicitly if they wish.

@wjakob
Owner Author

wjakob commented Feb 15, 2024

@njroussel:
Regarding point 1, would it be sufficient to let the user specify a list of modules and submodules for which stubs should be generated, and then it makes a separate file for each one?

After some thinking, the approach that I was planning regarding 3 is as follows (it somewhat mirrors what MyPy's stubgen does):

Simple default arguments that can be parsed out-of-the-box are included in the stub. More complicated things (elaborate types, multi-line strings) that might actually reduce readability of a function signature are abbreviated as "= ..." (where ... is the Python Ellipsis object).

The stub generator will by default accept defaults of type int, float, bool, NoneType, str (if not multi-line and shorter than a hardcoded limit), and nanobind enumerations. For more complex cases, I plan to have an extension mechanism: when calling the stub generator, the user can provide a custom Python file with a hook that turns other objects into Python expressions (or returns None, which activates the default behavior). This seems less burdensome than having to define lots of custom nb::arg_v strings in a codebase (one place to catch them all, so to speak).
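The accept-list plus hook fallback described above might look something like this (a sketch under stated assumptions: format_default, the hook signature, and the string-length limit are all illustrative, not nanobind's actual interface):

```python
import enum

_MAX_STR = 60  # stand-in for the hardcoded string-length limit


def format_default(value, hook=None):
    """Return the stub text of a default value, or '...' when no safe
    Python spelling is known."""
    if hook is not None:
        custom = hook(value)   # the user hook gets the first chance
        if custom is not None:
            return custom
    if value is None or isinstance(value, (bool, int, float)):
        return repr(value)
    if isinstance(value, enum.Enum):
        return str(value)      # 'Color.RED' rather than '<Color.RED: 1>'
    if isinstance(value, str) and "\n" not in value and len(value) < _MAX_STR:
        return repr(value)
    return "..."               # abbreviate everything else as Ellipsis
```

A hook for a custom type could then be as simple as `lambda v: "MyType()" if isinstance(v, MyType) else None` (with MyType being whatever bound type the project defines).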

Putting it all together, it might look like this:

nanobind_add_stubs(
  # directory relative to which the stub generator will be called
  # (the modules should be importable from here)
  PATH ${CMAKE_INSTALL_PREFIX}
  # Sequence of modules and submodules, stubgen will separately process each one
  MODULES nanobind_example nanobind_example.submodule_1
  # Ensure that a given dependency is built and installed
  DEPENDS nanobind_example_ext
  # The user can provide a python file with hooks as part of their project
  # that modifies the behavior of stubgen
  HOOKS ${CMAKE_CURRENT_SOURCE_DIR}/stubgen_hooks.py
)

@wjakob
Owner Author

wjakob commented Feb 16, 2024

cc also @skallweitNV & @laggykiller, who seem interested in stub generation

@skallweitNV
Contributor

This all sounds excellent. Some comments from my point of view:

I'd still like to be able to build stubs during development at build time. But people who want that can probably just call stubgen manually from their build system. No need to have it in nanobind's CMake infrastructure.

Regarding default values: handling them using hooks seems like a good idea, but you could still consider the nb::arg_v with a default value string as an option. It might prove valuable in certain cases IMO.

And thanks for all your efforts @wjakob, nanobind is awesome!

@laggykiller
Contributor

Sounds great, and I'm looking forward to seeing this implemented!

One thing to suggest: some users may add Sphinx docstrings to their nanobind code (e.g. myself, see this example). In those cases, I think the nanobind stub generator could try to parse the Sphinx docstring and use it as a hint for function signature and stub generation. This would provide a way for users to override the auto-generated stub if it is somehow incorrect.

Unfortunately, Sphinx docstrings have no standardized way to denote the default value of a function argument, so this can't be used to solve the problem of default arguments that do not make sense as a Python expression. Alternatively, we could create our own convention for how default values should be denoted in Sphinx docstrings so that they can be recognized and used as hints during stub generation.

Another small thing to add: if the representation of a default argument does not make sense as a Python expression, we could annotate the type of the argument as Optional as a placeholder.

@rzhikharevich-wmt

rzhikharevich-wmt commented Feb 16, 2024

Another small thing to add is if representation of default argument do not make sense as a Python expression, we could denote the type of the argument as Optional as placeholder.

That's not really necessary as there is the ellipsis syntax for omitting default values from stubs. Besides, it would make type checkers validate code that explicitly passes an Optional value even when it's not applicable.
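The ellipsis form mentioned here is ordinary Python, so a stub can always fall back to it. A small self-contained illustration (add_frame and its parameters are hypothetical):

```python
import inspect

# In a stub, '...' can stand in for any default value that has no clean
# Python spelling; type checkers accept it without validating the value.
def add_frame(pixels, trns_color: object = ..., delay_num: int = ...) -> int: ...

sig = inspect.signature(add_frame)
print(sig.parameters["trns_color"].default is Ellipsis)  # the default really is Ellipsis
```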

@wjakob
Owner Author

wjakob commented Feb 17, 2024

Dear all,

I've made progress on this and have a stub generation workflow that works well on a large-ish project (a ~300KB stub file with a mixture of classes, functions, properties, enums, and mixed bindings + pure Python code).

Could I ask you all to check it out and report any issues you encounter? The latest version is available on the stubgen branch.

There is also documentation on ReadTheDocs:

I followed @skallweitNV's suggestion and for now only have a CMake command that runs at build time. I plan to look into an install-time option later.

I also plan to make one major API and ABI-breaking change that will imply moving to nanobind version 2.0 (!)

The nb::raw_doc annotation completely overwrites a docstring of an overload chain. People usually use it to hand-craft a combined signature + docstring in cases where nanobind wasn't able to generate a usable/nice typed function signature.

m.def("f", &f, nb::raw_doc(R"(
my_func(a: complicated_type)

A docstring)"));

Interpreting this "raw" format is problematic when the stub generator needs to split up the signature into its constituent overloads: stubgen would then need to parse the docstring, and the whole point of this PR is to create an alternative approach to stub generation that doesn't involve parsing docstrings, because that is brittle. Really, the main reason for this raw format is to take control of the function signature.

Hence, my plan is to replace this with a new annotation named nb::signature. One can use it to specify multiple overloads, each with a custom signature (if needed) and docstring (if needed):

m.def("f", [](std::vector<int>) { }, nb::signature("(FantasticVector[int]) -> None"), "Docstring 1");
m.def("f", [](std::set<int>) { }, nb::signature("(FantasticSet[int]) -> None"), "Docstring 2");

stubgen will simply copy-paste the signature into a @typing.overload chain. In this example, it will generate

from typing import overload

@overload
def f(FantasticVector[int]) -> None:
    """Docstring 1"""

@overload
def f(FantasticSet[int]) -> None:
    """Docstring 2"""

del overload

@laggykiller
Contributor

laggykiller commented Feb 17, 2024

Nice work! Some minor problems I found while trying it out with my project: https://github.com/laggykiller/apngasm-python/tree/stubgen-official


Problem 1

I tried adding the following lines to CMakeLists.txt:

nanobind_add_stub(
    apngasm_python_stub
    MODULE apngasm_python._apngasm_python
    OUTPUT src-python/apngasm_python/_apngasm_python.pyi
    MARKER_FILE src-python/apngasm_python/py.typed
    VERBOSE
)

This does not work if apngasm_python is not installed while the wheel is being built, as apngasm_python cannot be imported for stub generation. As a workaround, I have to generate the stub file manually after building and installing:

python -m nanobind.stubgen -m apngasm_python._apngasm_python -o src-python/apngasm_python/_apngasm_python.pyi -M src-python/apngasm_python/py.typed

This is not very ergonomic. If I want to build a wheel with stubs, I have to build and install the wheel without stubs first, run the command to generate the stubs, then build the wheel again with the stubs!


Problem 2

References to classes within the same file are not correct. Consider this pyi file:

class APNGAsm:
    @overload
    def __init__(self, frames: Sequence[apngasm_python._apngasm_python.APNGFrame]) -> None:
        ...

class APNGFrame:
    ...

Pylance would get lost:
(screenshot omitted)

It should be:

class APNGAsm:
    @overload
    def __init__(self, frames: Sequence[APNGFrame]) -> None:
        ...

class APNGFrame:
    ...

Pylance is happy with this:
(screenshot omitted)

Alternatively:

from . import _apngasm_python

class APNGAsm:
    @overload
    def __init__(self, frames: Sequence[_apngasm_python.APNGFrame]) -> None:
        ...

class APNGFrame:
    ...

Problem 3

It is not necessary to delete imports in the pyi file; doing so triggers a warning in Pylance:
(screenshot omitted)

@laggykiller
Contributor

P.S.

Consider this function to be bound:

size_t addFrame(rgb *pixels, unsigned int width, unsigned int height, rgb *trns_color = NULL, unsigned delayNum = DEFAULT_FRAME_NUMERATOR, unsigned delayDen = DEFAULT_FRAME_DENOMINATOR);

Note that trns_color has type rgb* and default value is NULL.

This is the binding code

        .def("add_frame_from_rgb", nb::overload_cast<apngasm::rgb *, unsigned int, unsigned int, apngasm::rgb *, unsigned, unsigned>(&apngasm::APNGAsm::addFrame),
        "pixels_rgb"_a, "width"_a, "height"_a, "trns_color"_a = NULL, "delay_num"_a = apngasm::DEFAULT_FRAME_NUMERATOR, "delay_den"_a = apngasm::DEFAULT_FRAME_DENOMINATOR)

This is the signature generated

 def add_frame_from_rgb(self, pixels_rgb: apngasm_python._apngasm_python.rgb, width: int, height: int, trns_color: apngasm_python._apngasm_python.rgb = 0, delay_num: int = 100, delay_den: int = 1000) -> int:

I would expect the trns_color signature to be trns_color: Optional[rgb] = None

@wjakob
Owner Author

wjakob commented Feb 17, 2024

@laggykiller Thanks for giving it a try. The challenge with generating stubs at build vs install time is mentioned above and was something I still wanted to look into.

In your example, wouldn't it be better to generate a stub for the actual module containing your types (i.e., apngasm_python._apngasm_python)? Then you could add a manual stub file __init__.pyi that imports the types.

Alternatively let me teach you about a dirty hack that I use in my own projects.

NB_MODULE(my_module_ext, m_) {
   (void) m_; /* unused */
   nb::module m = nb::module_::import_("my_module");
   // register functions and classes here
}

In other words, importing the sub-extension causes it to add its types to the parent module. That way, all of the type metadata (__module__) causes the created functions and types to appear in the expected place as seen by consumers of the API.
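A pure-Python analogy of what the hack achieves (module and class names are made up; the real thing happens in C++ via nb::class_):

```python
import types

# Stand-in for the parent package that consumers actually import.
parent = types.ModuleType("my_module")

# Stand-in for a class that would normally be created by nb::class_.
class Frame:
    pass

# The effect of the hack: the type is attached to the parent module, and
# its __module__ metadata agrees, so tools render it as 'my_module.Frame'.
Frame.__module__ = "my_module"
parent.Frame = Frame

print(f"{parent.Frame.__module__}.{parent.Frame.__qualname__}")
```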

An alternative option for the stubs would be to register a rewriting rule with the stub generator that rewrites all occurrences of apngasm_python._apngasm_python to apngasm_python. I am thinking about adding the capability for such custom hooks but have not done so yet.

For your second message, does it work if you annotate the argument as "trns_color"_a.none() = NULL?

The del declarations at the end are caused by my inexperience with stubs in general. At a high level, it does make sense that they are there, no? We don't want Optional, Annotated, etc., to be listed as official exports of the stub. How does one do this then? Import them as _Optional with an underscore?

Edit: never mind about the last point. I found out that the typing documentation explains a clear set of rules when an import is considered to be part of the stub exports.

@laggykiller
Contributor

laggykiller commented Feb 17, 2024

wouldn't it be better to generate a stub for the actual module containing your types?

I don't quite get what you are trying to say, but I think my command is doing what you said...? I specified -m apngasm_python._apngasm_python when calling python -m nanobind.stubgen command.


Alternatively let me teach you about a dirty hack that I use in my own projects.

I replaced the following lines in my binding code:

NB_MODULE(MODULE_NAME, m) {
    m.doc() = "A nanobind API for apngasm, a tool/library for APNG assembly/disassembly";

Replaced as:

NB_MODULE(MODULE_NAME, m_) {
    (void) m_; /* unused */
    nb::module m = nb::module_::import_("apngasm_python");
    m.doc() = "A nanobind API for apngasm, a tool/library for APNG assembly/disassembly";

This causes error during compilation:

error: ‘module’ is not a member of ‘nb’; did you mean ‘module_’?
   37 |     nb::module m = nb::module_::import_("apngasm_python");
      |         ^~~~~~
      |         module_

So I replaced nb::module m with nb::module_ m. Compilation was 'successful', but now apngasm_python._apngasm_python is empty!


does it work if you annotate the argument as "trns_color"_a.none() = NULL?

The signature generated is trns_color: Optional[apngasm_python._apngasm_python.rgb] = 0. Optional is added correctly, but Pylance is unhappy that the default value is 0 instead of None.


At a high level, it does make sense that they are there, no? We don't want Optional, Annotated, etc., to be listed as official exports of the stub.

I think it is fine to not del imports, see a random example from Microsoft: https://github.com/microsoft/python-type-stubs/blob/e328827329a072baac06e287bd6146ff80796e5e/stubs/skimage/transform/hough_transform.pyi


Btw, would it be better if, instead of:

from typing import Annotated
from typing import Optional
from typing import Sequence
from typing import overload

we imported everything in one line:

from typing import Annotated, Optional, Sequence, overload

@wjakob
Owner Author

wjakob commented Feb 17, 2024

@laggykiller

So I replaced nb::module m as nb::module_ m. Compilation was 'successful', but now apngasm_python._apngasm_python is empty!

That's right. But did you check the contents of apngasm_python? All your extension types should now be there without requiring the indirection through a nested module.

Thanks for letting me know about the issue with NULL pointers, I will fix that.

In the meantime, I added support for install-time stub generation.

@laggykiller
Contributor

But did you check the contents of apngasm_python?

Seems like I have to add from . import _apngasm_python in __init__.py for this to work. Now I can import like from apngasm_python import APNGAsm.

Now, if I want to generate stub, I would run python -m nanobind.stubgen -m apngasm_python -o src-python/apngasm_python/__init__.pyi -M src-python/apngasm_python/py.typed.

Signature generated is:

def __init__(self, frames: Sequence[apngasm_python.APNGFrame]) -> None:

This is still not resolved by Pylance, so this hack does not solve the problem:
(screenshot omitted)

In the meantime, I added support for install-time stub generation.

I tried adding the following lines to the end of CMakeLists.txt:

nanobind_add_stub(
    apngasm_python_stub
    INSTALL_TIME
    MODULE apngasm_python
    OUTPUT src-python/apngasm_python/__init__.pyi
    MARKER_FILE src-python/apngasm_python/py.typed
    VERBOSE
)

I have also tried:

nanobind_add_stub(
    apngasm_python_stub
    INSTALL_TIME
    MODULE apngasm_python
    OUTPUT __init__.pyi
    MARKER_FILE py.typed
    VERBOSE
)

However, __init__.pyi was not generated when I built the wheel with python -m build . or built and installed with pip install .

@wjakob
Owner Author

wjakob commented Feb 17, 2024

@laggykiller Can you try again with the latest version? This may also fix your issue with the workaround I had mentioned earlier. For the INSTALL_TIME approach, it's likely that the stub was written to an unexpected place, try specifying VERBOSE. Perhaps you just need to set OUTPUT apngasm_python/__init__.pyi

@laggykiller
Contributor

laggykiller commented Feb 18, 2024

Can you try again with the latest version? This may also fix your issue with the workaround I had mentioned earlier.

I can see that from apngasm_python.apngasm import APNGAsmBinder is added, but that does not solve the problem of apngasm_python.APNGFrame not being recognized. Note that in my project, APNGFrame is actually a nanobind class whose binding code is in the same file as APNGAsm.

(screenshot omitted)

A solution specific to this hack is to add from . import apngasm_python, but I think that is a circular import?

For the INSTALL_TIME approach, it's likely that the stub was written to an unexpected place

It looks like stubgen never runs if I add INSTALL_TIME. Note that my project uses py-build-cmake; not sure if that is related to this problem?

Here is my CMakeLists.txt with relevant part highlighted: https://github.com/laggykiller/apngasm-python/blob/ca7ae854aa2fdb3da3dff891a5971525249586fe/CMakeLists.txt#L117-L124

@wjakob
Owner Author

wjakob commented Feb 18, 2024

@laggykiller Are you sure that you used the latest version? The output looks wrong to me (for example, the latest version of the branch now groups imports). Regarding the other issue, I suspect that it may be related to a requirement to specify COMPONENT for use with py-build-cmake. I just added an analogous COMPONENT parameter to nanobind_add_stub(). Can you try with that?

@laggykiller
Contributor

laggykiller commented Feb 18, 2024

Are you sure that you used the latest version?

I tried again just now, and the generated stub is error-free regardless of whether I use the hack or not 🎉 (except for the NULL default value being treated as 0 instead of None).

(screenshot omitted)

I did notice that Optional[rgb] now becomes rgb | None. This does not work for Python versions <3.10 (https://peps.python.org/pep-0604/), so for compatibility's sake, I think we should use Union[rgb, None] or, even better, Optional[rgb].

I just added an analogous COMPONENT parameter to nanobind_add_stub(). Can you try with that?

Stub generation from CMake is still not happening, though:

nanobind_add_stub(
    apngasm_python_stub
    INSTALL_TIME
    MODULE apngasm_python
    OUTPUT "${PY_BUILD_CMAKE_MODULE_NAME}/__init__.pyi"
    MARKER_FILE "${PY_BUILD_CMAKE_MODULE_NAME}/py.typed"
    COMPONENT python_modules
    VERBOSE
)

https://github.com/laggykiller/apngasm-python/blob/11787ad3ff99b7565e88d5f6ed3f6451e8df7b1c/CMakeLists.txt#L116-L125

You can try by:

git clone https://github.com/laggykiller/apngasm-python.git
cd apngasm-python
git checkout stubgen-official
rm ./src-python/apngasm_python/_apngasm_python.pyi
rm ./src-python/apngasm_python/py.typed
pip install build
python -m build .

It is expected that stub generation occurs during the build, shows up in the log, and that the stub is found inside the wheel, but it seems like no stub generation occurred at all.

@wjakob
Owner Author

wjakob commented Feb 18, 2024

@laggykiller Compilation of your repository fails on my machine (macOS) in a boost build step performed by Conan.

 ./boost/container_hash/hash.hpp:131:33: error: no template named 'unary_function' in namespace 'std'; did you mean '__un
ary_function'?
          struct hash_base : std::unary_function<T, std::size_t> {};
                             ~~~~~^~~~~~~~~~~~~~
                                  __unary_function

However, it works after removing Boost, since it doesn't actually seem to be used by the project. I figured out the issue with the install-time stub generation; this is now fixed. I also added the ability to pass EXCLUDE_FROM_ALL to nanobind_add_stub().

nanobind + stubgen now avoid the A | B and A | None union syntax when running on Python < 3.10.

@laggykiller
Contributor

Compilation of your repository fails on machine (macOS)

Thanks for reporting! I can confirm this issue; it seems like newer macOS versions install a newer Xcode, which defaults to a newer C++ standard that removed unary_function. This should be fixed by updating boost and installing only the required components. apngasm.cpp requires boost though, so I'm not sure how you managed to remove boost and still compile... Anyway, I am sidetracking...

I figured out the issue with the install-time stub generation, this is now fixed.

Can confirm that it is now fixed 🎉! The only problem left is the NULL default value being treated as 0 instead of None...

nanobind + stubgen now avoid the A | B and A | None union syntax when running on Python < 3.10.

I think using Optional[A] and Union[A, B] for all versions of Python is better, as the generated stubs could then be used on all nanobind-supported Python versions (i.e. Python >= 3.8)... But I can confirm that the correct stub syntax is generated for Python < 3.10.

@wjakob
Owner Author

wjakob commented Feb 19, 2024

I think using Optional[A] and Union[A, B] for all versions of Python is better, as the generated stubs could then be used on all nanobind-supported Python versions (i.e. Python >= 3.8)... But I can confirm that the correct stub syntax is generated for Python < 3.10.

I find the | syntax much easier on the eyes. For variadic types (tuple[] vs typing.Tuple[]), the latter form was even deprecated, so having one fixed stub format for all Python versions seems infeasible. I think the fact that nanobind generates the "best syntax" when possible makes sense -- the stubs would normally be included in a binary wheel that also contains a compiled part, so Python 3.8 would never encounter stubs requiring Python 3.10.
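The version dependence can be seen in a small sketch of how a generator might spell a union of type names (illustrative only; spell_union is not nanobind's actual logic):

```python
import sys

def spell_union(types, py_version=sys.version_info[:2]):
    """Spell a union of type names using the newest syntax the target
    Python supports: PEP 604 'A | B' on 3.10+, typing.* forms before."""
    if py_version >= (3, 10):
        return " | ".join(types)
    if len(types) == 2 and types[1] == "None":
        return f"Optional[{types[0]}]"
    return f"Union[{', '.join(types)}]"

print(spell_union(["rgb", "None"], (3, 8)))   # Optional[rgb]
print(spell_union(["rgb", "None"], (3, 12)))  # rgb | None
```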

For the NULL vs 0 part, note that your bindings do something unsupported: the type nb::ndarray is handled by type casting, and nanobind does not support converting None into NULL-valued pointers. This is only supported for bindings (types declared via nb::class_<..>). This is a difference from pybind11 to simplify things, in case you are familiar with the former. Instead of nb::ndarray<unsigned char, nb::shape<3>> *trns_color, declare this parameter using std::optional and include the needed type caster. For the default value, you must then specify "trns_color"_a = nb::none().

@laggykiller
Contributor

@wjakob Got it, thanks! No more problems from my side. Let's see other's test result...

@nicholasjng
Contributor

Nice work! I just wanted to comment that I'm currently working on providing Bazel support for Python bindings with nanobind -- I'll try adding stubgen to a project and report how it goes when I get time.

@wjakob
Owner Author

wjakob commented Feb 19, 2024

PR #421 has reached a state where I'm basically happy with it. It would be great if others can try it as well and report their successes or problems (e.g. @torokati44 @qnzhou @tmsrise @rzhikharevich-wmt).

@cansik

cansik commented Feb 19, 2024

@wjakob Wow, thanks for all the work you put into this extension 🎉. I will archive my project and add a link to the official implementation as soon as it is merged.

Maybe this should be a question in the nanogui repository, but will you be releasing a new package of nanogui that includes the stubs as well?

@wjakob
Owner Author

wjakob commented Feb 19, 2024

Maybe this should be a question in the nanogui repository, but will you be releasing a new package of nanogui that includes the stubs as well?

Great idea, that would be a nice test that all works as expected.

@wbthomason

Thanks @wjakob, I'm really excited for this! I do have some questions about its use with submodules, though.

E.g., from line 344 of stubgen.py in #421, it seems that this will not handle structures created by def_submodule (https://nanobind.readthedocs.io/en/latest/api_core.html#_CPPv4N8nanobind7module_13def_submoduleEPKcPKc) -- is that the case? If so, what is the intended way to generate stubs for submodules?

I tried something like the following, with a separate stubgen call for each submodule:

nanobind_add_stub(
  my_py_stub
  INSTALL_TIME
  INCLUDE_PRIVATE
  MODULE my_py_mod.my_nb_ext_mod.my_nb_ext_submod
  OUTPUT "${CMAKE_CURRENT_SOURCE_DIR}/__init__.pyi"
  MARKER_FILE "py.typed"
  VERBOSE
)

but this hits errors:

File "PATH/TO/PROJECT/build/cp311-cp311-manylinux_2_39_x86_64/_deps/nanobind-src/src/stubgen.py", line 593, in main
      file = Path(mod_imported.__file__)
                  ^^^^^^^^^^^^^^^^^^^^^
  AttributeError: module 'my_py_mod.my_nb_ext_mod.my_nb_ext_submod' has no attribute '__file__'. Did you mean: '__name__'?

@jdumas

jdumas commented Mar 7, 2024

@jdumas You might be interested in seeing the approach for an upcoming version of the Dr.Jit project that is based on nanobind: https://drjit.readthedocs.io/en/nanobind_v2/reference.html. I've so far managed without stubs -- this is the associated Sphinx file with plenty use of autofunction, autoclass, etc.: https://github.com/mitsuba-renderer/drjit/blob/nanobind_v2/docs/reference.rst?plain=1

@wjakob just to follow-up on this: While individual calls to autofunction seem to properly generate the function signature with Sphinx autodoc, neither autoclass nor automodule will populate the content of said class/module. I see that in the DrJit documentation you are manually listing the module/classes content. I would like to have it generated automatically, which is something I'm able to achieve with the Sphinx AutoAPI extension. Currently the Lagrange Python documentation source is really minimal and produces the following website. The good news is that it seems to work just as well switching to the new nanobind-generated stubs, so I don't think there needs to be any additional change on nanobind's side to support this.

@laggykiller
Copy link
Contributor

@wjakob Two problems:

  1. typing.Annotated is not supported in Python 3.8; it should not be used when stubgen runs on Python 3.8
  2. Consider sorting imports in the generated pyi (See isort)?

Example:

from collections.abc import Sequence
from numpy.typing import ArrayLike
from typing import overload, Optional, Annotated

Should be:

from collections.abc import Sequence
from typing import Annotated, Optional, overload

from numpy.typing import ArrayLike

@wjakob
Copy link
Owner Author

wjakob commented Mar 9, 2024

@laggykiller: There is some sorting already in place. I am open to PRs that improve this further if it can be done compactly (without adding a lot of complex code). It's not something I plan to work on.

@wjakob
Copy link
Owner Author

wjakob commented Mar 9, 2024

@CarlosBergillos @laggykiller @tmsrise I added a --recursive (short: -r) option to stubgen. Can you give it a try?

@wjakob
Copy link
Owner Author

wjakob commented Mar 9, 2024

@laggykiller generation of typing.Annotated is AFAIK just done to handle NDArrays. There is a separate discussion in #442 on how to improve typing of ndarrays, perhaps you can chime in there?

@laggykiller
Copy link
Contributor

laggykiller commented Mar 10, 2024

There is some sorting already in place. I am open to PRs that improve this further if it can be compactly

Opened a PR #462

I added a --recursive (short: -r) option to stubgen

I don't have a nanobind project that is complex enough to test this. However, I did open PR #463, which allows recursive stubgen from CMake via a RECURSIVE option that can only be used together with INSTALL_TIME:

nanobind_add_stub(
    nanobind_example_ext_stubgen
    INSTALL_TIME
    MODULE nanobind_example.nanobind_example_ext
    OUTPUT_DIR ${SKBUILD_PLATLIB_DIR}/nanobind_example
    RECURSIVE
    MARKER_FILE "${SKBUILD_PLATLIB_DIR}/nanobind_example/py.typed"
    VERBOSE
)

There is a separate discussion in #442 on how to improve typing of ndarrays, perhaps you can chime in there?

Looks like the PR already handles Annotated well on Python 3.8 by importing it from typing_extensions.

@CarlosBergillos
Copy link

CarlosBergillos commented Mar 12, 2024

@CarlosBergillos @laggykiller @tmsrise I added a --recursive (short: -r) option to stubgen. Can you give it a try?

Nice! I'm very happy to see a --recursive option added. Thanks!

I've been testing it. The folder structure / names are different from what I would expect.
In a situation where I have myext, myext.sub1 and myext.sub2, I get the following files when using --recursive on myext:

├── myext.pyi
├── sub1/
│   └── __init__.pyi
└── sub2/
    └── __init__.pyi

Here myext and sub1 are at the same level, which is incorrect.

I would expect to get (a):

└── myext/
    ├── __init__.pyi
    ├── sub1/
    │   └── __init__.pyi
    └── sub2/
        └── __init__.pyi

or even better (b):

└── myext/
    ├── __init__.pyi
    ├── sub1.pyi
    └── sub2.pyi

To implement (b) we need to know in advance whether a module has submodules (if it does, create a new folder + __init__.pyi, otherwise just write name.pyi).

For some inspiration mypy.stubgen first collects a list of all modules recursively in a separate collect_build_targets function and then iterates over them checking if any submodules are also in the list: https://github.com/python/mypy/blob/master/mypy/stubgen.py#L1703-L1707

But I'm also fine with (a) if it's easier. AFAIK (a) and (b) are functionally equivalent.


Another thing that is currently not working properly in nanobind.stubgen, and that I would like to see fixed: generating stubs recursively for a module that is itself a submodule of something larger.
E.g. python -m nanobind.stubgen --module myapp.myext --recursive
where myapp is a folder of an existing Python project (not a C++ extension), and we place our myext.so inside myapp/ during installation. In this case __name__ is "myapp.myext", and after .with_suffix(".pyi") nanobind.stubgen creates a myapp/myext/myapp.pyi file, which is incorrect (it should be myapp/myext.pyi or myapp/myext/__init__.pyi).
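Both problems boil down to how a dotted module name is mapped to an output path. A hedged sketch of layout (b) (stub_path is a hypothetical helper, not stubgen's code; the has_submodules flag would come from a prior pass that collects all modules, as mypy.stubgen does):

```python
from pathlib import Path

def stub_path(root, module, has_submodules):
    """Map a dotted module name to a .pyi location under `root`:
    modules that contain submodules become pkg/__init__.pyi,
    leaf modules become parent/leaf.pyi."""
    parts = module.split(".")
    if has_submodules:
        return Path(root, *parts, "__init__.pyi")
    return Path(root, *parts[:-1], parts[-1] + ".pyi")

print(stub_path("out", "myext", True))         # out/myext/__init__.pyi (POSIX)
print(stub_path("out", "myext.sub1", False))   # out/myext/sub1.pyi
print(stub_path("out", "myapp.myext", False))  # out/myapp/myext.pyi
```

This yields both the expected nesting for --recursive and the correct myapp/myext.pyi for a submodule of a larger project, instead of the current with_suffix(".pyi") behavior.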

@mrexodia
Copy link

mrexodia commented Mar 25, 2024

For my project I added a nanobind_add_typed_module helper function that calls both nanobind_add_module and nanobind_add_stub:

function(nanobind_add_typed_module tgt)
    nanobind_add_module(${tgt} ${ARGN})
    nanobind_add_stub(
        ${tgt}_stub
        MODULE ${tgt}
        OUTPUT ${tgt}.pyi
        PYTHON_PATH $<TARGET_FILE_DIR:${tgt}>
        MARKER_FILE py.typed
        DEPENDS ${tgt}
    )
endfunction()

This allows for a 'clean' cmkr cmake.toml project:

[cmake]
version = "3.15"
cmkr-include = "cmake/cmkr.cmake"

[project]
name = "xxx_native"

[find-package.Python]
version = "3.8"
components = ["Interpreter", "Development.Module"]

[fetch-content.nanobind]
git = "https://github.com/wjakob/nanobind"
tag = "af57451"
include-after = ["cmake/nanobind.cmake"]

[template.nanobind_module]
type = "shared"
add-function = "nanobind_add_typed_module"
pass-sources = true

[target.xxx_native]
type = "nanobind_module"
sources = ["src/xxx_native.cpp"]
compile-features = ["cxx_std_17"]

My use case is just a simple python extension in my xxx project. I also have xxx_native/__init__.py:

from .build.xxx_native import *

This allows for a clean development experience and proper types. You just have to avoid multi-config generators and configure the project with cmake -B build for everything to work out of the box.

Thanks for making this project by the way, it made everything relatively painless!

@tmsrise
Copy link

tmsrise commented Mar 27, 2024

Took a look at the most recent master. Seems like commit df8996a adds spec = importlib.util.find_spec(module) which is raising ValueError: my_module.submodule.__spec__ is None in my project. I have no idea what this line does nor why my project would have an issue with it.

@laggykiller

@laggykiller
Copy link
Contributor

laggykiller commented Mar 28, 2024

@tmsrise I can catch ValueError in there, but may I get a list of modules you are using in that project?

Even better, try to run importlib.util.find_spec(module) on all 3rd party modules you are using (e.g. importlib.util.find_spec("numpy")) and see if any would raise ValueError?

Edit: Did some testing; it looks like all stdlib modules have __spec__ defined. I just need to confirm that it is a third-party package that is missing __spec__, rather than the nanobind package.
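For what it's worth, importlib itself raises this ValueError whenever a module already registered in sys.modules has __spec__ set to None, which is also the state of module objects created programmatically rather than by the import machinery (as can happen for submodules an extension registers itself). A minimal reproduction (fake_parent is an invented name):

```python
import importlib.util
import sys
import types

# Module objects built by hand get __spec__ = None, unlike modules
# loaded through the regular import system.
sys.modules["fake_parent"] = types.ModuleType("fake_parent")
sys.modules["fake_parent.sub"] = types.ModuleType("fake_parent.sub")

try:
    importlib.util.find_spec("fake_parent.sub")
except ValueError as e:
    print(e)  # fake_parent.sub.__spec__ is None
```

The message matches the traceback above, so the culprit need not be a third-party package at all.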

@wjakob
Copy link
Owner Author

wjakob commented Mar 28, 2024

@tmsrise I committed @laggykiller's patch, does that fix your issue?

@tmsrise
Copy link

tmsrise commented Apr 3, 2024

Sorry for the delay. It fixed the issue, thanks!

@laggykiller I'm not using any external python modules though. The my_module.submodule was just a sanitized version for all the submodules I create for my namespaced classes.

@laggykiller
Copy link
Contributor

laggykiller commented Apr 4, 2024

@tmsrise does the error occur when using recursive (-r) mode during stubgen? If yes, do the submodules use any external Python modules?

Can you help me find the culprit? Temporarily edit stubgen.py:

nanobind/src/stubgen.py

Lines 1114 to 1117 in f52877c

        try:
            spec = importlib.util.find_spec(module)
        except (ModuleNotFoundError, ValueError):
            return 1

To this:

        try:
            spec = importlib.util.find_spec(module)
        except ModuleNotFoundError:
            return 1
        except ValueError as e:
            raise ValueError(module + str(e))

See the error message. Your help is greatly appreciated.

@tmsrise
Copy link

tmsrise commented Apr 8, 2024

@laggykiller

        try:
            spec = importlib.util.find_spec(module)
        except ModuleNotFoundError:
            return 1
        except ValueError as e:
            raise ValueError(module + ", " + str(e))

ValueError: my_sanitized_module.BASE, my_sanitized_module.BASE.__spec__ is None

Printing instead of raising the error shows it for this BASE submodule multiple times while the other submodules are being generated.

@wjakob

This is my current CMake snippet. If there are any mistakes that would cause this, or if there's a way to make it less janky, let me know.

set(PACKAGE_DIR ${CMAKE_CURRENT_BINARY_DIR}/my_sanitized_module)
file(MAKE_DIRECTORY ${PACKAGE_DIR})
find_package(Python 3.8 COMPONENTS Interpreter Development Development.Module REQUIRED)

# my_sanitized_module will be the python module name 
nanobind_add_module(my_sanitized_module NB_STATIC src/py_bindings.cpp)

# Link the my_sanitized_module module with our real library
target_include_directories(my_sanitized_module PUBLIC ${CMAKE_SOURCE_DIR}/include/MySanitizedCppLib)
target_link_libraries(my_sanitized_module PUBLIC MySanitizedCppLib)

# Add the various binding source files for each submodule into the nanobind target
add_subdirectory(src)

# Create the stubs
add_custom_target(GenerateStubs ALL
                    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
                    COMMAND python3 ${CMAKE_BINARY_DIR}/_deps/nanobind-src/src/stubgen.py -r -m my_sanitized_module -O ${PACKAGE_DIR} -M ${PACKAGE_DIR}/py.typed
                    DEPENDS my_sanitized_module
)
# Rename the top level pyi to __init__.pyi so it works
add_custom_target(
    RenameFile ALL
    COMMAND ${CMAKE_COMMAND} -E rename ${PACKAGE_DIR}/my_sanitized_module.pyi ${PACKAGE_DIR}/__init__.pyi
    DEPENDS GenerateStubs
    )


# Create an __init__.py in our build directory; it gets installed with everything else later. This __init__ file is crucial for Python to be able to import the module: if it doesn't exist or is empty, the import will not work.
FILE(WRITE ${CMAKE_CURRENT_BINARY_DIR}/__init__.py
    "from .my_sanitized_module import *")


# Install the package directory we created, which holds all of our stubs
install(DIRECTORY ${PACKAGE_DIR}
        DESTINATION ${Python_SITEARCH})

# Install our __init__.py file
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/__init__.py
        DESTINATION ${Python_SITEARCH}/my_sanitized_module)
        
# Now we're installing the actual module binary to that same package directory
install(TARGETS my_sanitized_module 
        COMPONENT python
        LIBRARY DESTINATION ${Python_SITEARCH}/my_sanitized_module)

@AndrewNolte
Copy link

AndrewNolte commented Apr 26, 2024

Re: stub staleness, I'm also trying to deal with this. I think it would make sense for a hash of the module to always come out the same given the same contents of the nb module. Generated stubs could be marked with this hash, so that only one file needs to be checked to see whether they are stale. Right now hash(module) returns a different number in each process.
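A content-derived digest would behave that way; hash() on a module object is id-based (and Python's string hashing is salted per process), so it can never serve as a staleness marker. A hedged sketch (stable_module_hash is hypothetical, and hashing only names and docstrings is a simplification):

```python
import hashlib

def stable_module_hash(mod):
    """Deterministic digest over a module's visible surface: member
    names plus docstrings (which for nanobind functions embed the
    generated signatures). Identical contents give an identical
    digest in every process, unlike hash(mod)."""
    h = hashlib.sha256()
    for name in sorted(dir(mod)):
        doc = getattr(getattr(mod, name, None), "__doc__", None) or ""
        h.update(name.encode())
        h.update(doc.encode())
    return h.hexdigest()

import math
assert stable_module_hash(math) == stable_module_hash(math)
print(stable_module_hash(math)[:12])  # short prefix for a stub marker
```

Stubgen could embed this digest in the generated .pyi and skip regeneration when it matches.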

@kj4tmp
Copy link

kj4tmp commented May 13, 2024

Seeing an odd error where stub generation fails with an import error, but only on Windows and only in GitHub CI (I am unable to reproduce it on my local Windows machine).

I am using:

nanobind_add_module(
  soem_ext
  STABLE_ABI
  NB_STATIC
  pyecm/soem_ext.cpp
)
nanobind_add_stub(
  soem_ext_stub
  MODULE soem_ext
  OUTPUT "${CMAKE_SOURCE_DIR}/pyecm/soem/soem_ext.pyi"
  PYTHON_PATH $<TARGET_FILE_DIR:soem_ext>
  DEPENDS soem_ext
  MARKER_FILE py.typed
  VERBOSE
)
# Install directive for scikit-build-core

install(TARGETS soem_ext LIBRARY DESTINATION "pyecm/soem")
install(FILES "${CMAKE_SOURCE_DIR}/pyecm/soem/soem_ext.pyi" DESTINATION "pyecm/soem")
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/py.typed" DESTINATION "pyecm/soem")

But I get an import error during stub generation (again, only on Windows and only in GitHub CI).

Building Custom Rule D:/a/pyecm/pyecm/CMakeLists.txt
      soem_ext.cpp
         Creating library D:/a/pyecm/pyecm/build/cp310-cp310-win_amd64/Release/soem_ext.lib and object D:/a/pyecm/pyecm/build/cp310-cp310-win_amd64/Release/soem_ext.exp
      soem_ext.vcxproj -> D:\a\pyecm\pyecm\build\cp310-cp310-win_amd64\Release\soem_ext.cp310-win_amd64.pyd
      Generating py.typed, D:/a/pyecm/pyecm/pyecm/soem/soem_ext.pyi
      Module "soem_ext" ..
        - importing ..
      Traceback (most recent call last):
        File "D:\a\pyecm\pyecm\ext\nanobind\src\stubgen.py", line 1391, in <module>
          main()
        File "D:\a\pyecm\pyecm\ext\nanobind\src\stubgen.py", line 1324, in main
          mod_imported = importlib.import_module(mod)
        File "C:\Users\runneradmin\AppData\Local\pypa\cibuildwheel\Cache\nuget-cpython\python.3.10.11\tools\lib\importlib\__init__.py", line 126, in import_module
          return _bootstrap._gcd_import(name[level:], package, level)
        File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
        File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
        File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
        File "<frozen importlib._bootstrap>", line 674, in _load_unlocked
        File "<frozen importlib._bootstrap>", line 571, in module_from_spec
        File "<frozen importlib._bootstrap_external>", line 1176, in create_module
        File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      ImportError: DLL load failed while importing soem_ext: The specified module could not be found.

I made a discussion post: #579
and here is my github action run: https://github.com/kj4tmp/pyecm/actions/runs/9049680748

@Seairth
Copy link
Contributor

Seairth commented May 13, 2024

When generating stubs for Enum/IntEnum, I'm getting results like:

class MyEnum(enum.IntEnum):
    A: MyEnum
    B: MyEnum
    # etc...

What I was expecting was:

class MyEnum(enum.IntEnum):
    A = 0
    B = 1
    # etc...

It appears I can get what I'm expecting by changing line 559 from:

self.write_ln(f"{name}: {self.type_str(tp)}")

to

self.write_ln(f"{name} = {typing.cast(enum.Enum, value).value}")

Note that the cast is just to make pylance/pyright happy. There may be a less verbose way of doing this.

@kj4tmp
Copy link

kj4tmp commented May 14, 2024

My issue was actually caused by a missing DLL dependency of my extension module and was not related to nanobind. I found the dependency using https://github.com/lucasg/Dependencies and installed it in my build environment.

details #579
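For anyone hitting the same thing: since Python 3.8, Windows no longer consults PATH when resolving DLL dependencies of extension modules; directories must be registered explicitly before the import. A hedged sketch (add_dll_dirs and the path are placeholders for wherever the dependency actually lives):

```python
import os
import sys

def add_dll_dirs(paths):
    """Register extra DLL search directories on Windows; no-op
    elsewhere. Keep the returned handles alive while imports that
    need the DLLs may still happen."""
    if sys.platform != "win32":
        return []
    return [os.add_dll_directory(p) for p in paths]

# e.g. in the package's __init__.py, before importing the extension:
handles = add_dll_dirs([r"C:\deps\bin"])  # placeholder path
# import soem_ext  # the dependent DLLs would now be found
print(len(handles))
```

An alternative is simply shipping the dependent DLL next to the .pyd, which is what installing it into the build environment accomplished here.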

@wjakob
Copy link
Owner Author

wjakob commented May 23, 2024

Now that nanobind 2.0.0 is out, I will close this tracking issue (which has gotten very long). Let's track remaining issues with stub generation using separate tickets.

@wjakob wjakob closed this as completed May 23, 2024