RFC: Modularizing PackageCompiler.jl #858

Open
sloede opened this issue Oct 5, 2023 · 1 comment

sloede commented Oct 5, 2023

PackageCompiler.jl is a great tool and IMHO a vital part of the journey towards making Julia more universally deployable. At the moment, there are three main entry points,

  • create_sysimage
  • create_app
  • create_library

serving the three main purposes of creating sysimages for reduced latency, standalone apps that can be deployed without a Julia installation, and standalone libraries (also independently deployable).

Over time, these three functions have grown tremendously in capability, which is reflected in the large number of arguments they take. Besides being somewhat unwieldy and not overly "Julian", this also makes it hard to integrate PackageCompiler.jl builds into more complex build workflows that use, e.g., CMake.

I've been pondering this for a while now, and I believe there might be a solution: by decomposing these three main functions into individual, independent parts and using Julia's type system, we could make the individual steps of the build process more composable. This would allow users to make their builds more flexible and would hopefully open up some potential for caching intermediate results.

From an initial survey of the current code, I could imagine creating the following types, each representing one part of the build step (names TBD):

  • ObjectFile
  • Sysimage
    • BaseSysimage
  • App
    • Executable
  • Library
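
A rough sketch of how such types might look. Everything here is hypothetical: none of these types exist in PackageCompiler.jl today, the field names are placeholders, and the nesting in the list above is read as "subtype" for BaseSysimage and as "contains" for Executable, which is only one possible interpretation.

```julia
# Hypothetical sketch: these types are NOT part of PackageCompiler.jl.
abstract type BuildArtifact end
abstract type AbstractSysimage <: BuildArtifact end

# A compiled object file, e.g., from the sysimage or from an external C file.
struct ObjectFile <: BuildArtifact
    path::String
end

# The fresh base sysimage that later steps build upon.
struct BaseSysimage <: AbstractSysimage
    path::String
end

# A linked sysimage that remembers which object files went into it.
struct Sysimage <: AbstractSysimage
    path::String
    object_files::Vector{ObjectFile}
end

# One executable of an app (field names are assumptions).
struct Executable <: BuildArtifact
    name::String
    entry_point::String
end

# An app bundles a sysimage with one or more executables.
struct App <: BuildArtifact
    sysimage::Sysimage
    executables::Vector{Executable}
end

# A library bundles a sysimage with the library name.
struct Library <: BuildArtifact
    sysimage::Sysimage
    name::String
end
```

With concrete types per step, each build function can dispatch on what it receives instead of taking one long list of keyword arguments.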

The idea would be that for, e.g., a library, I would

  • call base_sysimage = build_base_sysimage(...) to create a BaseSysimage object
  • call sysimage_obj_file = build_sysimage_object_file(base_sysimage, ...) to create a corresponding ObjectFile
  • call obj_file = build_object_file("path/to/c/file", ...) for each external file
  • call sysimage = build_sysimage(sysimage_obj_file, obj_file, ...) to compile the sysimage
  • call library = create_library(sysimage, ...) to bundle all relevant info for creating a library
  • call install(prefix, library) to install the library
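
The steps above could be sketched as follows. The function names are the ones proposed in the list, but the bodies are hypothetical stubs that only record the path a real step would produce; they do not call a compiler or linker. The `dir` and `name` keywords, the file names, and the hard-coded `.so` extension are assumptions for illustration.

```julia
# Hypothetical stubs: they only record what a real build step would produce.
struct BaseSysimage
    path::String
end

struct ObjectFile
    path::String
end

struct Sysimage
    path::String
    inputs::Vector{ObjectFile}
end

struct Library
    name::String
    sysimage::Sysimage
end

build_base_sysimage(; dir="build") = BaseSysimage(joinpath(dir, "base_sysimage.so"))

build_sysimage_object_file(base::BaseSysimage; dir="build") =
    ObjectFile(joinpath(dir, "sysimage.o"))

function build_object_file(src::AbstractString; dir="build")
    stem = first(splitext(basename(src)))  # "init.c" -> "init"
    return ObjectFile(joinpath(dir, stem * ".o"))
end

build_sysimage(objs::ObjectFile...; dir="build") =
    Sysimage(joinpath(dir, "sysimage.so"), ObjectFile[objs...])

create_library(sysimage::Sysimage; name="libexample") = Library(name, sysimage)

# Returns the path the library would be installed to; copies nothing.
install(prefix::AbstractString, lib::Library) = joinpath(prefix, "lib", lib.name * ".so")

# The workflow from the list, step by step ("init.c" is a made-up file name).
base_sysimage = build_base_sysimage()
sysimage_obj_file = build_sysimage_object_file(base_sysimage)
obj_file = build_object_file("init.c")
sysimage = build_sysimage(sysimage_obj_file, obj_file)
library = create_library(sysimage)
installed = install("prefix", library)
```

Because each step returns a value that the next step consumes, an external build system such as CMake could in principle drive the steps one at a time.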

My goal is that with such a more modular approach, we can then go ahead and think about caching intermediate results. For example, if we hashed the arguments passed to the current create_fresh_base_sysimage (which are essentially a list of strings) together with the Julia version, we could skip re-generating the base sysimage during each build. Similarly, it would let me avoid rebuilding sysimage_obj_file if I just want to add or modify the C files with the initialization functions.
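
A minimal sketch of the caching idea, assuming the inputs really are just a list of strings plus the Julia version. The helpers `base_sysimage_cache_key` and `cached_build` are hypothetical and not part of PackageCompiler.jl; the flags and the placeholder file content are made up for illustration. Only the SHA standard library is used.

```julia
using SHA  # Julia standard library

# Hypothetical helper: derive a cache key from the argument list and the Julia version.
function base_sysimage_cache_key(args::Vector{String};
                                 julia_version::VersionNumber=VERSION)
    # Join with a NUL separator so ["ab", "c"] and ["a", "bc"] hash differently.
    payload = join(vcat(string(julia_version), args), '\0')
    return bytes2hex(sha256(payload))
end

# Hypothetical helper: reuse a cached artifact if present, otherwise build it.
function cached_build(build::Function, cache_dir::AbstractString, key::AbstractString)
    path = joinpath(cache_dir, key)
    isfile(path) && return path  # cache hit: skip the build
    mkpath(cache_dir)
    build(path)                  # `build` must write the artifact to `path`
    return path
end

# Demo: the (fake) build runs only once for identical inputs.
build_count = Ref(0)
cache_dir = mktempdir()
key = base_sysimage_cache_key(["--startup-file=no", "-O3"])
for _ in 1:2
    cached_build(cache_dir, key) do path
        build_count[] += 1
        write(path, "placeholder for a base sysimage")
    end
end
```

Changing any argument or the Julia version yields a different key, so a stale base sysimage would not be picked up by accident.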

I am probably missing something (e.g., maybe we need a Config or Context object to pass information around that is needed in multiple places, such as project paths and cache directories), but hopefully this can serve as a starting point for a discussion on whether
a) such an approach is feasible,
b) it is desirable, and
c) ultimately whether there are maybe better ways to achieve the desired goals.
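
On the Config/Context idea, a minimal sketch of what such an object might carry. The name `BuildContext`, its fields, and the `artifact_path` helper are all hypothetical.

```julia
# Hypothetical context object; the name and fields are illustrative only.
Base.@kwdef struct BuildContext
    project_path::String
    cache_dir::String = joinpath(project_path, ".build_cache")
    julia_version::VersionNumber = VERSION
end

# Build steps could take the context instead of repeating path arguments.
artifact_path(ctx::BuildContext, name::AbstractString) = joinpath(ctx.cache_dir, name)

ctx = BuildContext(project_path="MyLib")
obj_path = artifact_path(ctx, "sysimage.o")
```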

Comments/suggestions/hole poking welcome 🙂

@KristofferC

The code restructuring sounds like a good idea to me but I don't see how that helps with the fact that the entry points we have (create_XXX) have a lot of options. To me, this looks more like internal code refactoring, not something directly user facing?
