RFC: Modularizing PackageCompiler.jl #858

sloede · 2023-10-05T11:12:10Z

PackageCompiler.jl is a great tool and IMHO a vital part of the journey towards making Julia more universally deployable. At the moment, there are three main entry points,

create_sysimage (18 args/kwargs)
create_app (15 args/kwargs)
create_library (19 args/kwargs)

serving the three main purposes of creating sysimages for reduced latency, standalone apps that can be deployed without a Julia installation, and standalone libraries (also independently deployable).

Over time, these three functions have grown tremendously in capability, which is reflected in the large number of arguments they take. Besides making them somewhat unwieldy and not very "Julian", this also makes it hard to integrate PackageCompiler.jl builds into more complex build workflows that use, e.g., CMake.

I've been pondering this for a while now, and I believe there might be a solution: by decomposing these three main functions into individual, independent parts and using Julia's type system, we could make the individual steps of the build process more composable. This would allow users to make their builds more flexible and would hopefully open up some potential for caching intermediate results.

From an initial survey of the current code, I could imagine creating the following types, each representing one part of the build process (names TBD):

ObjectFile
Sysimage
  - BaseSysimage
App
  - Executable
Library
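
To make the list above more concrete, here is a minimal sketch of how such types might look. None of these types exist in PackageCompiler.jl today. The field names, the `AbstractSysimage` supertype, and the reading of the nesting (BaseSysimage as a kind of sysimage, Executable as a component of an App) are all assumptions made for illustration.

```julia
# Hypothetical type sketch; names and fields are illustrative only.

# An object file on disk, e.g. compiled from a C source or emitted by Julia.
struct ObjectFile
    path::String
end

# Assumed common supertype for anything that behaves like a sysimage.
abstract type AbstractSysimage end

# The minimal "base" sysimage that later build steps start from.
struct BaseSysimage <: AbstractSysimage
    path::String
end

# A full sysimage built from one or more object files.
struct Sysimage <: AbstractSysimage
    path::String
    objects::Vector{ObjectFile}
end

# One executable inside an app (the entry point field is an assumption).
struct Executable
    name::String
    entry_point::String
end

struct App
    sysimage::Sysimage
    executables::Vector{Executable}
end

struct Library
    sysimage::Sysimage
    name::String
    headers::Vector{String}
end
```

Keeping each type a plain, immutable record of paths and inputs would make intermediate results easy to inspect and compare between builds.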

The idea would be that for, e.g., a library, I would

1. call base_sysimage = build_base_sysimage(...) to create a BaseSysimage object
2. call sysimage_obj_file = build_sysimage_object_file(base_sysimage, ...) to create a corresponding ObjectFile
3. call obj_file = build_object_file("path/to/c/file", ...) for each external file
4. call sysimage = build_sysimage(sysimage_obj_file, obj_file, ...) to compile the sysimage
5. call library = create_library(sysimage, ...) to bundle all relevant info for creating a library
6. call install(prefix, library) to install the library
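
The steps above could be wired together roughly as follows. This is a stub sketch only: the function names are taken from the steps, but the signatures, keyword arguments, output file names, and struct fields are assumptions, and no compilation or file copying actually happens.

```julia
# Stub sketch of the proposed pipeline. Bodies are placeholders that only
# compute output paths; they do not invoke Julia or a C compiler.

struct ObjectFile;   path::String end
struct BaseSysimage; path::String end
struct Sysimage;     path::String; objects::Vector{ObjectFile} end
struct Library;      sysimage::Sysimage; name::String end

build_base_sysimage(; dir = "build") = BaseSysimage(joinpath(dir, "base.so"))

build_sysimage_object_file(base::BaseSysimage; dir = "build") =
    ObjectFile(joinpath(dir, "sysimage.o"))

function build_object_file(source::AbstractString; dir = "build")
    name = first(splitext(basename(source))) * ".o"
    return ObjectFile(joinpath(dir, name))
end

build_sysimage(objects::ObjectFile...; dir = "build") =
    Sysimage(joinpath(dir, "sysimage.so"), ObjectFile[objects...])

create_library(sysimage::Sysimage; name) = Library(sysimage, name)

# Returns the would-be install location instead of copying anything.
install(prefix::AbstractString, lib::Library) =
    joinpath(prefix, "lib", lib.name * ".so")

# Each step consumes the typed result of the previous one.
base    = build_base_sysimage()
sys_obj = build_sysimage_object_file(base)
c_obj   = build_object_file("path/to/c/file")
sysimg  = build_sysimage(sys_obj, c_obj)
lib     = create_library(sysimg; name = "libexample")
target  = install("/usr/local", lib)
```

Because every intermediate value is an ordinary Julia object, an external build system could in principle run and cache each step separately.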

My goal is that such a more modular approach would let us think about caching intermediate results. For example, if we hashed the arguments to the current create_fresh_base_sysimage (essentially a list of strings) together with the Julia version, we could skip regenerating the base sysimage during each build. Similarly, it would let me avoid rebuilding sysimage_obj_file if I just want to add or modify the C files with the initialization functions.
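
As a rough illustration of the hashing idea, the sketch below derives a cache key from a list of string arguments plus the Julia version, and only calls a builder when no cached file exists. The helper names `base_sysimage_cache_key` and `cached_base_sysimage`, the `.so` extension, and the flat cache directory layout are hypothetical; only `sha256` from the SHA standard library and `bytes2hex` from Base are real APIs.

```julia
using SHA  # Julia standard library

# Hypothetical helper: hash the Julia version and the argument list.
# A NUL separator keeps ["ab", "c"] distinct from ["a", "bc"].
function base_sysimage_cache_key(args::Vector{String};
                                 julia_version::VersionNumber = VERSION)
    payload = join([string(julia_version); args], '\0')
    return bytes2hex(sha256(payload))
end

# Hypothetical helper: return the cached path, calling `build(path)` only
# when no file for this key exists yet.
function cached_base_sysimage(build::Function, cache_dir::AbstractString,
                              args::Vector{String})
    path = joinpath(cache_dir, base_sysimage_cache_key(args) * ".so")
    isfile(path) || build(path)
    return path
end
```

The key here is sensitive to argument order. Whether that is desirable depends on whether the order of the real arguments affects the generated sysimage.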

I am probably missing something (e.g., maybe we need a Config or Context object to pass information around that is needed in multiple places, such as project paths and cache directories), but hopefully this can serve as a starting point for a discussion on whether
a) such an approach is feasible,
b) it is desirable, and
c) ultimately whether there are maybe better ways to achieve the desired goals.
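
For the Config or Context idea mentioned above, one possible shape is a small keyword-constructed struct that each build step receives as its first argument. This is purely a sketch: `BuildContext`, its fields, and `object_path` are hypothetical names, and the defaults are placeholders.

```julia
# Hypothetical context object; field names and defaults are illustrative only.
Base.@kwdef struct BuildContext
    project::String = "."                             # path to the Julia project
    cache_dir::String = joinpath(project, ".cache")   # where intermediates live
    julia_version::VersionNumber = VERSION
end

# A step takes the context first and reads only the fields it needs.
object_path(ctx::BuildContext, name::AbstractString) =
    joinpath(ctx.cache_dir, name * ".o")

ctx = BuildContext(project = "MyLib")
sysimage_object = object_path(ctx, "sysimage")
```

Bundling shared settings this way might also shorten the argument lists of the individual steps, since only step-specific options would remain as keyword arguments.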

Comments/suggestions/hole poking welcome 🙂


KristofferC · 2023-12-13T13:18:21Z

The code restructuring sounds like a good idea to me, but I don't see how that helps with the fact that the entry points we have (create_XXX) have a lot of options. To me, this looks more like internal code refactoring, not something directly user-facing?

sloede mentioned this issue Jan 15, 2024

Cache the "base" sysimage using Scratch.jl #916

Open