Efficient way to version/checksum build output based on dependencies (inputs) #822

Open
saurabhnanda opened this issue Jan 29, 2022 · 4 comments

Comments

@saurabhnanda

Context

My overall build pipeline consists of building five components. Each component has a different set of dependencies. If a component's dependencies don't change, the relevant rules are not run, and therefore the overall build output is missing certain build artifacts.

However, in the release pipeline, I need to deploy all five components. Unfortunately, because build-steps for certain sub-components have been skipped, the build artifacts are not complete.

Question

Is there any efficient [1] way to determine the overall "checksum" of a rule? The idea is to put build artifacts in a directory whose name contains the checksum of the inputs/dependencies, e.g. componentA-<input-checksum>.

Further, is there any way to run a function when a build-rule is skipped? That function can be used to emit a .checksum file as a build artifact in place of the actual build artifacts. The release pipeline can then work with actual artifacts (for components that were not skipped), or use the .checksum file to fetch a previous artifact from the artifact cache for components that were skipped during the build phase.

[1] I noticed https://hackage.haskell.org/package/shake-0.19/docs/Development-Shake.html#v:getHashedShakeVersion but it feels like I would be redoing the computations that Shake would've already performed (in order to determine which rules should be re-run and which shouldn't).

PS: I think I can hack this together with a phony along with a needHasChanged, but again, it seems inefficient and seems to be duplicating the work that would've already been done by Shake internally.
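For illustration, here is a minimal sketch of the componentA-<input-checksum> naming idea, built on the getHashedShakeVersion function from footnote [1]. The helper names (inputChecksum, artifactDir) and the directory layout are hypothetical, not part of Shake. It re-reads every input file, so it is exactly the duplicated work described above rather than a way of reusing Shake's own change detection. The hash is also meant for versioning build scripts, so it should not be treated as a cryptographic digest.

```haskell
import Development.Shake (FilePattern, getDirectoryFilesIO, getHashedShakeVersion)
import Development.Shake.FilePath ((</>))

-- | Hypothetical helper: directory name for a component's artifacts,
-- e.g. artifactDir "componentA" "abc" gives "componentA-abc".
artifactDir :: String -> String -> FilePath
artifactDir component checksum = component ++ "-" ++ checksum

-- | Hypothetical helper: checksum the contents of every file under
-- 'root' that matches one of the given patterns.
inputChecksum :: FilePath -> [FilePattern] -> IO String
inputChecksum root patterns = do
    -- getDirectoryFilesIO returns paths relative to 'root'
    files <- getDirectoryFilesIO root patterns
    -- getHashedShakeVersion reads each file and hashes its contents
    getHashedShakeVersion (map (root </>) files)
```

A release script could then call something like `artifactDir "componentA" <$> inputChecksum "componentA" ["//*.hs"]` to decide where to store or look up artifacts (the pattern here is only an example).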

@saurabhnanda
Author

A good way to understand the problem is to think not of the local development machine (where you are always interested in the latest build output, and your working directory and previous build artifacts persist between builds/runs), but of CI/CD pipelines where:

  • your working directory is created afresh between runs (you can still persist the shake DB between runs to be able to use Shake's change detection features)
  • you have to take special measures to persist build artifacts
  • you might be interested in build artifacts from previous runs

@ndmitchell
Owner

How is your build structured? How are you entirely skipping components that are not required? Is that by consulting a cache? Or does Shake somehow skip creating the components? Usually a build would say want ["component1", "component2", ...] and that would require those to be produced by the end result. Have you used things like the Shake shared cache? Do you take/restore some partial output from a previous run, like databases/outputs?
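As a rough sketch of the structure described here, a build can want all five components and point the shakeShare option at a shared cache directory. The file layout, the rule body and the cache path are all made up for illustration; the real build is not shown in this thread. Whether the shared cache restores outputs in a given CI setup depends on persisting that directory between runs.

```haskell
import Development.Shake
import Development.Shake.FilePath

-- | Options with a shared cache enabled. "shake-cache" is an assumed
-- path that CI would have to persist between runs.
buildOptions :: ShakeOptions
buildOptions = shakeOptions
    { shakeFiles = "_shake"            -- Shake database directory
    , shakeShare = Just "shake-cache"  -- shared cache directory
    }

-- | Hypothetical outputs: _build/component1.out .. _build/component5.out
components :: [FilePath]
components =
    ["_build" </> ("component" ++ show n) <.> "out" | n <- [1 .. 5 :: Int]]

buildRules :: Rules ()
buildRules = do
    -- Require every component, so each one must exist when the build ends
    want components
    -- Placeholder rule: each output is a copy of src/<name>.txt
    "_build/*.out" %> \out -> do
        let src = "src" </> takeBaseName out <.> "txt"
        copyFile' src out  -- copyFile' also records src as a dependency

runBuild :: IO ()
runBuild = shake buildOptions buildRules
```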

@ndmitchell
Owner

To answer the more general question, there is no way to get a checksum of a rule and its dependencies in Shake. The only build system I'm aware of that works that way is Buck (which I wouldn't recommend for other reasons).

@saurabhnanda
Author

I'm back at solving this problem before 2022 ends 😄

@ndmitchell doesn't Shake maintain an internal log of checksums of all dependencies? Is there any way to get to it using some internal API?


2 participants