RFC: crucible-llvm: Parameterize over memory and pointer types #1194

langston-barrett · 2024-04-02T18:49:54Z

Note that the code changes are very WIP. Further notes below.

The idea is that we have a lot of big changes we'd potentially like to
make to the memory model:

Decouple LLVM memories from control flow #917: Decouple memory from control flow
Use something besides Crucible type Nat for LLVM pointer block IDs #404: Improve underlying symbolic type for pointer representation
LLVM memory model code reorganization #399: Refactoring the memory model
Bit-level undef/poison #366: Bit-level undef/poison

The problem is that these are all very hard to get started on, because
they would involve sweeping changes across Crucible-LLVM and all
downstream packages.

We could also consider alternative memory models for different
use-cases:

A "flat" model based on a single SMT array with a bump allocator, to
let the SMT solver handle more of the hairy details
A "lazy" model in the style of UC-KLEE for under-constrained symbolic
execution
A "fresh" model that always returns fresh symbolic constants when read

All of these would have interesting trade-offs, but can't effectively be
explored at the moment.

This PR is the first step in making it easier to run such
experiments. The goal is to parameterize Crucible-LLVM over a typeclass
that provides (Crucible and Haskell) types and operations on memory and
pointers.

To that end, it introduces a new mem type variable. At the moment, a
lot of places in the code add the abstraction-breaking mem ~ Mem
constraint. The goal would be to gradually add operations to the Mem
class and remove such equality constraints.

Some notable downsides:

Increased complexity. See e.g., the pattern matches in
Translation.Instruction, which I couldn't figure out how to
adequately replace. Counterpoint: programming against an well-defined
interface (i.e., a typeclass) can reduce mental complexity by
abstracting superfluous detail.
Code churn within Crucible
API churn for consumers

Notes on progress:

This PR introduces the mem type variable and Mem typeclass.
This PR far enough to compile, I haven't run the tests but there should be no behvioral changes (modulo one explicit TODO item).
Translation is actually parametric over Mem, but evaluation just forces mem to be explicitly equal to the current memory model. More operations would be needed inside the Mem typeclass to make evaluation more parametric.
Obviously, lots of formatting is seriously messed up. If we like this direction, I can clean that up.

The idea is that we have a lot of big changes we'd potentially like to make to the memory model: - Issue no. 917: Decouple memory from control flow - Issue no. 404: Improve underlying symbolic type for pointer representation - Issue no. 399: Refactoring the memory model - Issue no. 366: Bit-level undef/poison The problem is that these are all very hard to get started on, because they would involve sweeping changes across Crucible-LLVM and all downstream packages. We could also consider alternative memory models for different use-cases: - A "flat" model based on a single SMT array with a bump allocator, to let the SMT solver handle more of the hairy details - A "lazy" model in the style of UC-KLEE for under-constrained symbolic execution - A "fresh" model that always returns fresh symbolic constants when read All of these would have interesting trade-offs, but can't effectively be explored at the moment. This commit is the first step in making it easier to run such experiments. The goal is to parameterize Crucible-LLVM over a typeclass that provides (Crucible and Haskell) types and operations on memory and pointers. To that end, it introduces a new `mem` type variable. At the moment, a lot of places in the code add the abstraction-breaking `mem ~ Mem` constraint. The goal would be to gradually add operations to the `Mem` class and remove such equality constraints. Some notable downsides: - Increased complexity. See e.g., the pattern matches in `Translation.Instruction`, which I couldn't figure out how to adequately replace. Counterpoint: programming against an well-defined interface (i.e., a typeclass) can reduce mental complexity by abstracting superfluous detail. - Code churn within Crucible - API churn for consumers Mostly done with: mogglo-haskell \ '${{ focus:kind() == "type" and not string.find(focus:parent():parent():text(), "mem") and focus:text() == "LLVM" }}' \ --replace '${{ "(LLVM mem)" }}'\ --confirm path/to/file.hs

Evaluation just assumes `mem ~ Mem`, but translation doesn't need very many operations.

langston-barrett · 2024-04-04T17:09:16Z

@RyanGlScott Could you take a look and see if you like the direction this is headed? This still needs:

Formatting changes
Changes to downstream libraries

But at that point, I think it will be a self-contained first step towards parameterizing the memory model, the next steps being removing the mem ~ Mem constraints in favor of adding more operations to the Mem typeclass.

RyanGlScott · 2024-04-04T21:52:27Z

Some initial thoughts:

There is also a mem type variable used in macaw-symbolic (here), which is typically instantiated with LLVMMemory. I'm not quite sure if the mem used here (which will typically be instantiated with the similarly-named Mem) is the same notion of mem used in macaw-symbolic... we may want to think about how to reconcile these two.
This is a big change—probably the biggest since the great sym/bak split a while ago. As such, I think we'd want to at least have draft PRs for the other big MTV libraries (Macaw, SAW, etc.) ready before we think about adopting this.
Before we commit to this, I think it would be helpful to come up with a proof-of-concept test that swaps out the existing LLVM memory model with an alternative one. It's easy to forget to update something and not realize it until you actually try using it in practice.

langston-barrett added 3 commits April 2, 2024 14:39

crucible-llvm: Memory class, more parameterization over memory

7a730c2

crucible-llvm: Parameterize translation over pointer type

5261098

Evaluation just assumes `mem ~ Mem`, but translation doesn't need very many operations.

langston-barrett added llvm discussion labels Apr 2, 2024

langston-barrett requested a review from RyanGlScott April 2, 2024 18:49

langston-barrett self-assigned this Apr 2, 2024

langston-barrett added 2 commits April 2, 2024 14:55

crucible-llvm: Cleanup, Haddocks about memory models

5aa5b16

crucible-llvm: Parameterize poison, UB over memory/pointer types

66373cb

WIP: fix formatting

a28a867

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: crucible-llvm: Parameterize over memory and pointer types #1194

RFC: crucible-llvm: Parameterize over memory and pointer types #1194

langston-barrett commented Apr 2, 2024

langston-barrett commented Apr 4, 2024

RyanGlScott commented Apr 4, 2024

RFC: crucible-llvm: Parameterize over memory and pointer types #1194

Are you sure you want to change the base?

RFC: crucible-llvm: Parameterize over memory and pointer types #1194

Conversation

langston-barrett commented Apr 2, 2024

langston-barrett commented Apr 4, 2024

RyanGlScott commented Apr 4, 2024