Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macaw-symbolic: Support simulating dynamic relocations #326

Open
RyanGlScott opened this issue Mar 7, 2023 · 6 comments
Open

macaw-symbolic: Support simulating dynamic relocations #326

RyanGlScott opened this issue Mar 7, 2023 · 6 comments
Labels
enhancement symbolic-execution Issues relating to macaw-symbolic and symbolic execution

Comments

@RyanGlScott
Copy link
Contributor

RyanGlScott commented Mar 7, 2023

One particularly tricky part of dynamically linked binaries is their use of relocations. For example, this C program:

#include <stddef.h>
#include <string.h>

const char *S = "Hello!";

int main(void) {
  size_t l = strnlen(S, 10);
  return 0;
}

When compiled like so:

$ gcc test.c -o test.exe

Will produce a binary with this .data section:

$ objdump -dR -j .data test.exe

test.exe:     file format elf64-x86-64


Disassembly of section .data:

0000000000004000 <__data_start>:
        ...

0000000000004008 <__dso_handle>:
    4008:       08 40 00 00 00 00 00 00                             .@......
                        4008: R_X86_64_RELATIVE *ABS*+0x4008

0000000000004010 <S>:
    4010:       04 20 00 00 00 00 00 00                             . ......
                        4010: R_X86_64_RELATIVE *ABS*+0x2004

Note the use of an R_X86_64_RELATIVE relocation in the definition of the global S array. This has implications for macaw-symbolic, simulating the machine code for S requires determining what address the relocation references.

Currently, macaw-symbolic does not do much at all in its treatment of relocations:

-- | The test harness currently treats relocations as entirely symbolic data.
-- Most test cases will be unaffected by this, provided that they do not use
-- relocations in the test functions themselves.
-- See @Note [Shared libraries] (Dynamic relocations)@.
populateRelocation :: MSM.GlobalMemoryHooks w
populateRelocation = MSM.GlobalMemoryHooks
{ MSM.populateRelocation = \bak _ _ _ reloc ->
symbolicRelocation (CB.backendGetSym bak) reloc
}
where
symbolicRelocation sym reloc =
mapM (symbolicByte sym "reloc") [1 .. MM.relocationSize reloc]
symbolicByte sym name idx = do
let symbol = WI.safeSymbol $ name ++ "-byte" ++ show (idx-1)
WI.freshConstant sym symbol WI.knownRepr

This will simply initialize each relocation region in a binary with symbolic bytes. This is sufficient to make dynamically linked binaries that do not directly access relocations work, but this approach fails on the example above, where a dynamic relocation is on the critical path. To make the example above work, we need a smarter implementation of populateRelocation.

As a brief primer on relocations, each architecture's ABI has a list of relocation types, such as Table 4.10 in the System V ABI for x86-64. Each relocation type calculates an address in a unique way, which is listed in the Calculation column of Table 4.10. For example, the R_X86_64_RELATIVE relocation type uses B + A, where B is the base address of the dynamically linked binary and A is the relocation's addend. In the example above, the objdump output tells us that the addend is 0x2004:

                        4010: R_X86_64_RELATIVE *ABS*+0x2004

And sure enough, we can see the data for the string "Hello!" (represented as a sequence of ASCII bytes) starting at that address:

$ objdump -D -j .rodata --start-address 0x2004 test.exe

test.exe:     file format elf64-x86-64


Disassembly of section .rodata:

0000000000002004 <_IO_stdin_used+0x4>:
    2004:       48                      rex.W
    2005:       65 6c                   gs insb (%dx),%es:(%rdi)
    2007:       6c                      insb   (%dx),%es:(%rdi)
    2008:       6f                      outsl  %ds:(%rsi),(%dx)
    2009:       21 00                   and    %eax,(%rax)

macaw has already done a lot of the work needed to represent relocations in an architecture-independent way. Aside from the populateRelocation abstraction above, macaw also has its own notion of a Relocation data type here. The relocationOffset field is a value that can be added to a relocation's base address to compute the address that the relocation references. In the case of R_X86_64_RELATIVE, the value of relocationOffset would be the addend value minus the base address. Other relocation types would compute their relocationOffset is slightly different ways. The code that macaw uses for interpreting different relocation types as Relocation values can be found here.

The missing piece that has not yet implemented yet (the topic of this particular issue) is to make Data.Macaw.Symbolic.Testing's populateRelocation aware of dynamic relocations as well. For many relocation types, such as R_X86_64_RELATIVE, the general process goes like this:

  1. Compute the relocation's base address plus the relocationOffset.
  2. Convert this address into a little-endian sequence of bytes.
  3. Convert each byte to a SymBV using bvLit and return this list.

Life would be much simpler if all relocation types could be handled like this, but alas, life is not simple. I have found a couple of exceptions to the template above that require special treatment:

  • Some relocation types reference the address of a symbol plus an offset (e.g., R_X86_64_64 and R_X86_64_GLOB_DAT relocations). These relocations require knowing the addresses of a symbol that may be defined in the same binary (e.g., R_X86_64_64) or in a separate shared library (e.g., R_X86_64_GLOB_DAT).
  • COPY relocations (e.g., R_X86_64_COPY, see Support R_X86_64_COPY #47) are extra special. Unlike other relocations, which reference an address, COPY relocations reference a value of a global variable defined in a shared library. As a result, populating a COPY relocation is fairly involved, as it requires determining the relevant values in the address space of a shared library.

This is likely not an exhaustive list of special cases, but these are the ones that I am currently aware of.

Some unresolved questions:

  1. We would need to categorize each relocation type by how it is handled in populateRelocation. Where is the best place to store this information? In a new field of Relocation? Somewhere else?

  2. There are a staggering number of relocation types to support, and it is doubtful that we will have full support for all of them any time soon. What should happen if you attempt to read from a relocation region that uses a not-yet-supported relocation type?

    One approach is to just return symbolic bytes, as what is shown above. That could lead to surprising results, however, since macaw-symbolic won't outright crash if it tries to read from that region.

    Another approach is to explicitly track which relocation types are supported, and if macaw-symbolic's memory model ever encounters a read from an unsupported relocation type, throw an explicit error. This would eliminate the potential confusion above, but at the cost of having to plumb extra information to the macawExtensions function, which initializes the memory model.

    We have implemented the "explicitly track which relocation types are supported" option in the downstream ambient-verifier project, which could be used as inspiration.

@RyanGlScott RyanGlScott added enhancement symbolic-execution Issues relating to macaw-symbolic and symbolic execution labels Mar 7, 2023
@langston-barrett
Copy link
Contributor

Thank you for this superb writeup 👏

We would need to categorize each relocation type by how it is handled in populateRelocation. Where is the best place to store this information? In a new field of Relocation? Somewhere else?

What is the purpose of this categorization? And it seems like this depends on the particular implementation of populateRelocation, but that function is configurable by clients?

There are a staggering number of relocation types to support, and it is doubtful that we will have full support for all of them any time soon. What should happen if you attempt to read from a relocation region that uses a not-yet-supported relocation type?

It occurs to me that this is a parallel case to hitting an instruction with unsupported semantics - while it could be caught at translation time, Macaw defers the error to simulation time in order to support simulation of binaries with certain unsupported features. Perhaps we should have a uniform approach to such runtime error handling? Just a thought.

In general, I'd lean towards even poor error handling (e.g., calling error when simulating unsupported relocations) over returning symbolic bytes due to the surprise factor you mention. If clients can indeed customize populateRelocation, then they can manually implement a version that works as usual for supported types but returns symbolic bytes in other cases that they want it to.

@RyanGlScott
Copy link
Contributor Author

What is the purpose of this categorization? And it seems like this depends on the particular implementation of populateRelocation, but that function is configurable by clients?

It's been a while since I originally wrote this comment, but I think I had ambient-verifier. ambient-verifier's implementation of populateRelocation can be found here. (This will take a bit of exposition to motivate, so bear with me for a bit...)

One challenge that we ran into when implementing this code in ambient-verifier is that macaw's Relocation data type does a good job of normalizing different relocation types into a single representation, but it might do too good of a job. This is because different relocation types need to be handled differently at the level of populateRelocation level, e.g., R_ARM_RELATIVE relocations need to be handled differently from R_ARM_COPY relocations, which in turn need to be handled differently from R_ARM_GLOB_DAT relocations, etc. However, the Relocation data type doesn't distinguish among these different relocation types, making it impossible to implement populateRelocation with just the Relocation API alone. (There is a relocationJumpSlot field which tells you if you have an R_ARM_JUMP_SLOT relocation or not, but it feels like we'd need more information than this in general to have full coverage of all the relocation types.)

As such, ambient-verifier implements its own, separate categorization of relocation types, which is implemented here. ambient-verifier currently categorized all relocation types into one of three buckets. Some buckets are specific to one relocation type (e.g., R_ARM_COPY is the only CopyReloc), whereas some buckets encompass multiple relocations types (e.g., both R_ARM_ABS32 and R_ARM_GLOB_DAT are considered SymbolRelocs). These three buckets correspond to the three different cases in populateReloc.

(OK, infodump over.)

The categorization approach that ambient-verifier uses is one possible way to go about implementing populateRelocation at the macaw-symbolic (and indeed, it's the only way I currently know). Would this be suitable for all clients? I'm not sure—perhaps there is another application for which this level of categorization is too fine- or coarse-grained. It might be nice to design the API in a way that makes is relatively straightforward for people to drop in their own, custom categorizations.

It occurs to me that this is a parallel case to hitting an instruction with unsupported semantics - while it could be caught at translation time, Macaw defers the error to simulation time in order to support simulation of binaries with certain unsupported features. Perhaps we should have a uniform approach to such runtime error handling? Just a thought.

In general, I'd lean towards even poor error handling (e.g., calling error when simulating unsupported relocations) over returning symbolic bytes due to the surprise factor you mention. If clients can indeed customize populateRelocation, then they can manually implement a version that works as usual for supported types but returns symbolic bytes in other cases that they want it to.

Yes, I agree that it is nice to have an error message whenever you encounter an unsupported relocation type at simulation time. As one possible design, ambient-verifier throws such an error message here. The downside to this approach is that you have to thread around a collection of unsupported relocations down to the level of each use of MacawReadMem, which is a bit heavyweight.

@langston-barrett
Copy link
Contributor

One challenge that we ran into when implementing this code in ambient-verifier is that macaw's Relocation data type does a good job of normalizing different relocation types into a single representation, but it might do too good of a job. This is because different relocation types need to be handled differently at the level of populateRelocation level, e.g., R_ARM_RELATIVE relocations need to be handled differently from R_ARM_COPY relocations, which in turn need to be handled differently from R_ARM_GLOB_DAT relocations, etc. However, the Relocation data type doesn't distinguish among these different relocation types, making it impossible to implement populateRelocation with just the Relocation API alone. (There is a relocationJumpSlot field which tells you if you have an R_ARM_JUMP_SLOT relocation or not, but it feels like we'd need more information than this in general to have full coverage of all the relocation types.)

If these different architecture-specific relocations need to be handled differently during symbolic execution and the current architecture-neutral abstraction over them doesn't provide enough information to do so, then perhaps they should just be exposed directly, rather than through this abstraction? This seems the most flexible approach for all clients. There could perhaps be a separate module that compiles the "raw" relocations into an architecture-neutral DSL like ambient-verifier's.

For additional context, what is the point of the Relocation type? Was it intended to support this use-case, or does it have other uses?

@RyanGlScott
Copy link
Contributor Author

what is the point of the Relocation type? Was it intended to support this use-case, or does it have other uses?

The primary purpose of Relocations are to help identify relocatable regions of memory, which are treated separately than other types of MemChunks. (See here for one such example of special treatment.)

Note that MemChunk is parameterized solely by the word size, so it is agnostic to any details pertaining to the architecture or the binary container format. As such, Relocation doesn't contain any architecture- or container-specific details either.

If these different architecture-specific relocations need to be handled differently during symbolic execution and the current architecture-neutral abstraction over them doesn't provide enough information to do so, then perhaps they should just be exposed directly, rather than through this abstraction?

The problem is that Relocation is architecture-agnostic, so we can't directly put architecture-specific things like ARM32_RelocationTypes or X86_64_RelocationTypes into a Relocation—at least, not without a considerable amount of refactoring. ambient-verifier's RelocType was one attempt at coming up with an architecture-agnostic relocation data type, although I'm not clear if it's the right design.

@sauclovian-g
Copy link

Having been down the rabbit hole of trying to abstract relocation types already once in the past, I'd advise against trying. The next architecture that comes along is probably going to bust up whatever abstraction you come up with. For example, some MIPS static relocations come in pairs... virtually every architecture has something insane somewhere in its relocation types.

Depending on what the requirements are though it may be sufficient to just define every relocation with a symbolic expression; the hard part there is what the vocabulary of things to refer to needs to be. If all you're doing is evaluating relocations, that's probably enough. If you need to be able to generate them, it gets a lot worse. But I would expect that here we don't need to generate them.

@RyanGlScott
Copy link
Contributor Author

The next architecture that comes along is probably going to bust up whatever abstraction you come up with.

Quite true. ambient-verifier's RelocType is almost assuredly over-fitted to what AArch32 and x86-64 are doing, and it's unclear if it would generalize to other architectures (I haven't tried).

In light of this, one perfectly acceptable stance for macaw-symbolic to take is that it shouldn't even attempt to handle relocations. macaw-symbolic can offer generic, catch-all approaches (e.g., one that always errors when a relocation is encountered, and another that fills in every relocation region with symbolic bytes), but for anything else, you need to come up with a custom solution that is fitted to your particular application's needs. This is perhaps a bit unsatisfying from the perspective of making things work out of the box, but it does allow us to avoid the thorny question of how to make a design that works for all macaw-symbolic users (which may well be infeasible).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement symbolic-execution Issues relating to macaw-symbolic and symbolic execution
Projects
None yet
Development

No branches or pull requests

3 participants