`macaw-symbolic`: Support simulating dynamic relocations #326
Comments
Thank you for this superb writeup 👏
What is the purpose of this categorization? And it seems like this depends on the particular implementation of
It occurs to me that this is a parallel case to hitting an instruction with unsupported semantics - while it could be caught at translation time, Macaw defers the error to simulation time in order to support simulation of binaries with certain unsupported features. Perhaps we should have a uniform approach to such runtime error handling? Just a thought. In general, I'd lean towards even poor error handling (e.g., calling |
It's been a while since I originally wrote this comment, but I think I had One challenge that we ran into when implementing this code in As such, (OK, infodump over.) The categorization approach that
Yes, I agree that it is nice to have an error message whenever you encounter an unsupported relocation type at simulation time. As one possible design, |
If these different architecture-specific relocations need to be handled differently during symbolic execution and the current architecture-neutral abstraction over them doesn't provide enough information to do so, then perhaps they should just be exposed directly, rather than through this abstraction? This seems the most flexible approach for all clients. There could perhaps be a separate module that compiles the "raw" relocations into an architecture-neutral DSL like ambient-verifier's. For additional context, what is the point of the |
The primary purpose of Note that
The problem is that |
Having been down the rabbit hole of trying to abstract relocation types already once in the past, I'd advise against trying. The next architecture that comes along is probably going to bust up whatever abstraction you come up with. For example, some MIPS static relocations come in pairs... virtually every architecture has something insane somewhere in its relocation types. Depending on what the requirements are though it may be sufficient to just define every relocation with a symbolic expression; the hard part there is what the vocabulary of things to refer to needs to be. If all you're doing is evaluating relocations, that's probably enough. If you need to be able to generate them, it gets a lot worse. But I would expect that here we don't need to generate them.
Quite true. In light of this, one perfectly acceptable stance for |
One particularly tricky part of dynamically linked binaries is their use of relocations. For example, this C program:
When compiled like so:
Will produce a binary with this `.data` section:

Note the use of an `R_X86_64_RELATIVE` relocation in the definition of the global `S` array. This has implications for `macaw-symbolic`, since simulating the machine code for `S` requires determining what address the relocation references.

Currently, `macaw-symbolic` does not do much at all in its treatment of relocations:

`macaw/symbolic/src/Data/Macaw/Symbolic/Testing.hs`, lines 681 to 696 in 97c61e4
This will simply initialize each relocation region in a binary with symbolic bytes. This is sufficient to make dynamically linked binaries that do not directly access relocations work, but this approach fails on the example above, where a dynamic relocation is on the critical path. To make the example above work, we need a smarter implementation of `populateRelocation`.

As a brief primer on relocations, each architecture's ABI has a list of relocation types, such as Table 4.10 in the System V ABI for x86-64. Each relocation type calculates an address in a unique way, which is listed in the Calculation column of Table 4.10. For example, the `R_X86_64_RELATIVE` relocation type uses `B + A`, where `B` is the base address of the dynamically linked binary and `A` is the relocation's addend. In the example above, the `objdump` output tells us that the addend is `0x2004`:

And sure enough, we can see the data for the string `"Hello!"` (represented as a sequence of ASCII bytes) starting at that address:

`macaw`
has already done a lot of the work needed to represent relocations in an architecture-independent way. Aside from the `populateRelocation` abstraction above, `macaw` also has its own notion of a `Relocation` data type here. The `relocationOffset` field is a value that can be added to a relocation's base address to compute the address that the relocation references. In the case of `R_X86_64_RELATIVE`, the value of `relocationOffset` would be the addend value minus the base address. Other relocation types would compute their `relocationOffset` in slightly different ways. The code that `macaw` uses for interpreting different relocation types as `Relocation` values can be found here.

The missing piece that has not yet been implemented (the topic of this particular issue) is to make
`Data.Macaw.Symbolic.Testing`'s `populateRelocation` aware of dynamic relocations as well. For many relocation types, such as `R_X86_64_RELATIVE`, the general process goes like this:

- `relocationOffset`.
- `SymBV` using `bvLit` and return this list.

Life would be much simpler if all relocation types could be handled like this, but alas, life is not simple. I have found a couple of exceptions to the template above that require special treatment:
- `R_X86_64_64` and `R_X86_64_GLOB_DAT` relocations. These relocations require knowing the addresses of a symbol that may be defined in the same binary (e.g., `R_X86_64_64`) or in a separate shared library (e.g., `R_X86_64_GLOB_DAT`).
- `COPY` relocations (e.g., `R_X86_64_COPY`, see Support R_X86_64_COPY #47) are extra special. Unlike other relocations, which reference an address, `COPY` relocations reference a value of a global variable defined in a shared library. As a result, populating a `COPY` relocation is fairly involved, as it requires determining the relevant values in the address space of a shared library.

This is likely not an exhaustive list of special cases, but these are the ones that I am currently aware of.
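To make the simple case concrete, here is a minimal sketch of populating an `R_X86_64_RELATIVE`-style relocation using only `base`. The names `RelativeReloc`, `relativeTarget`, `toLittleEndianBytes`, and `populateRelative` are hypothetical and are not part of `macaw`; the sketch works on concrete `Word8` bytes, whereas the real code would presumably lift each byte to a `SymBV` with `bvLit` as described above.

```haskell
import Data.Bits (shiftR)
import Data.Word (Word64, Word8)

-- | Hypothetical stand-in for what a loader knows about a relative
-- relocation. This is an illustration, not macaw's Relocation type.
data RelativeReloc = RelativeReloc
  { relocAddend :: Word64  -- ^ The addend A from the relocation entry
  }

-- | Compute B + A, the address that an R_X86_64_RELATIVE-style relocation
-- references, given the load base address B of the binary.
relativeTarget :: Word64 -> RelativeReloc -> Word64
relativeTarget base reloc = base + relocAddend reloc

-- | Split a 64-bit address into eight bytes, least significant byte first
-- (x86-64 is little-endian).
toLittleEndianBytes :: Word64 -> [Word8]
toLittleEndianBytes w = [ fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. 7] ]

-- | The concrete bytes that would be written into the relocation region.
-- Assumption: a real implementation would turn each byte into a symbolic
-- bitvector literal instead of returning plain Word8 values.
populateRelative :: Word64 -> RelativeReloc -> [Word8]
populateRelative base = toLittleEndianBytes . relativeTarget base
```

With an assumed load base of `0x400000` (chosen only for illustration) and the addend `0x2004` from the example, `populateRelative 0x400000 (RelativeReloc 0x2004)` yields the bytes `04 20 40 00 00 00 00 00`, i.e. the address `0x402004` in little-endian order. Symbol-based and `COPY` relocations do not fit this shape because they need information beyond the base address and addend.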
Some unresolved questions:
- We would need to categorize each relocation type by how it is handled in `populateRelocation`. Where is the best place to store this information? In a new field of `Relocation`? Somewhere else?
- There are a staggering number of relocation types to support, and it is doubtful that we will have full support for all of them any time soon. What should happen if you attempt to read from a relocation region that uses a not-yet-supported relocation type?
  - One approach is to just return symbolic bytes, as shown above. That could lead to surprising results, however, since `macaw-symbolic` won't outright crash if it tries to read from that region.
  - Another approach is to explicitly track which relocation types are supported, and if `macaw-symbolic`'s memory model ever encounters a read from an unsupported relocation type, throw an explicit error. This would eliminate the potential confusion above, but at the cost of having to plumb extra information to the `macawExtensions` function, which initializes the memory model.

We have implemented the "explicitly track which relocation types are supported" option in the downstream `ambient-verifier` project, which could be used as inspiration.
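As a rough illustration of the "explicitly track which relocation types are supported" option, the sketch below categorizes relocation regions and rejects reads from unsupported ones. Everything here is hypothetical: `RelocCategory`, `RelocTable`, and `checkRelocRead` are invented for this example and are not taken from `macaw`, `macaw-symbolic`, or `ambient-verifier`, and a real implementation would track address ranges rather than single start addresses.

```haskell
import Data.Word (Word64)
import Numeric (showHex)

-- | Hypothetical categories, mirroring the cases discussed in this issue.
data RelocCategory
  = RelativeLike        -- ^ Computable from base address and addend alone
  | SymbolLike          -- ^ Needs the address of a symbol
  | CopyLike            -- ^ Needs the value of a global in a shared library
  | Unsupported String  -- ^ Not handled yet; carries the relocation type name
  deriving (Eq, Show)

-- | Assumption: relocation regions are keyed by their start address only.
type RelocTable = [(Word64, RelocCategory)]

-- | Check a read at the given address. Reads that hit an unsupported
-- relocation produce an explicit error instead of symbolic bytes.
checkRelocRead :: RelocTable -> Word64 -> Either String ()
checkRelocRead table addr =
  case lookup addr table of
    Just (Unsupported name) ->
      Left ("read from unsupported relocation type " ++ name
            ++ " at address 0x" ++ showHex addr "")
    _ -> Right ()
```

The trade-off is the one noted above: a table like this has to be threaded through to wherever the memory model services reads, which is the extra plumbing cost of getting a clear error rather than silently symbolic data.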