Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crucible-llvm: Support string tables with Clang 14.0.0 + optimizations #1174

Open
RyanGlScott opened this issue Feb 15, 2024 · 6 comments
Open

Comments

@RyanGlScott
Copy link
Contributor

RyanGlScott commented Feb 15, 2024

Consider this code, which was minimized from a real-world example:

const char *F(int tag) {
  static const char *const table[] = {
    "A",
    "B",
  };

  return table[tag];
}

int main() {
  return 0;
}

If you run this through crux-llvm with optimizations and Clang 14.0.0, it will crash:

$ ~/Software/crux-llvm-0.8/bin/crux-llvm -O1 test.c
[Crux] Using pointer width: 64 for file crux-build/crux~test.bc
user error (Encountered error while processing global @reltable.F: Illegal operation applied to pointer argument)

It will pass if:

  • Optimizations are disabled, or
  • You use Clang 12.0.0 instead of 14.0.0

The problematic bitcode in the Clang 14.0.0 + optimizations version is:

@reltable.F = internal unnamed_addr constant [2 x i32] [i32 trunc (i64 sub (i64 ptrtoint ([2 x i8]* @.str to i64), i64 ptrtoint ([2 x i32]* @reltable.F to i64)) to i32), i32 trunc (i64 sub (i64 ptrtoint ([2 x i8]* @.str.1 to i64), i64 ptrtoint ([2 x i32]* @reltable.F to i64)) to i32)], align 4
@.str = private unnamed_addr constant [2 x i8] c"A\00", align 1
@.str.1 = private unnamed_addr constant [2 x i8] c"B\00", align 1

Without optimizations (or with Clang 12.0.0), we instead have:

@F.table = internal constant [2 x i8*] [i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str, i32 0, i32 0), i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i32 0, i32 0)], align 16, !dbg !0
@.str = private unnamed_addr constant [2 x i8] c"A\00", align 1
@.str.1 = private unnamed_addr constant [2 x i8] c"B\00", align 1
@RyanGlScott
Copy link
Contributor Author

The code that throws this error is here:

evalIarith op w (ArithPtr symx x) (ArithPtr symy y)
| Just Refl <- testEquality w ?ptrWidth
, symx == symy
, L.Sub _ _ <- op
= return $ IntConst ?ptrWidth (BV.mkBV ?ptrWidth (x - y))
| otherwise
= throwError "Illegal operation applied to pointer argument"

In this case, the op is indeed Sub, but symx and symy are different (in this example, they are @.str/@.str1 and @reltable.F). I'd need to study https://reviews.llvm.org/D94355 (which introduced LLVM's relative lookup table conversion pass) to better understand what the intended semantics are for this constant sub instruction (as well as the llvm.load.relative.* intrinsic, which crucible-llvm lacks an override for).

@RyanGlScott
Copy link
Contributor Author

The LLVM documentation for the llvm.load.relative.* intrinsic reveals the answer:

Syntax:

declare ptr @llvm.load.relative.iN(ptr %ptr, iN %offset) nounwind memory(argmem: read)

Overview:

This intrinsic loads a 32-bit value from the address %ptr + %offset, adds %ptr to that value and returns it. The constant folder specifically recognizes the form of this intrinsic and the constant initializers it may load from; if a loaded constant initializer is known to have the form i32 trunc(x - %ptr), the intrinsic call is folded to x.

LLVM provides that the calculation of such a constant initializer will not overflow at link time under the medium code model if x is an unnamed_addr function. However, it does not provide this guarantee for a constant initializer folded into a function body. This intrinsic can be used to avoid the possibility of overflows when loading from such a constant.

In this case, a likely solution would be to:

  • Reduce the constant sub expression to an IntConst where the offset consists of the offset of the first symbol subtracted by the offset of the second symbol. In terms of the code above, that would be tantamount to deleting the symx == symy check.
  • Implement an override for llvm.load.relative.*(ptr, offset) according to the documentation above.

In the example above, that would mean that F(0) would compute (@.str - @reltable.F) + @reltable.F, and F(1) would compute (@.str1 - @reltable.F) + @reltable.F, just as one would expect.

@RyanGlScott
Copy link
Contributor Author

RyanGlScott commented Feb 16, 2024

Actually, the first bullet point in #1174 (comment) won't work. This is because the offsets for @.str, @.str1, and @reltable.F are all zero, so computing the difference of the offsets won't produce the result we want.

Indeed, this is much trickier than I originally thought. Under most circumstances, subtracting two pointers from different allocation blocks (as would be the case when subtracting two distinct global variables) is undefined behavior in C. As far as I can tell, the only reason that pointer subtraction works out in the code above is because LLVM's constant folder has a special case when the argument to llvm.load.relative.* is of the form i32 trunc(x - %ptr).

Ugh.

@RyanGlScott RyanGlScott changed the title crucible-llvm: Support C string tables with Clang 14.0.0 + optimizations crucible-llvm: Support string tables with Clang 14.0.0 + optimizations Apr 18, 2024
@RyanGlScott
Copy link
Contributor Author

This error can also arise with Rust compiled to LLVM bitcode:

// test.rs
pub enum Table {
    A,
    B,
    C,
}

pub fn f(t: Table) -> &'static str {
    match t {
        Table::A => "A",
        Table::B => "B",
        Table::C => "C",
    }
}

#[no_mangle]
pub fn test() {}
-- test.config
entry-point: "test"
no-compile: yes
make-executables: no
$ rustc test.rs --crate-type=lib --emit=llvm-ir -Copt-level=1
$ ~/Software/crux-llvm-0.8/bin/crux-llvm --config test.config test.bc
[Crux] Using pointer width: 64 for file test.bc
user error (Encountered error while processing global @reltable._ZN4test1f17h2cadd61c0b2dc0d8E: Illegal operation applied to pointer argument)

@RyanGlScott
Copy link
Contributor Author

This reltable optimization has also proved troublesome for CHERI: CTSRD-CHERI/llvm-project#572

@RyanGlScott
Copy link
Contributor Author

Unfortunately, I really don't know how to make crucible-llvm work on the sort of code that this reltable optimization produces. It would be nice if we could tell Clang not to do it, but Clang doesn't appear to offer that level of granularity over its optimization passes (at least, not without rebuilding Clang from source using different pass manager settings).

In light of this, perhaps we should do something akin to the following:

  1. Identify all "reltable-like" global variables that are passed to llvm.load.relative.* intrinsics in an LLVM module.

  2. For each reltable-like global variable, perform an ad-hoc transformation that turns it from looking like this:

    @reltable.F = internal unnamed_addr constant [2 x i32] [i32 trunc (i64 sub (i64 ptrtoint ([2 x i8]* @.str to i64), i64 ptrtoint ([2 x i32]* @reltable.F to i64)) to i32), i32 trunc (i64 sub (i64 ptrtoint ([2 x i8]* @.str.1 to i64), i64 ptrtoint ([2 x i32]* @reltable.F to i64)) to i32)], align 4
    

    To code that looks like this:

    @F.table = internal unnamed_addr constant [2 x i8*] [i8* bitcast ([2 x i8]* @.str) to i8*, i8* bitcast ([2 x i8]* @.str.1) to i8*], align 16, !dbg !0
    

    We may also need to insert a bitcast around @F.table in the call to llvm.load.relative.* as well.

This is all very ugly, but I can't think of a better solution at present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant