`crucible-llvm`: Support string tables with Clang 14.0.0 + optimizations #1174

RyanGlScott · 2024-02-15T16:45:17Z

Consider this code, which was minimized from a real-world example:

const char *F(int tag) {
  static const char *const table[] = {
    "A",
    "B",
  };

  return table[tag];
}

int main() {
  return 0;
}

If you run this through crux-llvm with optimizations and Clang 14.0.0, it will crash:

$ ~/Software/crux-llvm-0.8/bin/crux-llvm -O1 test.c
[Crux] Using pointer width: 64 for file crux-build/crux~test.bc
user error (Encountered error while processing global @reltable.F: Illegal operation applied to pointer argument)

It will pass if:

Optimizations are disabled, or
You use Clang 12.0.0 instead of 14.0.0

The problematic bitcode in the Clang 14.0.0 + optimizations version is:

@reltable.F = internal unnamed_addr constant [2 x i32] [i32 trunc (i64 sub (i64 ptrtoint ([2 x i8]* @.str to i64), i64 ptrtoint ([2 x i32]* @reltable.F to i64)) to i32), i32 trunc (i64 sub (i64 ptrtoint ([2 x i8]* @.str.1 to i64), i64 ptrtoint ([2 x i32]* @reltable.F to i64)) to i32)], align 4
@.str = private unnamed_addr constant [2 x i8] c"A\00", align 1
@.str.1 = private unnamed_addr constant [2 x i8] c"B\00", align 1

Without optimizations (or with Clang 12.0.0), we instead have:

@F.table = internal constant [2 x i8*] [i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str, i32 0, i32 0), i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i32 0, i32 0)], align 16, !dbg !0
@.str = private unnamed_addr constant [2 x i8] c"A\00", align 1
@.str.1 = private unnamed_addr constant [2 x i8] c"B\00", align 1

The text was updated successfully, but these errors were encountered:

RyanGlScott · 2024-02-15T18:41:05Z

The code that throws this error is here:

crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Constant.hs

Lines 668 to 674 in 6d0af7c

    
           evalIarith op w (ArithPtr symx x) (ArithPtr symy y) 
        
             | Just Refl <- testEquality w ?ptrWidth 
        
             , symx == symy 
        
             , L.Sub _ _ <- op 
        
             = return $ IntConst ?ptrWidth (BV.mkBV ?ptrWidth (x - y)) 
        
             | otherwise 
        
             = throwError "Illegal operation applied to pointer argument"

In this case, the op is indeed Sub, but symx and symy are different (in this example, they are @.str/@.str1 and @reltable.F). I'd need to study https://reviews.llvm.org/D94355 (which introduced LLVM's relative lookup table conversion pass) to better understand what the intended semantics are for this constant sub instruction (as well as the llvm.load.relative.* intrinsic, which crucible-llvm lacks an override for).

RyanGlScott · 2024-02-15T18:57:44Z

The LLVM documentation for the llvm.load.relative.* intrinsic reveals the answer:

Syntax:
declare ptr @llvm.load.relative.iN(ptr %ptr, iN %offset) nounwind memory(argmem: read)
Overview:

This intrinsic loads a 32-bit value from the address %ptr + %offset, adds %ptr to that value and returns it. The constant folder specifically recognizes the form of this intrinsic and the constant initializers it may load from; if a loaded constant initializer is known to have the form i32 trunc(x - %ptr), the intrinsic call is folded to x.

LLVM provides that the calculation of such a constant initializer will not overflow at link time under the medium code model if x is an unnamed_addr function. However, it does not provide this guarantee for a constant initializer folded into a function body. This intrinsic can be used to avoid the possibility of overflows when loading from such a constant.

In this case, a likely solution would be to:

Reduce the constant sub expression to an IntConst where the offset consists of the offset of the first symbol subtracted by the offset of the second symbol. In terms of the code above, that would be tantamount to deleting the symx == symy check.
Implement an override for llvm.load.relative.*(ptr, offset) according to the documentation above.

In the example above, that would mean that F(0) would compute (@.str - @reltable.F) + @reltable.F, and F(1) would compute (@.str1 - @reltable.F) + @reltable.F, just as one would expect.

RyanGlScott · 2024-02-16T14:33:51Z

Actually, the first bullet point in #1174 (comment) won't work. This is because the offsets for @.str, @.str1, and @reltable.F are all zero, so computing the difference of the offsets won't produce the result we want.

Indeed, this is much trickier than I originally thought. Under most circumstances, subtracting two pointers from different allocation blocks (as would be the case when subtracting two distinct global variables) is undefined behavior in C. As far as I can tell, the only reason that pointer subtraction works out in the code above is because LLVM's constant folder has a special case when the argument to llvm.load.relative.* is of the form i32 trunc(x - %ptr).

Ugh.

RyanGlScott · 2024-04-18T14:56:31Z

This error can also arise with Rust compiled to LLVM bitcode:

// test.rs
pub enum Table {
    A,
    B,
    C,
}

pub fn f(t: Table) -> &'static str {
    match t {
        Table::A => "A",
        Table::B => "B",
        Table::C => "C",
    }
}

#[no_mangle]
pub fn test() {}

-- test.config
entry-point: "test"
no-compile: yes
make-executables: no

$ rustc test.rs --crate-type=lib --emit=llvm-ir -Copt-level=1
$ ~/Software/crux-llvm-0.8/bin/crux-llvm --config test.config test.bc
[Crux] Using pointer width: 64 for file test.bc
user error (Encountered error while processing global @reltable._ZN4test1f17h2cadd61c0b2dc0d8E: Illegal operation applied to pointer argument)

RyanGlScott · 2024-05-29T20:33:58Z

This reltable optimization has also proved troublesome for CHERI: CTSRD-CHERI/llvm-project#572

RyanGlScott · 2024-05-29T20:43:39Z

Unfortunately, I really don't know how to make crucible-llvm work on the sort of code that this reltable optimization produces. It would be nice if we could tell Clang not to do it, but Clang doesn't appear to offer that level of granularity over its optimization passes (at least, not without rebuilding Clang from source using different pass manager settings).

In light of this, perhaps we should do something akin to the following:

Identify all "reltable-like" global variables that are passed to llvm.load.relative.* intrinsics in an LLVM module.

For each reltable-like global variable, perform an ad-hoc transformation that turns it from looking like this:

@reltable.F = internal unnamed_addr constant [2 x i32] [i32 trunc (i64 sub (i64 ptrtoint ([2 x i8]* @.str to i64), i64 ptrtoint ([2 x i32]* @reltable.F to i64)) to i32), i32 trunc (i64 sub (i64 ptrtoint ([2 x i8]* @.str.1 to i64), i64 ptrtoint ([2 x i32]* @reltable.F to i64)) to i32)], align 4

To code that looks like this:

@F.table = internal unnamed_addr constant [2 x i8*] [i8* bitcast ([2 x i8]* @.str) to i8*, i8* bitcast ([2 x i8]* @.str.1) to i8*], align 16, !dbg !0

We may also need to insert a bitcast around @F.table in the call to llvm.load.relative.* as well.

This is all very ugly, but I can't think of a better solution at present.

RyanGlScott added bug llvm crucible labels Feb 15, 2024

RyanGlScott changed the title ~~crucible-llvm: Support C string tables with Clang 14.0.0 + optimizations~~ crucible-llvm: Support string tables with Clang 14.0.0 + optimizations Apr 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`crucible-llvm`: Support string tables with Clang 14.0.0 + optimizations #1174

`crucible-llvm`: Support string tables with Clang 14.0.0 + optimizations #1174

RyanGlScott commented Feb 15, 2024 •

edited

RyanGlScott commented Feb 15, 2024

RyanGlScott commented Feb 15, 2024

RyanGlScott commented Feb 16, 2024 •

edited

RyanGlScott commented Apr 18, 2024

RyanGlScott commented May 29, 2024

RyanGlScott commented May 29, 2024

crucible-llvm: Support string tables with Clang 14.0.0 + optimizations #1174

crucible-llvm: Support string tables with Clang 14.0.0 + optimizations #1174

Comments

RyanGlScott commented Feb 15, 2024 • edited

RyanGlScott commented Feb 15, 2024

RyanGlScott commented Feb 15, 2024

RyanGlScott commented Feb 16, 2024 • edited

RyanGlScott commented Apr 18, 2024

RyanGlScott commented May 29, 2024

RyanGlScott commented May 29, 2024

`crucible-llvm`: Support string tables with Clang 14.0.0 + optimizations #1174

`crucible-llvm`: Support string tables with Clang 14.0.0 + optimizations #1174

RyanGlScott commented Feb 15, 2024 •

edited

RyanGlScott commented Feb 16, 2024 •

edited