Reapply "[lldb/aarch64] Fix unwinding when signal interrupts a leaf f… #92503

Merged
merged 2 commits into from
May 21, 2024

Conversation

@labath labath commented May 17, 2024

…unction (#91321)"

This reapplies fd1bd53, which was reverted due to a test failure on aarch64/windows. The failure was caused by a combination of several factors:

  • clang targeting aarch64-windows (unlike msvc, and unlike clang targeting other aarch64 platforms) defaults to -fomit-frame-pointers
  • lldb's code for looking up register values for <same> unwind rules is recursive
  • the test binary creates a very long chain of fp-less function frames (it manages to fit about 22k frames before it blows its stack)

Together, these things caused lldb to reproduce the same deep recursion when unwinding through this chain, and to blow its own stack as well. Since lldb's frames are larger, about 4k frames like this were sufficient to trigger the stack overflow.

This version of the patch works around this problem by increasing the frame size of the test binary, thereby causing it to blow its stack sooner. This doesn't fix the issue -- the same problem can occur with a real binary -- but it's not very likely, as it requires an infinite recursion in a simple (so it doesn't use the frame pointer) function with a very small frame (so you can fit a lot of them on the stack).

A more principled fix would be to make lldb's lookup code non-recursive, but I believe that's out of scope for this patch.
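
To illustrate the difference between a recursive and a non-recursive lookup, here is a minimal sketch on a toy model. It is not lldb's actual code: `ToyFrame`, `LookupRecursive` and `LookupIterative` are hypothetical names, and the real lookup involves far more state. The point is only that the recursive form consumes one native stack frame per `<same>` hop, while the loop uses constant stack space regardless of how long the chain is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (hypothetical, not lldb's data structures): a frame either
// records a saved value for the register, or carries a "<same>" rule,
// meaning "ask the next-younger frame".
struct ToyFrame {
  bool is_same;         // true => "<same>" rule
  uint64_t saved_value; // meaningful only when is_same is false
};

// frames[0] is the youngest frame; live_value is the register's value in
// the running thread, used when every frame down to frame 0 says "<same>".

// Recursive form: one native call per "<same>" hop, so a chain of
// thousands of such frames can exhaust the debugger's own stack.
uint64_t LookupRecursive(const std::vector<ToyFrame> &frames, size_t idx,
                         uint64_t live_value) {
  if (!frames[idx].is_same)
    return frames[idx].saved_value;
  if (idx == 0)
    return live_value;
  return LookupRecursive(frames, idx - 1, live_value);
}

// Iterative form: same answer, constant stack usage.
uint64_t LookupIterative(const std::vector<ToyFrame> &frames, size_t idx,
                         uint64_t live_value) {
  assert(idx < frames.size());
  for (size_t i = idx + 1; i-- > 0;) {
    if (!frames[i].is_same)
      return frames[i].saved_value;
  }
  return live_value;
}
```

Both functions return the same value for any input; only their stack footprint differs.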

The original patch description follows:

A leaf function may not store the link register to the stack, but it can still end up being a non-zero frame if it gets interrupted by a signal. Previously, we were unable to unwind past such a function because we could not read the link register value.

To make this work, this patch:

  • changes the function-entry unwind plan to include the fp|lr = <same> rules. This in turn necessitated an adjustment in the generic instruction emulation logic to ensure that lr=[sp-X] can override the <same> rule.
  • allows the <same> rule for pc and lr in all m_all_registers_available frames (and not just frame zero).
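
The override behaviour in the first bullet can be sketched with a toy row type. This is a simplified model under stated assumptions, not lldb's `UnwindPlan::Row`: `ToyRow`, `ToyLoc`, `SetSame` and `SetAtCFAPlusOffset` are hypothetical names, and the flag semantics are modelled loosely on the `must_replace` / `can_replace` arguments visible in the diff.

```cpp
#include <cstdint>
#include <map>

enum class ToyLocKind { Same, AtCFAPlusOffset };

struct ToyLoc {
  ToyLocKind kind;
  int32_t offset; // used only for AtCFAPlusOffset
};

// Hypothetical stand-in for one row of an unwind plan.
class ToyRow {
public:
  // With must_replace=true the call succeeds only if the register
  // already has a location; with false it always sets "<same>".
  bool SetSame(uint32_t reg, bool must_replace) {
    if (must_replace && locs_.find(reg) == locs_.end())
      return false;
    locs_[reg] = ToyLoc{ToyLocKind::Same, 0};
    return true;
  }

  // With can_replace=false an existing location (including "<same>")
  // blocks the update; with true the new location overrides it.
  bool SetAtCFAPlusOffset(uint32_t reg, int32_t offset, bool can_replace) {
    if (!can_replace && locs_.find(reg) != locs_.end())
      return false;
    locs_[reg] = ToyLoc{ToyLocKind::AtCFAPlusOffset, offset};
    return true;
  }

  // Returns nullptr when the register has no rule in this row.
  const ToyLoc *Get(uint32_t reg) const {
    auto it = locs_.find(reg);
    return it == locs_.end() ? nullptr : &it->second;
  }

private:
  std::map<uint32_t, ToyLoc> locs_;
};
```

In this model, once the entry row carries `lr = <same>`, a later store of lr to the stack can only update the row if it is written with `can_replace=true`, which is the kind of adjustment the bullet describes.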

The test verifies that we can unwind in a situation like this, and that the backtrace matches the one we computed before getting a signal.

@llvmbot

llvmbot commented May 17, 2024

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/92503.diff

7 Files Affected:

  • (modified) lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.cpp (+2)
  • (modified) lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp (+1-3)
  • (modified) lldb/source/Target/RegisterContextUnwind.cpp (+3-3)
  • (modified) lldb/test/API/functionalities/bt-interrupt/main.c (+1)
  • (added) lldb/test/Shell/Unwind/Inputs/signal-in-leaf-function-aarch64.c (+15)
  • (added) lldb/test/Shell/Unwind/signal-in-leaf-function-aarch64.test (+27)
  • (modified) lldb/unittests/UnwindAssembly/ARM64/TestArm64InstEmulation.cpp (+20-4)
diff --git a/lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.cpp b/lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.cpp
index 6ca4fb052457e..62ecac3e0831d 100644
--- a/lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.cpp
+++ b/lldb/source/Plugins/Instruction/ARM64/EmulateInstructionARM64.cpp
@@ -444,6 +444,8 @@ bool EmulateInstructionARM64::CreateFunctionEntryUnwind(
 
   // Our previous Call Frame Address is the stack pointer
   row->GetCFAValue().SetIsRegisterPlusOffset(gpr_sp_arm64, 0);
+  row->SetRegisterLocationToSame(gpr_lr_arm64, /*must_replace=*/false);
+  row->SetRegisterLocationToSame(gpr_fp_arm64, /*must_replace=*/false);
 
   unwind_plan.AppendRow(row);
   unwind_plan.SetSourceName("EmulateInstructionARM64");
diff --git a/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp b/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
index c4a171ec7d01b..49edd40544e32 100644
--- a/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
+++ b/lldb/source/Plugins/UnwindAssembly/InstEmulation/UnwindAssemblyInstEmulation.cpp
@@ -424,8 +424,6 @@ size_t UnwindAssemblyInstEmulation::WriteMemory(
     log->PutString(strm.GetString());
   }
 
-  const bool cant_replace = false;
-
   switch (context.type) {
   default:
   case EmulateInstruction::eContextInvalid:
@@ -467,7 +465,7 @@ size_t UnwindAssemblyInstEmulation::WriteMemory(
         m_pushed_regs[reg_num] = addr;
         const int32_t offset = addr - m_initial_sp;
         m_curr_row->SetRegisterLocationToAtCFAPlusOffset(reg_num, offset,
-                                                         cant_replace);
+                                                         /*can_replace=*/true);
         m_curr_row_modified = true;
       }
     }
diff --git a/lldb/source/Target/RegisterContextUnwind.cpp b/lldb/source/Target/RegisterContextUnwind.cpp
index 13e101413a477..e2d712cb72eae 100644
--- a/lldb/source/Target/RegisterContextUnwind.cpp
+++ b/lldb/source/Target/RegisterContextUnwind.cpp
@@ -1555,12 +1555,12 @@ RegisterContextUnwind::SavedLocationForRegister(
   }
 
   if (unwindplan_regloc.IsSame()) {
-    if (!IsFrameZero() &&
+    if (!m_all_registers_available &&
         (regnum.GetAsKind(eRegisterKindGeneric) == LLDB_REGNUM_GENERIC_PC ||
          regnum.GetAsKind(eRegisterKindGeneric) == LLDB_REGNUM_GENERIC_RA)) {
       UnwindLogMsg("register %s (%d) is marked as 'IsSame' - it is a pc or "
-                   "return address reg on a non-zero frame -- treat as if we "
-                   "have no information",
+                   "return address reg on a frame which does not have all "
+                   "registers available -- treat as if we have no information",
                    regnum.GetName(), regnum.GetAsKind(eRegisterKindLLDB));
       return UnwindLLDB::RegisterSearchResult::eRegisterNotFound;
     } else {
diff --git a/lldb/test/API/functionalities/bt-interrupt/main.c b/lldb/test/API/functionalities/bt-interrupt/main.c
index bdaf423d334ef..14a9eb6ffc856 100644
--- a/lldb/test/API/functionalities/bt-interrupt/main.c
+++ b/lldb/test/API/functionalities/bt-interrupt/main.c
@@ -12,6 +12,7 @@ struct Foo {
 
 int
 forgot_termination(int input, struct Foo my_foo) {
+  char frame_increasing_buffer[0x1000]; // To blow the stack sooner.
   return forgot_termination(++input, my_foo);
 }
 
diff --git a/lldb/test/Shell/Unwind/Inputs/signal-in-leaf-function-aarch64.c b/lldb/test/Shell/Unwind/Inputs/signal-in-leaf-function-aarch64.c
new file mode 100644
index 0000000000000..fe020affcad0f
--- /dev/null
+++ b/lldb/test/Shell/Unwind/Inputs/signal-in-leaf-function-aarch64.c
@@ -0,0 +1,15 @@
+#include <signal.h>
+#include <unistd.h>
+
+int __attribute__((naked)) signal_generating_add(int a, int b) {
+  asm("add w0, w1, w0\n\t"
+      "udf #0xdead\n\t"
+      "ret");
+}
+
+void sigill_handler(int signo) { _exit(0); }
+
+int main() {
+  signal(SIGILL, sigill_handler);
+  return signal_generating_add(42, 47);
+}
diff --git a/lldb/test/Shell/Unwind/signal-in-leaf-function-aarch64.test b/lldb/test/Shell/Unwind/signal-in-leaf-function-aarch64.test
new file mode 100644
index 0000000000000..09f17c174bbfd
--- /dev/null
+++ b/lldb/test/Shell/Unwind/signal-in-leaf-function-aarch64.test
@@ -0,0 +1,27 @@
+# REQUIRES: target-aarch64 && native
+# UNSUPPORTED: system-windows
+
+# RUN: %clang_host %S/Inputs/signal-in-leaf-function-aarch64.c -o %t
+# RUN: %lldb -s %s -o exit %t | FileCheck %s
+
+# Convert EXC_BAD_INSTRUCTION to SIGILL on darwin
+settings set platform.plugin.darwin.ignored-exceptions EXC_BAD_INSTRUCTION
+
+breakpoint set -n sigill_handler
+# CHECK: Breakpoint 1: where = {{.*}}`sigill_handler
+
+run
+# CHECK: thread #1, {{.*}} stop reason = signal SIGILL
+
+thread backtrace
+# CHECK: frame #0: [[ADD:0x[0-9a-fA-F]*]] {{.*}}`signal_generating_add
+# CHECK: frame #1: [[MAIN:0x[0-9a-fA-F]*]] {{.*}}`main
+
+continue
+# CHECK: thread #1, {{.*}} stop reason = breakpoint 1
+
+thread backtrace
+# CHECK: frame #0: {{.*}}`sigill_handler
+# Unknown number of signal trampoline frames
+# CHECK: frame #{{[0-9]+}}: [[ADD]] {{.*}}`signal_generating_add
+# CHECK: frame #{{[0-9]+}}: [[MAIN]] {{.*}}`main
diff --git a/lldb/unittests/UnwindAssembly/ARM64/TestArm64InstEmulation.cpp b/lldb/unittests/UnwindAssembly/ARM64/TestArm64InstEmulation.cpp
index 80abeb8fae9e5..9303d6f5f3c6e 100644
--- a/lldb/unittests/UnwindAssembly/ARM64/TestArm64InstEmulation.cpp
+++ b/lldb/unittests/UnwindAssembly/ARM64/TestArm64InstEmulation.cpp
@@ -77,7 +77,7 @@ TEST_F(TestArm64InstEmulation, TestSimpleDarwinFunction) {
 
   // UnwindPlan we expect:
 
-  // row[0]:    0: CFA=sp +0 =>
+  // row[0]:    0: CFA=sp +0 => fp= <same> lr= <same>
   // row[1]:    4: CFA=sp+16 => fp=[CFA-16] lr=[CFA-8]
   // row[2]:    8: CFA=fp+16 => fp=[CFA-16] lr=[CFA-8]
   // row[2]:   16: CFA=sp+16 => fp=[CFA-16] lr=[CFA-8]
@@ -88,13 +88,19 @@ TEST_F(TestArm64InstEmulation, TestSimpleDarwinFunction) {
   EXPECT_TRUE(engine->GetNonCallSiteUnwindPlanFromAssembly(
       sample_range, data, sizeof(data), unwind_plan));
 
-  // CFA=sp +0
+  // CFA=sp +0 => fp= <same> lr= <same>
   row_sp = unwind_plan.GetRowForFunctionOffset(0);
   EXPECT_EQ(0ull, row_sp->GetOffset());
   EXPECT_TRUE(row_sp->GetCFAValue().GetRegisterNumber() == gpr_sp_arm64);
   EXPECT_TRUE(row_sp->GetCFAValue().IsRegisterPlusOffset() == true);
   EXPECT_EQ(0, row_sp->GetCFAValue().GetOffset());
 
+  EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_fp_arm64, regloc));
+  EXPECT_TRUE(regloc.IsSame());
+
+  EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_lr_arm64, regloc));
+  EXPECT_TRUE(regloc.IsSame());
+
   // CFA=sp+16 => fp=[CFA-16] lr=[CFA-8]
   row_sp = unwind_plan.GetRowForFunctionOffset(4);
   EXPECT_EQ(4ull, row_sp->GetOffset());
@@ -146,6 +152,12 @@ TEST_F(TestArm64InstEmulation, TestSimpleDarwinFunction) {
   EXPECT_TRUE(row_sp->GetCFAValue().GetRegisterNumber() == gpr_sp_arm64);
   EXPECT_TRUE(row_sp->GetCFAValue().IsRegisterPlusOffset() == true);
   EXPECT_EQ(0, row_sp->GetCFAValue().GetOffset());
+
+  EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_fp_arm64, regloc));
+  EXPECT_TRUE(regloc.IsSame());
+
+  EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_lr_arm64, regloc));
+  EXPECT_TRUE(regloc.IsSame());
 }
 
 TEST_F(TestArm64InstEmulation, TestMediumDarwinFunction) {
@@ -381,8 +393,12 @@ TEST_F(TestArm64InstEmulation, TestFramelessThreeEpilogueFunction) {
   EXPECT_FALSE(row_sp->GetRegisterInfo(gpr_x26_arm64, regloc));
   EXPECT_FALSE(row_sp->GetRegisterInfo(gpr_x27_arm64, regloc));
   EXPECT_FALSE(row_sp->GetRegisterInfo(gpr_x28_arm64, regloc));
-  EXPECT_FALSE(row_sp->GetRegisterInfo(gpr_fp_arm64, regloc));
-  EXPECT_FALSE(row_sp->GetRegisterInfo(gpr_lr_arm64, regloc));
+
+  EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_fp_arm64, regloc));
+  EXPECT_TRUE(regloc.IsSame());
+
+  EXPECT_TRUE(row_sp->GetRegisterInfo(gpr_lr_arm64, regloc));
+  EXPECT_TRUE(regloc.IsSame());
 
   row_sp = unwind_plan.GetRowForFunctionOffset(36);
   EXPECT_TRUE(row_sp->GetCFAValue().GetRegisterNumber() == gpr_sp_arm64);

@jasonmolenda

Yes, originally lldb's unwinder was recursive for any register propagation and it was easy to hit the problem of lldb blowing out its own stack on a recursive inferior that had crashed. I changed most of the propagation to a loop to solve this (years and years ago) but it looks like we still have a case where it is recursing.

We still need to skip the test case on macOS until I can come up with some idea to get proper unwind instructions for sigtramp on arm64. Most of the CI bots are x86_64, so it may pass on them, but that's The Past, and I would prefer to skip this on Darwin until I can figure something out. I wrote a little TODO for myself in rdar://128031075

This looks good to me.


@jasonmolenda jasonmolenda left a comment

LGTM

@labath

labath commented May 21, 2024

> Yes, originally lldb's unwinder was recursive for any register propagation and it was easy to hit the problem of lldb blowing out its own stack on a recursive inferior that had crashed. I changed most of the propagation to a loop to solve this (years and years ago) but it looks like we still have a case where it is recursing.
>
> We still need to skip the test case on macOS until I can come up with some idea to get proper unwind instructions for sigtramp on arm64. Most of the CI bots are x86_64, so it may pass on them, but that's The Past, and I would prefer to skip this on Darwin until I can figure something out. I wrote a little TODO for myself in rdar://128031075

Sounds good (I meant to xfail this on darwin, but it looks like I forgot).

@labath labath merged commit bbd54e0 into llvm:main May 21, 2024
4 checks passed
@labath labath deleted the unwind branch May 21, 2024 08:56
@jasonmolenda

btw @labath I was thinking about "sigtramp routines" and fault / trap / interrupt handlers in general, and how lldb has this list of function names that it treats as fault handlers (target.trap-handler-names). And in the unwinder we have the same idea of "the frame above a designated sigtramp/fault/trap/interrupt function can retrieve all registers".

But what if the unwinder had a method saying "Do you have an unwind rule to give me a value for register n, without iterating down the stack?" e.g. frame 5 is interrupted and frame 4 is sigtramp, with full eh_frame. From frame 5, I can say "does frame 4 have an unwind rule to provide x0, without iterating down to frame 3, etc." This also means if we have a sigtramp which is missing its eh_frame, we won't apply our "all registers available" rules to the frame above.

An interesting case for a return-address target like aarch64 where normally when we ask for a caller frame's pc, we fetch the link register. But above a fault/trap/interrupt frame, we can retrieve both the pc and the link register and they are different values.

Just something I started kicking around in my head, I don't have concrete plans to implement an overhaul like this but the more I think about it, the more I like it.

@jasonmolenda

We should be able to work correctly with a trap handler that has full eh_frame without knowing the function name. And we shouldn't treat a sigtramp missing eh_frame as having all registers.

@jasonmolenda

(with the caveat that a register location of IsSame for a volatile aka non-callee-spilled register would be treated as "did not have a location")

@jasonmolenda

as cool as this idea is, I do worry that it will make the code less readable, where instead of saying "BehavesLikeFrameZero / HasAllRegisters", we now need to ask "can the frame below supply register x", I don't know. it's just something I have running around my head today.

@labath

labath commented May 22, 2024

I'm not quite sure what you have in mind, but I can tell you what's been going through my mind in the context of the m_all_registers_available check in lldb/source/Target/RegisterContextUnwind.cpp. The way I see it, this check (at least the part about the RA register(*)) is a heuristic that's impossible to always get right. Like, I could construct a test case using functions with non-standard ABIs where a non-leaf function legitimately has a lr=<same> rule. Such code would execute correctly but lldb would refuse to unwind it due to the lr=<same> restriction.

The only thing needed to construct such a test case is one (possibly leaf) function, which "returns" to a register other than lr (it could even return to a memory address). Then, its caller could "call" that function by storing the return address to some other place, and leaving its own return address register (i.e., lr) untouched. (I don't know why anyone would do such a thing, since it would likely mess up the CPU's branch predictor, but dwarf is perfectly capable of expressing code like this)

Another interesting case is that of a function (an abi-respecting function this time), which chooses to save the lr to a different (non-volatile) register, instead of the usual stack location. This function could then call other functions as usual, but we wouldn't be able to unwind from it in a single step -- to get its value of lr (i.e., the pc of the frame above it), we would need to find where the frame below has stored the register that lr was saved to. (I also don't know of anyone writing code like this, but unlike the previous case, I can imagine some very specific situations where such an optimization might be profitable.)

All of this is to say that I don't think there is a way to change this piece of code to be correct all the time -- we'd just be trading one set of edge cases for the other. I think that the most correct solution would be to remove this check altogether. I'm not sure why it exists, but I expect it has something to do with preventing looping stacks. However, if I remember correctly, we already have some checks to prevent stack loops (if not, then we should have, as there are other ways to create stack loops), so I think it should be possible to let the lr=<same> (*) rule through here and catch erroneous cases further down the road. However, I also don't have any plans to pursue this direction.

(*) I'm only talking about the lr rule everywhere. I think that a pc=<same> rule would always be an error (even in signal handlers), so we should be able to keep that here. OTOH, if our loop detection code is robust enough, then there should be no harm in letting this through either...
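
As a rough illustration of what "catching erroneous cases further down the road" could look like, here is a minimal sketch of one possible loop check. It is an assumption made for illustration, not a description of lldb's existing checks: `ToyUnwoundFrame` and `CountFramesBeforeLoop` are hypothetical names, and the criterion (a repeated CFA/pc pair) is just one simple way to notice that an unwind has stopped making progress.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical summary of one unwound frame.
struct ToyUnwoundFrame {
  uint64_t cfa; // canonical frame address
  uint64_t pc;  // program counter recovered for this frame
};

// Accept frames in order (youngest first) and stop at the first frame
// whose (cfa, pc) pair has already been seen, since that indicates the
// unwind is cycling. Returns the number of frames accepted.
size_t CountFramesBeforeLoop(const std::vector<ToyUnwoundFrame> &frames) {
  std::set<std::pair<uint64_t, uint64_t>> seen;
  size_t accepted = 0;
  for (const ToyUnwoundFrame &frame : frames) {
    if (!seen.insert({frame.cfa, frame.pc}).second)
      break; // duplicate pair: treat as a stack loop
    ++accepted;
  }
  return accepted;
}
```

With a check of this kind in place, a bogus lr=<same> rule that makes a frame "unwind" to itself would end the backtrace instead of looping forever.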

@jasonmolenda

> The way I see it, this check (at least the part about the RA register(*)) is a heuristic that's impossible to always get right. Like, I could construct a test case using functions with non-standard ABIs where a non-leaf function legitimately has a lr=<same> rule. Such code would execute correctly but lldb would refuse to unwind it due to the lr=<same> restriction.

This would be an interesting idea. I don't think there's any unwind format which allows you to specify that a different register holds the return address, and lr is IsSame. You could say that lr=x9 to say that the return address is in x9, but you can't express that the return address is stored in a non-lr register. You could add an unwind rule for pc=x9 to say that the return address is in x9, and depend on the unwinder to not look for lr, but to try retrieving pc first.

> All of this is to say that I don't think there is a way to change this piece of code to be correct all the time -- we'd just be trading one set of edge cases for the other. I think that the most correct solution would be to remove this check altogether. I'm not sure why it exists, but I expect it has something to do with preventing looping stacks.

The original goal was that if we're on frame 1, we don't want to surface a register value from frame 0 and use it in frame 1 unless it's a callee-spilled register. e.g. x0 on frame 1 may have been overwritten while frame 0 was executing, there is no unwind rule for x0 and the unwinder can't show frame 0's x0 value in frame 1. But if we're above a sigtramp etc frame which has the entire register context from when a function was interrupted, we can retrieve all the registers and want to show them.

> (*) I'm only talking about the lr rule everywhere. I think that a pc=<same> rule would always be an error (even in signal handlers), so we should be able to keep that here. OTOH, if our loop detection code is robust enough, then there should be no harm in letting this through either...

Yeah I think there's improvements that can be made here for sure.
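
The policy discussed in this thread (callee-spilled registers can always be passed up the stack, while pc, the return address register, and volatile registers need a frame with the full register context) can be written down as a small decision function. This is a hedged sketch of the idea only: `ToyRegClass` and `CanHonorSameRule` are hypothetical names, and the real check in RegisterContextUnwind.cpp works on register numbers and unwind plan rows rather than a four-way enum.

```cpp
// Hypothetical classification of a register for unwinding purposes.
enum class ToyRegClass { PC, ReturnAddress, CalleeSaved, Volatile };

// Decide whether a "<same>" rule may be honored for a register.
// all_registers_available models frame zero or a frame directly above a
// sigtramp/trap handler that saved the entire register context.
bool CanHonorSameRule(ToyRegClass reg, bool all_registers_available) {
  switch (reg) {
  case ToyRegClass::CalleeSaved:
    // Preserved across calls, so the younger frame's value is valid.
    return true;
  case ToyRegClass::PC:
  case ToyRegClass::ReturnAddress:
  case ToyRegClass::Volatile:
    // May have been overwritten by younger frames unless the full
    // register context at the point of interruption is known.
    return all_registers_available;
  }
  return false; // unreachable for valid enum values
}
```

For example, x0 on frame 1 falls in the `Volatile` case and is rejected unless frame 1 was interrupted and sits above a trap handler frame.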
