Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read and store gnu build id from loaded core file #92492

Merged
merged 3 commits into from
May 24, 2024

Conversation

GeorgeHuyubo
Copy link
Contributor

As we have debuginfod as symbol locator available in lldb now, we want to make full use of it.
In case of post mortem debugging, we don't always have the main executable available.
However, the .note.gnu.build-id of the main executable(some other modules too), should be available in the core file, as those binaries are loaded in memory and dumped in the core file.

We try to iterate through the NT_FILE entries, read and store the gnu build id if possible. This will be very useful as this id is the unique key which is needed for querying the debuginfod server.

Test:
Build and run lldb. Breakpoint set to https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolLocator/Debuginfod/SymbolLocatorDebuginfod.cpp#L147
Verified after this commit, module_uuid is the correct gnu build id of the main executable which caused the crash(first in the NT_FILE entry)

Previous PR: #92078 was mistakenly merged. This PR is re-opening the commit.

@llvmbot
Copy link
Collaborator

llvmbot commented May 17, 2024

@llvm/pr-subscribers-lldb

Author: None (GeorgeHuyubo)

Changes

As we have debuginfod as symbol locator available in lldb now, we want to make full use of it.
In case of post mortem debugging, we don't always have the main executable available.
However, the .note.gnu.build-id of the main executable(some other modules too), should be available in the core file, as those binaries are loaded in memory and dumped in the core file.

We try to iterate through the NT_FILE entries, read and store the gnu build id if possible. This will be very useful as this id is the unique key which is needed for querying the debuginfod server.

Test:
Build and run lldb. Breakpoint set to https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolLocator/Debuginfod/SymbolLocatorDebuginfod.cpp#L147
Verified after this commit, module_uuid is the correct gnu build id of the main executable which caused the crash(first in the NT_FILE entry)

Previous PR: #92078 was mistakenly merged. This PR is re-opening the commit.


Full diff: https://github.com/llvm/llvm-project/pull/92492.diff

5 Files Affected:

  • (modified) lldb/include/lldb/Target/Process.h (+50)
  • (modified) lldb/source/Commands/CommandObjectMemory.cpp (+2-59)
  • (modified) lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp (+89)
  • (modified) lldb/source/Plugins/Process/elf-core/ProcessElfCore.h (+17)
  • (modified) lldb/source/Target/Process.cpp (+27)
diff --git a/lldb/include/lldb/Target/Process.h b/lldb/include/lldb/Target/Process.h
index aac0cf51680a9..c8a49edc5c78d 100644
--- a/lldb/include/lldb/Target/Process.h
+++ b/lldb/include/lldb/Target/Process.h
@@ -406,6 +406,36 @@ class Process : public std::enable_shared_from_this<Process>,
                                   lldb::StateType state);
   } Notifications;
 
+  class ProcessMemoryIterator {
+  public:
+    ProcessMemoryIterator(lldb::ProcessSP process_sp, lldb::addr_t base)
+        : m_process_sp(process_sp), m_base_addr(base) {
+      lldbassert(process_sp.get() != nullptr);
+    }
+
+    bool IsValid() { return m_is_valid; }
+
+    uint8_t operator[](lldb::addr_t offset) {
+      if (!IsValid())
+        return 0;
+
+      uint8_t retval = 0;
+      Status error;
+      if (0 ==
+          m_process_sp->ReadMemory(m_base_addr + offset, &retval, 1, error)) {
+        m_is_valid = false;
+        return 0;
+      }
+
+      return retval;
+    }
+
+  private:
+    lldb::ProcessSP m_process_sp;
+    lldb::addr_t m_base_addr;
+    bool m_is_valid = true;
+  };
+
   class ProcessEventData : public EventData {
     friend class Process;
 
@@ -1649,6 +1679,26 @@ class Process : public std::enable_shared_from_this<Process>,
 
   lldb::addr_t ReadPointerFromMemory(lldb::addr_t vm_addr, Status &error);
 
+  /// Find a string within a memory region.
+  ///
+  /// This function searches for the string represented by the provided buffer
+  /// within the memory range specified by the low and high addresses. It uses
+  /// a bad character heuristic to optimize the search process.
+  ///
+  /// \param[in] low The starting address of the memory region to be searched.
+  ///
+  /// \param[in] high The ending address of the memory region to be searched.
+  ///
+  /// \param[in] buffer A pointer to the buffer containing the string to be
+  /// searched.
+  ///
+  /// \param[in] buffer_size The size of the buffer in bytes.
+  ///
+  /// \return The address where the string was found or LLDB_INVALID_ADDRESS if
+  /// not found.
+  lldb::addr_t FindInMemory(lldb::addr_t low, lldb::addr_t high,
+                            uint8_t *buffer, size_t buffer_size);
+
   bool WritePointerToMemory(lldb::addr_t vm_addr, lldb::addr_t ptr_value,
                             Status &error);
 
diff --git a/lldb/source/Commands/CommandObjectMemory.cpp b/lldb/source/Commands/CommandObjectMemory.cpp
index b78a0492cca55..1c13484dede64 100644
--- a/lldb/source/Commands/CommandObjectMemory.cpp
+++ b/lldb/source/Commands/CommandObjectMemory.cpp
@@ -977,35 +977,6 @@ class CommandObjectMemoryFind : public CommandObjectParsed {
   Options *GetOptions() override { return &m_option_group; }
 
 protected:
-  class ProcessMemoryIterator {
-  public:
-    ProcessMemoryIterator(ProcessSP process_sp, lldb::addr_t base)
-        : m_process_sp(process_sp), m_base_addr(base) {
-      lldbassert(process_sp.get() != nullptr);
-    }
-
-    bool IsValid() { return m_is_valid; }
-
-    uint8_t operator[](lldb::addr_t offset) {
-      if (!IsValid())
-        return 0;
-
-      uint8_t retval = 0;
-      Status error;
-      if (0 ==
-          m_process_sp->ReadMemory(m_base_addr + offset, &retval, 1, error)) {
-        m_is_valid = false;
-        return 0;
-      }
-
-      return retval;
-    }
-
-  private:
-    ProcessSP m_process_sp;
-    lldb::addr_t m_base_addr;
-    bool m_is_valid = true;
-  };
   void DoExecute(Args &command, CommandReturnObject &result) override {
     // No need to check "process" for validity as eCommandRequiresProcess
     // ensures it is valid
@@ -1106,8 +1077,8 @@ class CommandObjectMemoryFind : public CommandObjectParsed {
     found_location = low_addr;
     bool ever_found = false;
     while (count) {
-      found_location = FastSearch(found_location, high_addr, buffer.GetBytes(),
-                                  buffer.GetByteSize());
+      found_location = process->FindInMemory(
+          found_location, high_addr, buffer.GetBytes(), buffer.GetByteSize());
       if (found_location == LLDB_INVALID_ADDRESS) {
         if (!ever_found) {
           result.AppendMessage("data not found within the range.\n");
@@ -1144,34 +1115,6 @@ class CommandObjectMemoryFind : public CommandObjectParsed {
     result.SetStatus(lldb::eReturnStatusSuccessFinishResult);
   }
 
-  lldb::addr_t FastSearch(lldb::addr_t low, lldb::addr_t high, uint8_t *buffer,
-                          size_t buffer_size) {
-    const size_t region_size = high - low;
-
-    if (region_size < buffer_size)
-      return LLDB_INVALID_ADDRESS;
-
-    std::vector<size_t> bad_char_heuristic(256, buffer_size);
-    ProcessSP process_sp = m_exe_ctx.GetProcessSP();
-    ProcessMemoryIterator iterator(process_sp, low);
-
-    for (size_t idx = 0; idx < buffer_size - 1; idx++) {
-      decltype(bad_char_heuristic)::size_type bcu_idx = buffer[idx];
-      bad_char_heuristic[bcu_idx] = buffer_size - idx - 1;
-    }
-    for (size_t s = 0; s <= (region_size - buffer_size);) {
-      int64_t j = buffer_size - 1;
-      while (j >= 0 && buffer[j] == iterator[s + j])
-        j--;
-      if (j < 0)
-        return low + s;
-      else
-        s += bad_char_heuristic[iterator[s + buffer_size - 1]];
-    }
-
-    return LLDB_INVALID_ADDRESS;
-  }
-
   OptionGroupOptions m_option_group;
   OptionGroupFindMemory m_memory_options;
   OptionGroupMemoryTag m_memory_tag_options;
diff --git a/lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp b/lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp
index 36812c27a5b6d..0d3ba6ea286c2 100644
--- a/lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp
+++ b/lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp
@@ -6,10 +6,12 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include <cstddef>
 #include <cstdlib>
 
 #include <memory>
 #include <mutex>
+#include <tuple>
 
 #include "lldb/Core/Module.h"
 #include "lldb/Core/ModuleSpec.h"
@@ -33,12 +35,17 @@
 #include "Plugins/Process/elf-core/RegisterUtilities.h"
 #include "ProcessElfCore.h"
 #include "ThreadElfCore.h"
+#include "lldb/lldb-types.h"
 
 using namespace lldb_private;
 namespace ELF = llvm::ELF;
 
 LLDB_PLUGIN_DEFINE(ProcessElfCore)
 
+#define ELFOFFSETOF(T, M)                                                      \
+  addr_size == 4 ? offsetof(llvm::ELF::Elf32_##T, M)                           \
+                 : offsetof(llvm::ELF::Elf64_##T, M)
+
 llvm::StringRef ProcessElfCore::GetPluginDescriptionStatic() {
   return "ELF core dump plug-in.";
 }
@@ -250,6 +257,9 @@ Status ProcessElfCore::DoLoadCore() {
     }
   }
 
+  // Try to find gnu build id before we load the executable.
+  UpdateBuildIdForNTFileEntries();
+
   // Core files are useless without the main executable. See if we can locate
   // the main executable using data we found in the core file notes.
   lldb::ModuleSP exe_module_sp = GetTarget().GetExecutableModule();
@@ -258,6 +268,7 @@ Status ProcessElfCore::DoLoadCore() {
     if (!m_nt_file_entries.empty()) {
       ModuleSpec exe_module_spec;
       exe_module_spec.GetArchitecture() = arch;
+      exe_module_spec.GetUUID() = m_nt_file_entries[0].uuid;
       exe_module_spec.GetFileSpec().SetFile(m_nt_file_entries[0].path,
                                             FileSpec::Style::native);
       if (exe_module_spec.GetFileSpec()) {
@@ -271,6 +282,17 @@ Status ProcessElfCore::DoLoadCore() {
   return error;
 }
 
+void ProcessElfCore::UpdateBuildIdForNTFileEntries() {
+  if (!m_nt_file_entries.empty()) {
+    for (NT_FILE_Entry &entry : m_nt_file_entries) {
+      std::optional<UUID> uuid =
+          FindNote(entry.start, llvm::ELF::NT_GNU_BUILD_ID);
+      if (uuid)
+        entry.uuid = uuid.value();
+    }
+  }
+}
+
 lldb_private::DynamicLoader *ProcessElfCore::GetDynamicLoader() {
   if (m_dyld_up.get() == nullptr)
     m_dyld_up.reset(DynamicLoader::FindPlugin(
@@ -983,6 +1005,73 @@ llvm::Error ProcessElfCore::ParseThreadContextsFromNoteSegment(
   }
 }
 
+bool ProcessElfCore::IsElf(const lldb::addr_t address) {
+  uint8_t buf[4];
+  Status error;
+  size_t byte_read = ReadMemory(address, buf, 4, error);
+  if (byte_read != 4)
+    return false;
+  return elf::ELFHeader::MagicBytesMatch(buf);
+}
+
+std::optional<UUID> ProcessElfCore::FindNote(const lldb::addr_t address,
+                                             const uint32_t type) {
+  if (!IsElf(address))
+    return std::nullopt;
+  const uint32_t addr_size = GetAddressByteSize();
+  const lldb::offset_t ehdr_phoff_offset = ELFOFFSETOF(Ehdr, e_phoff);
+  const lldb::offset_t ehdr_phentsize_offset = ELFOFFSETOF(Ehdr, e_phentsize);
+  const lldb::offset_t ehdr_phnum_offset = ELFOFFSETOF(Ehdr, e_phnum);
+  const size_t elf_header_size = addr_size == 4 ? sizeof(llvm::ELF::Elf32_Ehdr)
+                                                : sizeof(llvm::ELF::Elf64_Ehdr);
+
+  unsigned char buf[4096];
+  Status error;
+  size_t byte_read = ReadMemory(address, buf, elf_header_size, error);
+  DataExtractor data(buf, 4096, GetByteOrder(), addr_size);
+  lldb::offset_t offset = ehdr_phoff_offset;
+  lldb::offset_t phoff = data.GetAddress(&offset);
+
+  offset = ehdr_phentsize_offset;
+  lldb::offset_t phentsize = data.GetU16(&offset);
+  offset = ehdr_phnum_offset;
+  lldb::offset_t phnum = data.GetU16(&offset);
+
+  Section_Note note;
+  const lldb::addr_t ph_addr = address + phoff;
+
+  for (unsigned int i = 0; i < phnum; ++i) {
+    byte_read = ReadMemory(ph_addr + i * phentsize, buf, phentsize, error);
+    if (byte_read != phentsize)
+      break;
+    offset = 0;
+    uint32_t p_type = data.GetU32(&offset);
+    if (p_type != llvm::ELF::PT_NOTE)
+      continue;
+    offset = ELFOFFSETOF(Phdr, p_vaddr);
+    lldb::addr_t p_vaddr = data.GetAddress(&offset);
+    offset = ELFOFFSETOF(Phdr, p_memsz);
+    lldb::addr_t p_memsz = data.GetAddress(&offset);
+
+    byte_read = ReadMemory(p_vaddr, buf, p_memsz, error);
+    if (byte_read != p_memsz)
+      continue;
+    offset = 0;
+    while (
+        offset < p_memsz &&
+        data.GetU32(&offset, &note, sizeof(Section_Note) / sizeof(uint32_t))) {
+      if (note.namesz == 4 && note.type == type) {
+        const char *name = data.GetCStr(&offset);
+        if (name && strcmp("GNU", name) == 0)
+          return UUID(
+              llvm::ArrayRef<uint8_t>(buf + offset, note.descsz /*byte size*/));
+      }
+      offset += note.namesz + note.descsz;
+    }
+  }
+  return std::nullopt;
+}
+
 uint32_t ProcessElfCore::GetNumThreadContexts() {
   if (!m_thread_data_valid)
     DoLoadCore();
diff --git a/lldb/source/Plugins/Process/elf-core/ProcessElfCore.h b/lldb/source/Plugins/Process/elf-core/ProcessElfCore.h
index 2cec635bbacfe..b935f0d4e0103 100644
--- a/lldb/source/Plugins/Process/elf-core/ProcessElfCore.h
+++ b/lldb/source/Plugins/Process/elf-core/ProcessElfCore.h
@@ -117,6 +117,13 @@ class ProcessElfCore : public lldb_private::PostMortemProcess {
     lldb::addr_t end;
     lldb::addr_t file_ofs;
     std::string path;
+    lldb_private::UUID uuid; //.note.gnu.build-id
+  };
+
+  struct Section_Note {
+    uint32_t namesz;
+    uint32_t descsz;
+    uint32_t type;
   };
 
   // For ProcessElfCore only
@@ -158,6 +165,16 @@ class ProcessElfCore : public lldb_private::PostMortemProcess {
   // Returns number of thread contexts stored in the core file
   uint32_t GetNumThreadContexts();
 
+  // Populate gnu uuid for each NT_FILE entry
+  void UpdateBuildIdForNTFileEntries();
+
+  // Returns the value of certain type of note of a given start address
+  std::optional<lldb_private::UUID> FindNote(const lldb::addr_t address,
+                                             const uint32_t type);
+
+  // Returns true if the given address is a start of ELF file
+  bool IsElf(const lldb::addr_t address);
+
   // Parse a contiguous address range of the process from LOAD segment
   lldb::addr_t
   AddAddressRangeFromLoadSegment(const elf::ELFProgramHeader &header);
diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index 25afade9a8275..6f5c43bc41082 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -3191,6 +3191,33 @@ Status Process::Halt(bool clear_thread_plans, bool use_run_lock) {
   return Status();
 }
 
+lldb::addr_t Process::FindInMemory(lldb::addr_t low, lldb::addr_t high,
+                                   uint8_t *buffer, size_t buffer_size) {
+  const size_t region_size = high - low;
+
+  if (region_size < buffer_size)
+    return LLDB_INVALID_ADDRESS;
+
+  std::vector<size_t> bad_char_heuristic(256, buffer_size);
+  ProcessMemoryIterator iterator(shared_from_this(), low);
+
+  for (size_t idx = 0; idx < buffer_size - 1; idx++) {
+    decltype(bad_char_heuristic)::size_type bcu_idx = buffer[idx];
+    bad_char_heuristic[bcu_idx] = buffer_size - idx - 1;
+  }
+  for (size_t s = 0; s <= (region_size - buffer_size);) {
+    int64_t j = buffer_size - 1;
+    while (j >= 0 && buffer[j] == iterator[s + j])
+      j--;
+    if (j < 0)
+      return low + s;
+    else
+      s += bad_char_heuristic[iterator[s + buffer_size - 1]];
+  }
+
+  return LLDB_INVALID_ADDRESS;
+}
+
 Status Process::StopForDestroyOrDetach(lldb::EventSP &exit_event_sp) {
   Status error;
 

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks much better, though, after looking at this closer, I don't think it's necessary to reimplement the elf parsing code. We already have all the necessary data structures for parsing notes in the core file itself, and it looks like it would be fairly easy to reuse them for this purpose as well.

lldb/include/lldb/Target/Process.h Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.h Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/include/lldb/Target/Process.h Outdated Show resolved Hide resolved
lldb/include/lldb/Target/Process.h Outdated Show resolved Hide resolved
lldb/include/lldb/Target/Process.h Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.h Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.h Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.h Outdated Show resolved Hide resolved
Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very close. Just a couple of details.

BTW, I believe the recommended workflow for working with pull requests is to put your new changes as additional commits on top and then squash them all together after merging. The GitHub UI just works better that way.

lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
lldb/source/Plugins/Process/elf-core/ProcessElfCore.cpp Outdated Show resolved Hide resolved
@GeorgeHuyubo GeorgeHuyubo requested a review from labath May 24, 2024 16:22
@labath
Copy link
Collaborator

labath commented May 24, 2024

Thanks for your patience.

@GeorgeHuyubo GeorgeHuyubo merged commit ccde823 into llvm:main May 24, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants