Fix unwind on SIGSEGV on aarch64 (due to small stack for signal) #64058

Merged: 2 commits into ClickHouse:master, May 20, 2024

Conversation

azat
Collaborator

@azat azat commented May 17, 2024

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix unwind on SIGSEGV on aarch64 (due to small stack for signal)

Only the SIGSEGV handler runs on an alternate stack (set up with
sigaltstack()), and that stack is very small: 16K. On aarch64 this is
likely not enough for unwinding (probably because of the large number
of registers on this platform):

(gdb) bt
#0  libunwind::CFI_Parser<libunwind::LocalAddressSpace>::parseFDEInstructions (addressSpace=..., fdeInfo=..., cieInfo=..., upToPC=<optimized out>, arch=4, results=<optimized out>) at ./contrib/libunwind/src/DwarfParser.hpp:561

And this is:

554       case DW_CFA_remember_state: {
555         // Avoid operator new because that would be an upward dependency.
556         // Avoid malloc because it needs heap allocation.
557         PrologInfoStackEntry *entry =
558             (PrologInfoStackEntry *)_LIBUNWIND_REMEMBER_ALLOC(
559                 sizeof(PrologInfoStackEntry));
560         if (entry != NULL) {
561           entry->next = rememberStack.entry;
^^^
562           entry->info = *results;
563           rememberStack.entry = entry;
564         } else {
565           return false;
566         }
567         _LIBUNWIND_TRACE_DWARF("DW_CFA_remember_state\n");
568         break;
569       }
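
The mechanism described above can be sketched in isolation. This is only an illustrative sketch, not the change made in this PR (the actual diff is not shown in this conversation): the names `installAltStack`, `installSegvHandler`, `currentAltStackSize`, and `onSegv`, as well as the 64 KiB size, are hypothetical and chosen for the example. It registers a larger alternate stack with `sigaltstack()`, installs a SIGSEGV handler with `SA_ONSTACK`, and has the handler record whether it really ran on that stack.

```cpp
#include <signal.h>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical size for illustration only: 64 KiB instead of the 16K mentioned above.
constexpr std::size_t kAltStackSize = 64 * 1024;

// Bounds of the alternate stack, recorded so the handler can check where it runs.
static std::uintptr_t g_alt_stack_base = 0;
static std::size_t g_alt_stack_size = 0;

// -1: handler never ran, 1: ran on the alternate stack, 0: ran on the normal stack.
volatile sig_atomic_t g_ran_on_alt_stack = -1;

extern "C" void onSegv(int)
{
    // The address of a local variable tells us which stack the handler is using.
    char marker = 0;
    const std::uintptr_t here = reinterpret_cast<std::uintptr_t>(&marker);
    g_ran_on_alt_stack =
        (here >= g_alt_stack_base && here < g_alt_stack_base + g_alt_stack_size) ? 1 : 0;
}

// Allocate and register an alternate signal stack for the calling thread.
bool installAltStack(std::size_t size)
{
    void * mem = std::malloc(size);
    if (mem == nullptr)
        return false;

    stack_t ss{};
    ss.ss_sp = mem;
    ss.ss_size = size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, nullptr) != 0)
    {
        std::free(mem);
        return false;
    }

    g_alt_stack_base = reinterpret_cast<std::uintptr_t>(mem);
    g_alt_stack_size = size;
    return true;
}

// SA_ONSTACK asks the kernel to run the handler on the alternate stack.
bool installSegvHandler()
{
    struct sigaction sa{};
    sa.sa_handler = onSegv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}

// Report the size of the currently registered alternate stack (0 if none).
std::size_t currentAltStackSize()
{
    stack_t current{};
    if (sigaltstack(nullptr, &current) != 0 || (current.ss_flags & SS_DISABLE))
        return 0;
    return current.ss_size;
}
```

Calling `installAltStack(kAltStackSize)` and `installSegvHandler()` and then `raise(SIGSEGV)` runs the handler on the alternate stack and returns normally, because the signal is raised by software rather than by a real fault. Anything the handler calls, including an unwinder, has to fit within the registered size, which is why a stack that is too small can fault a second time inside the handler.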

Fixes: #63855 (cc @maxknv)
Reverts: #63867
Supersedes: #63959 (cc @alesapin)

#job_style_check
#job_Integration_tests_aarch64
#job_Integration_tests_release

@robot-ch-test-poll3
Contributor

robot-ch-test-poll3 commented May 17, 2024

This is an automated comment for commit 81a0c63 with description of existing statuses. It's updated for the latest CI running

❌ Click here to open a full report in a separate page

Check name | Description | Status
A Sync | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ❌ failure
CI running | A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR | ⏳ pending
Integration tests | The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests | ❌ failure
Mergeable Check | Checks if all other necessary checks are successful | ❌ failure

Successful checks

Check name | Description | Status
Docs check | Builds and tests the documentation | ✅ success
PR Check | There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
Style check | Runs a set of checks to keep the code style clean. If some of tests failed, see the related log from the report | ✅ success

@nickitat nickitat self-assigned this May 17, 2024
@alesapin
Member

Confirmed, works

2024.05.17 18:19:08.858595 [ 653091 ] {} <Trace> BaseDaemon: Received signal 11
2024.05.17 18:19:08.858831 [ 654577 ] {} <Fatal> BaseDaemon: ########## Short fault info ############
2024.05.17 18:19:08.858869 [ 654577 ] {} <Fatal> BaseDaemon: (version 24.5.1.1319, build id: 74425E4A0E7BAAAA3476E408EB65BE274E861C1B, git hash: b4f4001924a7028dc3d0ac038afe87974e5c24d3) (from thread 652743) Received signal 11
2024.05.17 18:19:08.858893 [ 654577 ] {} <Fatal> BaseDaemon: Signal description: Segmentation fault
2024.05.17 18:19:08.858910 [ 654577 ] {} <Fatal> BaseDaemon: Address: 0x3e80009fcf0. Access: . Unknown si_code.
2024.05.17 18:19:08.858928 [ 654577 ] {} <Fatal> BaseDaemon: Stack trace: 0x0000f4c01cec9dfc 0x0000f4c01cecc8fc 0x0000c8976279bf50 0x0000c897626b7d2c 0x0000c897693c039c 0x0000c897626a8b60 0x0000c897693c9598 0x0000c897626a6180 0x0000c8975e16ee9c 0x0000f4c01ce773fc 0x0000f4c01ce774cc
2024.05.17 18:19:08.858944 [ 654577 ] {} <Fatal> BaseDaemon: ########################################
2024.05.17 18:19:08.858960 [ 654577 ] {} <Fatal> BaseDaemon: (version 24.5.1.1319, build id: 74425E4A0E7BAAAA3476E408EB65BE274E861C1B, git hash: b4f4001924a7028dc3d0ac038afe87974e5c24d3) (from thread 652743) (no query) Received signal Segmentation fault (11)
2024.05.17 18:19:08.858975 [ 654577 ] {} <Fatal> BaseDaemon: Address: 0x3e80009fcf0. Access: . Unknown si_code.
2024.05.17 18:19:08.858988 [ 654577 ] {} <Fatal> BaseDaemon: Stack trace: 0x0000f4c01cec9dfc 0x0000f4c01cecc8fc 0x0000c8976279bf50 0x0000c897626b7d2c 0x0000c897693c039c 0x0000c897626a8b60 0x0000c897693c9598 0x0000c897626a6180 0x0000c8975e16ee9c 0x0000f4c01ce773fc 0x0000f4c01ce774cc
2024.05.17 18:19:08.859032 [ 654577 ] {} <Fatal> BaseDaemon: 2. ? @ 0x0000000000079dfc
2024.05.17 18:19:08.859047 [ 654577 ] {} <Fatal> BaseDaemon: 3. ? @ 0x000000000007c8fc
2024.05.17 18:19:08.884167 [ 654577 ] {} <Fatal> BaseDaemon: 4.0. inlined from ./contrib/llvm-project/libcxx/src/condition_variable.cpp:47: std::condition_variable::wait(std::unique_lock<std::mutex>&)
2024.05.17 18:19:08.884201 [ 654577 ] {} <Fatal> BaseDaemon: 4.1. inlined from ./contrib/llvm-project/libcxx/include/__mutex_base:398: void std::condition_variable::wait<BaseDaemon::waitForTerminationRequest()::$_0>(std::unique_lock<std::mutex>&, BaseDaemon::waitForTerminationRequest()::$_0)
2024.05.17 18:19:08.884218 [ 654577 ] {} <Fatal> BaseDaemon: 4. ./build_docker/./src/Daemon/BaseDaemon.cpp:1164: BaseDaemon::waitForTerminationRequest() @ 0x000000000beabf50
2024.05.17 18:19:08.906990 [ 654577 ] {} <Fatal> BaseDaemon: 5.0. inlined from ./contrib/llvm-project/libcxx/include/vector:434: ~vector
2024.05.17 18:19:08.907020 [ 654577 ] {} <Fatal> BaseDaemon: 5. ./build_docker/./programs/server/Server.cpp:2210: DB::Server::main(std::vector<String, std::allocator<String>> const&) @ 0x000000000bdc7d2c
2024.05.17 18:19:08.911238 [ 654577 ] {} <Fatal> BaseDaemon: 6. ./build_docker/./base/poco/Util/src/Application.cpp:0: Poco::Util::Application::run() @ 0x0000000012ad039c
2024.05.17 18:19:08.947799 [ 654577 ] {} <Fatal> BaseDaemon: 7. ./build_docker/./programs/server/Server.cpp:420: DB::Server::run() @ 0x000000000bdb8b60
2024.05.17 18:19:08.949250 [ 654577 ] {} <Fatal> BaseDaemon: 8. ./build_docker/./base/poco/Util/src/ServerApplication.cpp:132: Poco::Util::ServerApplication::run(int, char**) @ 0x0000000012ad9598
2024.05.17 18:19:08.984759 [ 654577 ] {} <Fatal> BaseDaemon: 9. ./build_docker/./programs/server/Server.cpp:0: mainEntryClickHouseServer(int, char**) @ 0x000000000bdb6180
2024.05.17 18:19:08.986478 [ 654577 ] {} <Fatal> BaseDaemon: 10.0. inlined from ./contrib/llvm-project/libcxx/include/vector:434: ~vector
2024.05.17 18:19:08.986496 [ 654577 ] {} <Fatal> BaseDaemon: 10. ./build_docker/./programs/main.cpp:509: main @ 0x000000000787ee9c
2024.05.17 18:19:08.986514 [ 654577 ] {} <Fatal> BaseDaemon: 11. ? @ 0x00000000000273fc
2024.05.17 18:19:08.986529 [ 654577 ] {} <Fatal> BaseDaemon: 12. ? @ 0x00000000000274cc
2024.05.17 18:19:08.986542 [ 654577 ] {} <Fatal> BaseDaemon: Integrity check of the executable skipped because the reference checksum could not be read.
2024.05.17 18:19:08.986557 [ 654577 ] {} <Information> SentryWriter: Not sending crash report
2024.05.17 18:19:08.986570 [ 654577 ] {} <Fatal> BaseDaemon: This ClickHouse version is not official and should be upgraded to the official build.
2024.05.17 18:19:09.000781 [ 653724 ] {} <Trace> AsynchronousMetrics: MemoryTracking: was 297.75 MiB, peak 297.76 MiB, free memory in arenas 1.30 MiB, will set to 358.50 MiB (RSS), difference: 60.75 MiB
2024.05.17 18:19:11.124811 [ 653659 ] {} <Trace> CgroupsMemoryUsageObserver: Read current memory usage 25.88 GiB from cgroups
2024.05.17 18:19:11.127802 [ 653110 ] {} <Debug> DNSResolver: Updating DNS cache
2024.05.17 18:19:11.127843 [ 653110 ] {} <Debug> DNSResolver: Updated DNS cache

@azat
Collaborator Author

azat commented May 18, 2024

ClickHouse build check — 16/18 artifact groups are OK

TSan

May 17 18:34:12 FAILED: contrib/arrow-cmake/orc_proto.pb.h contrib/arrow-cmake/orc_proto.pb.cc /build/build_docker/contrib/arrow-cmake/orc_proto.pb.h /build/build_docker/contrib/arrow-cmake/orc_proto.pb.cc 
May 17 18:34:12 cd /build/build_docker/contrib/arrow-cmake && /build/build_docker/contrib/google-protobuf-cmake/protoc -I /build/contrib/orc/c++/../proto --cpp_out="/build/build_docker/contrib/arrow-cmake" /build/contrib/orc/c++/../proto/orc_proto.proto
May 17 18:34:12 ThreadSanitizer: CHECK failed: tsan_platform_linux.cpp:282 "((personality(old_personality | ADDR_NO_RANDOMIZE))) != ((-1))" (0xffffffffffffffff, 0xffffffffffffffff) (tid=123304)
May 17 18:34:12 Segmentation fault (core dumped)

MSan

May 17 18:31:34 [3061/12202] Generating orc_proto.pb.h, orc_proto.pb.cc
May 17 18:31:34 FAILED: contrib/arrow-cmake/orc_proto.pb.h contrib/arrow-cmake/orc_proto.pb.cc /build/build_docker/contrib/arrow-cmake/orc_proto.pb.h /build/build_docker/contrib/arrow-cmake/orc_proto.pb.cc 
May 17 18:31:34 cd /build/build_docker/contrib/arrow-cmake && /build/build_docker/contrib/google-protobuf-cmake/protoc -I /build/contrib/orc/c++/../proto --cpp_out="/build/build_docker/contrib/arrow-cmake" /build/contrib/orc/c++/../proto/orc_proto.proto
May 17 18:31:34 FATAL: Code 0x603049277240 is out of application range. Non-PIE build?
May 17 18:31:34 FATAL: MemorySanitizer can not mmap the shadow memory.
May 17 18:31:34 FATAL: Make sure to compile with -fPIE and to link with -pie.
May 17 18:31:34 FATAL: Disabling ASLR is known to cause this error.
May 17 18:31:34 FATAL: If running under GDB, try 'set disable-randomization off'.
May 17 18:31:34 ==110628==Process memory map follows:
May 17 18:31:34 	0x603049036000-0x60304925a000	/build/build_docker/contrib/google-protobuf-cmake/protoc
May 17 18:31:34 	0x60304925a000-0x60304a10a000	/build/build_docker/contrib/google-protobuf-cmake/protoc
May 17 18:31:34 	0x60304a10a000-0x60304a121000	/build/build_docker/contrib/google-protobuf-cmake/protoc
May 17 18:31:34 	0x60304a121000-0x60304a128000	/build/build_docker/contrib/google-protobuf-cmake/protoc
May 17 18:31:34 	0x60304a128000-0x60304bb14000	
May 17 18:31:34 	0x7e0b4d700000-0x7e0b4d800000	
May 17 18:31:34 	0x7e0b4d900000-0x7e0b4da00000	
May 17 18:31:34 	0x7e0b4db00000-0x7e0b4dc00000	
May 17 18:31:34 	0x7e0b4dd00000-0x7e0b4de00000	
May 17 18:31:34 	0x7e0b4dee3000-0x7e0b4e291000	
May 17 18:31:34 	0x7e0b4e291000-0x7e0b4e292000	/usr/lib/x86_64-linux-gnu/libdl.so.2
May 17 18:31:34 	0x7e0b4e292000-0x7e0b4e293000	/usr/lib/x86_64-linux-gnu/libdl.so.2
May 17 18:31:34 	0x7e0b4e293000-0x7e0b4e294000	/usr/lib/x86_64-linux-gnu/libdl.so.2
May 17 18:31:34 	0x7e0b4e294000-0x7e0b4e295000	/usr/lib/x86_64-linux-gnu/libdl.so.2
May 17 18:31:34 	0x7e0b4e295000-0x7e0b4e296000	/usr/lib/x86_64-linux-gnu/libdl.so.2
May 17 18:31:34 	0x7e0b4e296000-0x7e0b4e297000	/usr/lib/x86_64-linux-gnu/librt.so.1
May 17 18:31:34 	0x7e0b4e297000-0x7e0b4e298000	/usr/lib/x86_64-linux-gnu/librt.so.1
May 17 18:31:34 	0x7e0b4e298000-0x7e0b4e299000	/usr/lib/x86_64-linux-gnu/librt.so.1
May 17 18:31:34 	0x7e0b4e299000-0x7e0b4e29a000	/usr/lib/x86_64-linux-gnu/librt.so.1
May 17 18:31:34 	0x7e0b4e29a000-0x7e0b4e29b000	/usr/lib/x86_64-linux-gnu/librt.so.1
May 17 18:31:34 	0x7e0b4e29b000-0x7e0b4e2a9000	/usr/lib/x86_64-linux-gnu/libm.so.6
May 17 18:31:34 	0x7e0b4e2a9000-0x7e0b4e325000	/usr/lib/x86_64-linux-gnu/libm.so.6
May 17 18:31:34 	0x7e0b4e325000-0x7e0b4e380000	/usr/lib/x86_64-linux-gnu/libm.so.6
May 17 18:31:34 	0x7e0b4e380000-0x7e0b4e381000	/usr/lib/x86_64-linux-gnu/libm.so.6
May 17 18:31:34 	0x7e0b4e381000-0x7e0b4e382000	/usr/lib/x86_64-linux-gnu/libm.so.6
May 17 18:31:34 	0x7e0b4e382000-0x7e0b4e3aa000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e3aa000-0x7e0b4e53f000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e53f000-0x7e0b4e597000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e597000-0x7e0b4e598000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e598000-0x7e0b4e59c000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e59c000-0x7e0b4e59e000	/usr/lib/x86_64-linux-gnu/libc.so.6
May 17 18:31:34 	0x7e0b4e59e000-0x7e0b4e5ab000	
May 17 18:31:34 	0x7e0b4e5ab000-0x7e0b4e5ac000	/usr/lib/x86_64-linux-gnu/libpthread.so.0
May 17 18:31:34 	0x7e0b4e5ac000-0x7e0b4e5ad000	/usr/lib/x86_64-linux-gnu/libpthread.so.0
May 17 18:31:34 	0x7e0b4e5ad000-0x7e0b4e5ae000	/usr/lib/x86_64-linux-gnu/libpthread.so.0
May 17 18:31:34 	0x7e0b4e5ae000-0x7e0b4e5af000	/usr/lib/x86_64-linux-gnu/libpthread.so.0
May 17 18:31:34 	0x7e0b4e5af000-0x7e0b4e5b0000	/usr/lib/x86_64-linux-gnu/libpthread.so.0
May 17 18:31:34 	0x7e0b4e5b1000-0x7e0b4e5b7000	
May 17 18:31:34 	0x7e0b4e5b7000-0x7e0b4e5b9000	/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
May 17 18:31:34 	0x7e0b4e5b9000-0x7e0b4e5e3000	/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
May 17 18:31:34 	0x7e0b4e5e3000-0x7e0b4e5ee000	/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
May 17 18:31:34 	0x7e0b4e5ee000-0x7e0b4e5ef000	
May 17 18:31:34 	0x7e0b4e5ef000-0x7e0b4e5f1000	/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
May 17 18:31:34 	0x7e0b4e5f1000-0x7e0b4e5f3000	/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
May 17 18:31:34 	0x7ffd73396000-0x7ffd733b7000	[stack]
May 17 18:31:34 	0x7ffd733c1000-0x7ffd733c5000	[vvar]
May 17 18:31:34 	0x7ffd733c5000-0x7ffd733c7000	[vdso]
May 17 18:31:34 	0xffffffffff600000-0xffffffffff601000	[vsyscall]
May 17 18:31:34 ==110628==End of process memory map.

@alesapin
Member

@azat let's rerun. I've added

#job_style_check
#job_Integration_tests_aarch64
#job_Integration_tests_release

to the PR description.

azat added 2 commits May 19, 2024 08:11
Only the SIGSEGV handler runs on an alternate stack (set up with
sigaltstack()), and that stack is very small: 16K. On aarch64 this is
likely not enough for unwinding (probably because of the large number
of registers on this platform):

    (gdb) bt
    #0  libunwind::CFI_Parser<libunwind::LocalAddressSpace>::parseFDEInstructions (addressSpace=..., fdeInfo=..., cieInfo=..., upToPC=<optimized out>, arch=4, results=<optimized out>) at ./contrib/libunwind/src/DwarfParser.hpp:561

And this is:

    554       case DW_CFA_remember_state: {
    555         // Avoid operator new because that would be an upward dependency.
    556         // Avoid malloc because it needs heap allocation.
    557         PrologInfoStackEntry *entry =
    558             (PrologInfoStackEntry *)_LIBUNWIND_REMEMBER_ALLOC(
    559                 sizeof(PrologInfoStackEntry));
    560         if (entry != NULL) {
    561           entry->next = rememberStack.entry;
    ^^^
    562           entry->info = *results;
    563           rememberStack.entry = entry;
    564         } else {
    565           return false;
    566         }
    567         _LIBUNWIND_TRACE_DWARF("DW_CFA_remember_state\n");
    568         break;
    569       }

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
@azat
Collaborator Author

azat commented May 19, 2024

Integration tests (aarch64) [3/6] — fail: 0, passed: 323

  • test_send_crash_reports/test.py::test_send_segfault - passed

@alesapin alesapin added this pull request to the merge queue May 19, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks May 19, 2024
@alexey-milovidov alexey-milovidov merged commit 039b385 into ClickHouse:master May 20, 2024
33 of 38 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label May 20, 2024
@azat azat deleted the fix-unwind-on-aarch64 branch May 20, 2024 05:39
Labels
pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo
Development

Successfully merging this pull request may close these issues.

Not a bug: libunwind: aarch64: SIGSEGV signal handler leads to another seg fault
6 participants