ghcup binary built on i386/alpine-3.16 with GHC-9.4.8 segfaults (statically linked) #962

Open
hasufell opened this issue Jan 2, 2024 · 14 comments

Comments

@hasufell
Member

hasufell commented Jan 2, 2024

The bindist used is: https://downloads.haskell.org/~ghcup/unofficial-bindists/ghc/9.4.8/ghc-9.4.8-i386-alpine-linux.tar.xz

It is a dynamically linked bindist, because the static ones are more broken.

The settings file is as follows:
[("GCC extra via C opts", "")
,("C compiler command", "gcc")
,("C compiler flags", "-U__i686")
,("C++ compiler command", "g++")
,("C++ compiler flags", "")
,("C compiler link flags", "")
,("C compiler supports -no-pie", "YES")
,("Haskell CPP command", "gcc")
,("Haskell CPP flags", "-E -undef -traditional")
,("ld command", "ld")
,("ld flags", "")
,("ld supports compact unwind", "NO")
,("ld supports build-id", "YES")
,("ld supports filelist", "NO")
,("ld is GNU ld", "YES")
,("Merge objects command", "ld")
,("Merge objects flags", "-r")
,("ar command", "ar")
,("ar flags", "q")
,("ar supports at file", "YES")
,("ar supports -L", "NO")
,("ranlib command", "ranlib")
,("otool command", "otool")
,("install_name_tool command", "install_name_tool")
,("touch command", "touch")
,("dllwrap command", "/bin/false")
,("windres command", "/bin/false")
,("libtool command", "libtool")
,("unlit command", "$topdir/bin/unlit")
,("cross compiling", "NO")
,("target platform string", "i386-unknown-linux")
,("target os", "OSLinux")
,("target arch", "ArchX86")
,("target word size", "4")
,("target word big endian", "NO")
,("target has GNU nonexec stack", "YES")
,("target has .ident directive", "YES")
,("target has subsections via symbols", "NO")
,("target has RTS linker", "YES")
,("target has libm", "YES")
,("Unregisterised", "NO")
,("LLVM target", "i686-unknown-linux")
,("LLVM llc command", "llc")
,("LLVM opt command", "opt")
,("LLVM clang command", "clang")
,("Use inplace MinGW toolchain", "NO")
,("Use interpreter", "YES")
,("Support SMP", "YES")
,("RTS ways", "debug thr thr_debug thr_p dyn debug_dyn thr_dyn thr_debug_dyn thr_debug_p debug_p")
,("Tables next to code", "YES")
,("Leading underscore", "NO")
,("Use LibFFI", "NO")
,("RTS expects libdw", "NO")
]

The linker is indeed ld.bfd, so I believe we're not hitting https://gitlab.haskell.org/ghc/ghc/-/issues/17508

The backtrace isn't very useful:

checking for gcc option to enable C11 features... none needed
checking for g++... g++
[New LWP 172353]

Thread 6 "ghcup:w" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 172051]
0xf7f75ec8 in __stack_chk_fail () from /lib/ld-musl-i386.so.1
(gdb) bt
#0  0xf7f75ec8 in __stack_chk_fail () from /lib/ld-musl-i386.so.1
#1  0x0a99d140 in freeHaskellFunctionPtr (ptr=0x4000000c) at rts/adjustor/Nativei386.c:160
#2  0x08891886 in ?? ()
#3  0xf779f441 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

The ptr=0x4000000c in frame 1 is reproducible/stable.

plan.json of the ghcup build is here: https://gist.github.com/hasufell/22433e37c7694ff758f63d6dbbf1421f

CI run uncovering the issue: https://github.com/haskell/ghcup-hs/actions/runs/7385090589/job/20090951278#step:6:454

I'm not sure this is enough to create a GHC bug report, especially since it's an unofficial bindist compiled manually. It's possible this is also just another alpine f**kup. Their toolchain has not been great lately, which is why I've still been on 3.12 with older GHCs.


@bgamari @runeksvendsen @angerman

@hasufell hasufell changed the title ghcup binary built on alpine-3.16 with GHC-9.4.8 segfaults ghcup binary built on i386/alpine-3.16 with GHC-9.4.8 segfaults Jan 2, 2024
@hasufell hasufell changed the title ghcup binary built on i386/alpine-3.16 with GHC-9.4.8 segfaults ghcup binary built on i386/alpine-3.16 with GHC-9.4.8 segfaults (statically linked) Jan 2, 2024
@hasufell
Member Author

hasufell commented Jan 2, 2024

Wonder if there might be a correlation to https://gitlab.haskell.org/ghc/ghc/-/commit/9b645ee1a9fff64b66b36dc73d8809ff82025829 which is part of 9.4.8.

@hasufell
Member Author

hasufell commented Jan 2, 2024

Tried with alpine-3.16, no change.

@runeksvendsen
Collaborator

runeksvendsen commented Jan 21, 2024

@hasufell wrt. bisecting this bug, what is the latest known-good version of GHC prior to 9.4.8? I see CI uses 8.10.7; is that it?

Also, do you know if this issue still exists with latest GHC, or do we have a known-good GHC version that is later than 9.4.8?

@hasufell
Member Author

There is not much known.

runeksvendsen added a commit that referenced this issue Jan 21, 2024
Attempt to reproduce #962

This reverts commit 1c56e78.
@runeksvendsen
Collaborator

Before my EOD Sunday I fired off some CI runs with various GHC versions to make a coarse bisect. Result:

  • 9.0.1 SUCCESS
  • 9.0.2 SUCCESS
  • 9.2.1 SUCCESS
  • 9.2.8 unrelated failure
    • [ Info  ] Installing GHC (this may take a while)
      [ Info  ] Detected alpine linux... setting LD=ld.bfd
      [ Info  ] Merging file tree from "/github/workspace/.ghcup/tmp/ghcup-faf6f677/github/workspace/.ghcup/ghc/9.2.8" to "/github/workspace/.ghcup/ghc/9.2.8"
      [ Error ] [
      [ ...   ] exception was: /github/workspace/.ghcup/ghc/9.2.8/lib/ghc-9.2.8/lib/settings: setModificationTime:setFileTimes: invalid argument (Invalid argument) 
      
  • 9.6.3 fails to build
  • 9.8.1 fails to build

It would be nice to test more finely around 9.4.* and 9.2.*, but there aren't many bindists available for alpine/i386:

$ ghcup list --platform "i386-alpine-linux" |grep -w "ghc"|grep -v "no-bindist"
✗  ghc   8.0.2      base-4.9.1.0
✗  ghc   8.2.2      base-4.10.1.0
✗  ghc   8.4.4      base-4.11.1.0
✓  ghc   8.6.5      base-4.12.0.0
✗  ghc   8.8.4      base-4.13.0.0
✗  ghc   8.10.1     base-4.14.0.0
✗  ghc   8.10.2     base-4.14.1.0
✗  ghc   8.10.3     base-4.14.1.0
✗  ghc   8.10.4     base-4.14.1.0
✗  ghc   8.10.5     base-4.14.2.0
✗  ghc   8.10.6     base-4.14.3.0
✓  ghc   8.10.7     base-4.14.3.0
✗  ghc   9.0.1      base-4.15.0.0
✓  ghc   9.0.2      base-4.15.1.0
✗  ghc   9.2.1      base-4.16.0.0
✔✔ ghc   9.2.8      base-4.16.4.0             hls-powered
✗  ghc   9.4.8      recommended,base-4.17.2.1 hls-powered
✗  ghc   9.6.3      base-4.18.1.0             hls-powered
✗  ghc   9.8.1      latest,base-4.19.0.0      hls-powered
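
To make the state of the coarse bisect explicit, here is a small sketch that computes the suspect range from the results so far. The Outcome type and the suspectRange helper are made up for illustration; the 9.4.8 entry comes from the original report rather than from these CI runs, and the "unrelated failure" and "fails to build" runs are treated as inconclusive.

```haskell
import Data.List (sortOn)

-- | Result of building and running ghcup with a given GHC version.
data Outcome = Success | Segfault | Inconclusive
  deriving (Eq, Show)

-- | Version components, e.g. [9,4,8]; lists compare lexicographically.
type GhcVersion = [Int]

-- | Results reported in this thread.
results :: [(GhcVersion, Outcome)]
results =
  [ ([9, 0, 1], Success)
  , ([9, 0, 2], Success)
  , ([9, 2, 1], Success)
  , ([9, 2, 8], Inconclusive) -- unrelated failure
  , ([9, 4, 8], Segfault)     -- the original report
  , ([9, 6, 3], Inconclusive) -- fails to build
  , ([9, 8, 1], Inconclusive) -- fails to build
  ]

-- | Latest known-good version below the earliest known-bad one, paired
-- with that known-bad version. Inconclusive runs are ignored.
suspectRange :: [(GhcVersion, Outcome)] -> Maybe (GhcVersion, GhcVersion)
suspectRange rs =
  case [v | (v, Segfault) <- sorted] of
    [] -> Nothing
    (bad : _) ->
      case [v | (v, Success) <- sorted, v < bad] of
        [] -> Nothing
        goods -> Just (last goods, bad)
  where
    sorted = sortOn fst rs

main :: IO ()
main = print (suspectRange results)
-- prints: Just ([9,2,1],[9,4,8])
```

So the regression window is currently somewhere after 9.2.1 and up to 9.4.8, and a working 9.2.8 run would narrow it further.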

@runeksvendsen
Collaborator

@hasufell I'm trying to reproduce this locally. But I'm unable to locally run the script that reproduces the issue in CI because the script needs the S3_HOST env var to be set. In CI it's set to a secret that I don't have.

Here's the command I'm running:

cd /path/to/ghcup-repo
docker run --platform "linux/386" -v $(pwd):/src -it -e RUNNER_OS=Linux -e GITHUB_WORKSPACE=/src -e ARCH=32 -e GHC_VER=9.4.8 -e CABAL_VER=3.10.2.0 hasufell/i386-alpine-haskell:3.16 sh -c "cd /src && sh .github/scripts/build.sh"

It's failing with the error .github/scripts/build.sh: line 18: S3_HOST: parameter not set.

@runeksvendsen
Collaborator

Command for reproducing locally:

  1. Run a docker container with the following command from within the ghcup source repo: git checkout 1b50c26c888f71041f5becc2c70e27446cc8f5c2 && docker run --platform "linux/386" -v $(pwd):/src -it hasufell/i386-alpine-haskell:3.16 /bin/bash -c "cd /src; cabal update; exec sh"
  2. Inside this docker container run the command cabal --project-file=cabal.project.release run exe:ghcup -- install ghc 9.4.8

Logs:

<... building dependencies ...>

Preprocessing library for ghcup-0.1.20.0..
Building library for ghcup-0.1.20.0..
/usr/lib/gcc/i586-alpine-linux-musl/11.2.1/../../../../i586-alpine-linux-musl/bin/ld: /usr/lib/gcc/i586-alpine-linux-musl/11.2.1/crtbeginT.o: warning: relocation in read-only section `.text'
/usr/lib/gcc/i586-alpine-linux-musl/11.2.1/../../../../i586-alpine-linux-musl/bin/ld: warning: creating DT_TEXTREL in a shared object
Preprocessing library 'ghcup-optparse' for ghcup-0.1.20.0..
Building library 'ghcup-optparse' for ghcup-0.1.20.0..
/usr/lib/gcc/i586-alpine-linux-musl/11.2.1/../../../../i586-alpine-linux-musl/bin/ld: /usr/lib/gcc/i586-alpine-linux-musl/11.2.1/crtbeginT.o: warning: relocation in read-only section `.text'
/usr/lib/gcc/i586-alpine-linux-musl/11.2.1/../../../../i586-alpine-linux-musl/bin/ld: warning: creating DT_TEXTREL in a shared object
Preprocessing executable 'ghcup' for ghcup-0.1.20.0..
Building executable 'ghcup' for ghcup-0.1.20.0..
[3 of 3] Linking /src/dist-newstyle/build/i386-linux/ghc-9.4.8/ghcup-0.1.20.0/x/ghcup/opt/build/ghcup/ghcup [Library changed]
[ Info  ] downloading: https://raw.githubusercontent.com/haskell/ghcup-metadata/master/ghcup-0.0.8.yaml as file /root/.ghcup/cache/ghcup-0.0.8.yaml
[ Info  ] verifying signature of: /root/.ghcup/cache/ghcup-0.0.8.yaml
[ Info  ] downloading: https://downloads.haskell.org/~ghcup/unofficial-bindists/ghc/9.4.8/ghc-9.4.8-i386-alpine-linux.tar.xz as file /root/.ghcup/tmp/ghcup-660021a7/ghc-9.4.8-i386-alpine-linux.tar.xz
[ Info  ] verifying digest of: ghc-9.4.8-i386-alpine-linux.tar.xz
[ Info  ] Unpacking: ghc-9.4.8-i386-alpine-linux.tar.xz to /root/.ghcup/tmp/ghcup-9f754810
[ Info  ] Installing GHC (this may take a while)
[ Info  ] Detected alpine linux... setting LD=ld.bfd
[ ghc-make ] Installing ./hpc -> /root/.ghcup/tmp/ghcup-2d949119//root/.ghcup/ghc/9....
[ ghc-make ] Installing ./ghci -> /root/.ghcup/tmp/ghcup-2d949119//root/.ghcup/ghc/9...
[ ghc-make ] Installing ./hp2ps-ghc-9.4.8 -> /root/.ghcup/tmp/ghcup-2d949119//root/....
[ ghc-make ] 
[ ghc-make ] Copying libraries to /root/.ghcup/tmp/ghcup-2d949119/root/.ghcup/ghc/9....
[ ghc-make ] /usr/bin/install -c -m 755 -d "/root/.ghcup/tmp/ghcup-2d949119/root/.gh...
Segmentation fault (core dumped)

@runeksvendsen
Collaborator

Verbose logs:

[ Debug ] Identified Platform as: Linux Alpine, 3.16.8
[ Debug ] last access was 2595402432930383003.085091733s ago, cache interval is 300s
[ Info  ] downloading: https://raw.githubusercontent.com/haskell/ghcup-metadata/master/ghcup-0.0.8.yaml as file /root/.ghcup/cache/ghcup-0.0.8.yaml
[ Debug ] Read etag: "19c93efb16e5abf1e3da5bee59e9f4ec68eabc43fc8b1dd035022559d6a05ade"
[ Debug ] Status code was 304, not overwriting
[ Debug ] Parsed etag: "19c93efb16e5abf1e3da5bee59e9f4ec68eabc43fc8b1dd035022559d6a05ade"
[ Debug ] Writing etagsFile /root/.ghcup/cache/ghcup-0.0.8.yaml.etags
[ Debug ] downloading: https://raw.githubusercontent.com/haskell/ghcup-metadata/master/ghcup-0.0.8.yaml.sig as file /root/.ghcup/cache/ghcup-0.0.8.yaml.sig
[ Debug ] Read etag: "d7502365275454e7967d481e981e2c27e263edb88bf3d381c6aa66a2016d99ee"
[ Debug ] Status code was 304, not overwriting
[ Debug ] Parsed etag: "d7502365275454e7967d481e981e2c27e263edb88bf3d381c6aa66a2016d99ee"
[ Debug ] Writing etagsFile /root/.ghcup/cache/ghcup-0.0.8.yaml.sig.etags
[ Info  ] verifying signature of: /root/.ghcup/cache/ghcup-0.0.8.yaml
[ Debug ] gpg: Signature made Fri Jan 26 06:58:33 2024 UTC
[ ...   ] gpg:                using RSA key ECA44F5A172EDAD947F39E3D4275CDA6A29BED43
[ ...   ] gpg:                issuer "hasufell@posteo.de"
[ ...   ] gpg: Good signature from "Julian Ospald <maerwald@hasufell.de>" [unknown]
[ ...   ] gpg:                 aka "Julian Ospald <hasufell@hasufell.de>" [unknown]
[ ...   ] gpg:                 aka "Julian Ospald <hasufell@posteo.de>" [unknown]
[ ...   ] gpg: WARNING: This key is not certified with a trusted signature!
[ ...   ] gpg:          There is no indication that the signature belongs to the owner.
[ ...   ] Primary key fingerprint: 7D1E 8AFD 1D4A 16D7 1FAD  A2F2 CCC8 5C0E 40C0 6A8C
[ ...   ]      Subkey fingerprint: ECA4 4F5A 172E DAD9 47F3  9E3D 4275 CDA6 A29B ED43
[ Debug ] Decoding yaml at: /root/.ghcup/cache/ghcup-0.0.8.yaml
[ Debug ] Requested to install GHC with 9.4.8
[ Info  ] downloading: https://downloads.haskell.org/~ghcup/unofficial-bindists/ghc/9.4.8/ghc-9.4.8-i386-alpine-linux.tar.xz as file /root/.ghcup/tmp/ghcup-be39f0c9/ghc-9.4.8-i386-alpine-linux.tar.xz
[ Info  ] verifying digest of: ghc-9.4.8-i386-alpine-linux.tar.xz
[ Info  ] Unpacking: ghc-9.4.8-i386-alpine-linux.tar.xz to /root/.ghcup/tmp/ghcup-e23e3bc0
[ Info  ] Installing GHC (this may take a while)
[ Info  ] Detected alpine linux... setting LD=ld.bfd
[ Debug ] Running sh with arguments ["./configure","--prefix=/root/.ghcup/ghc/9.4.8","--disable-ld-override"]
checking build system type... i686-pc-linux-musl
checking host system type... i686-pc-linux-musl
checking target system type... i686-pc-linux-musl
build platform inferred as: i386-unknown-linux
host platform inferred as: i386-unknown-linux
target platform inferred as: i386-unknown-linux
configure: GHC build : i386-unknown-linux
configure: GHC host : i386-unknown-linux
configure: GHC target : i386-unknown-linux
checking for path to top of build tree... /root/.ghcup/tmp/ghcup-e23e3bc0/ghc-9.4.8-i386-unknown-linux
checking for a BSD-compatible install... /usr/bin/install -c
checking whether ln -s works... yes
checking for gsed... sed
checking for python3... no
checking for gfind... no
checking for find... /usr/bin/find
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C... yes
checking whether gcc accepts -g... yes
checking for gcc option to enable C11 features... none needed
checking for g++... g++
checking whether the compiler supports GNU C++... yes
checking whether g++ accepts -g... yes
checking for g++ option to enable C++11 features... none needed
checking how to run the C preprocessor... gcc -E
checking for -ld... ld.bfd
checking for ld.gold object merging bug (binutils 22266)... checking whether ld is GNU ld... YES
checking whether ld understands --build-id... yes
checking whether ld understands -no_compact_unwind... no
checking whether ld understands -filelist... no
checking for -strip... no
checking for strip... strip
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking for llc-15... no
checking for llc-15.0... no
checking for llc15... no
checking for llc-14... no
checking for llc-14.0... no
checking for llc14... no
checking for llc-13... no
checking for llc-13.0... no
checking for llc13... no
checking for llc-12... no
checking for llc-12.0... no
checking for llc12... no
checking for llc-11... no
checking for llc-11.0... no
checking for llc11... no
checking for llc-10... no
checking for llc-10.0... no
checking for llc10... no
checking for llc... no
checking for opt-15... no
checking for opt-15.0... no
checking for opt15... no
checking for opt-14... no
checking for opt-14.0... no
checking for opt14... no
checking for opt-13... no
checking for opt-13.0... no
checking for opt13... no
checking for opt-12... no
checking for opt-12.0... no
checking for opt12... no
checking for opt-11... no
checking for opt-11.0... no
checking for opt11... no
checking for opt-10... no
checking for opt-10.0... no
checking for opt10... no
checking for opt... no
checking version of gcc... checking version of gcc... 11.2.1
11.2.1
checking whether CC supports -no-pie... yes
checking for extra options to pass gcc when compiling via C... 
checking Setting up CFLAGS, LDFLAGS, IGNORE_LINKER_LD_FLAGS and CPPFLAGS... done
checking Setting up CONF_CC_OPTS_STAGE0, CONF_GCC_LINKER_OPTS_STAGE0, CONF_LD_LINKER_OPTS_STAGE0 and CONF_CPP_OPTS_STAGE0... done
checking Setting up CONF_CC_OPTS_STAGE1, CONF_GCC_LINKER_OPTS_STAGE1, CONF_LD_LINKER_OPTS_STAGE1 and CONF_CPP_OPTS_STAGE1... done
checking Setting up CONF_CC_OPTS_STAGE2, CONF_GCC_LINKER_OPTS_STAGE2, CONF_LD_LINKER_OPTS_STAGE2 and CONF_CPP_OPTS_STAGE2... done
checking C++ standard library flavour... libstdc++
Segmentation fault (core dumped)

@runeksvendsen
Collaborator

So it appears that executing the configure script makes GHCup segfault. In an attempt to arrive at a more minimal example, I tried using the process library (instead of the unix library) to execute the configure script with the same args and cwd. The result is that this completes successfully (no segfault).

Next, I plan to see if I can trigger the segfault using only System.Posix.Process.executeFile (i.e. without the other logic in execLogged).
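
For reference, a minimal sketch of what running a command through the process library with an explicit working directory can look like. The helper name runIn and the sh -c pwd command in main are placeholders for illustration, not the code used in the experiment above; the real call would pass ./configure with the --prefix and --disable-ld-override arguments shown in the verbose logs.

```haskell
import System.Exit (ExitCode (..))
import System.Process (CreateProcess (cwd), proc, readCreateProcessWithExitCode)

-- | Run a command with the given arguments in the given working directory
-- and collect its exit code, stdout and stderr.
-- 'runIn' is a hypothetical helper, not part of GHCup.
runIn :: FilePath -> FilePath -> [String] -> IO (ExitCode, String, String)
runIn dir cmd args =
  readCreateProcessWithExitCode ((proc cmd args) { cwd = Just dir }) ""

main :: IO ()
main = do
  -- Placeholder command; substitute the configure invocation from the logs.
  (code, out, err) <- runIn "/" "sh" ["-c", "pwd"]
  putStrLn ("exit code: " ++ show code)
  putStr out
  putStr err
```

This only covers the process side of the comparison. It does not replicate what execLogged does around the child process.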

@runeksvendsen
Collaborator

Update: as stated above, I assumed there was a problem with the configure script, as the segfault seemed to occur when running this script. So, in an attempt to arrive at a more minimal reproduction, I wanted to see if running the configure script manually with the same arguments and CWD also produces a segfault.

To accomplish this I made this commit 8cb52da, which prints the CWD and configure script command to run and then sleeps, so that it can be run manually in a terminal.

Surprisingly, however, this results in the threadDelay call causing a segfault. See CI logs for the commit here: https://github.com/haskell/ghcup-hs/actions/runs/8017547231/job/21902222326. First it logs "about to sleep" and then the segfault happens before printing "done sleeping", which means it happens at the threadDelay here: https://github.com/haskell/ghcup-hs/pull/985/files#diff-3b9b75e85b8e396244d24bf04ef933cae33b4d20df6f5889304bfb36696c43d6R480-R481 (unless the "done sleeping" log output is buffered somehow).

So it appears the relation to the configure script was a red herring.

@hasufell
Member Author

So a minimal program calling threadDelay segfaults?

@angerman
Collaborator

Next question would be if this happens with -threaded or not, as well.
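
A sketch of a standalone program that could answer both questions. It reports whether it was linked against the threaded RTS and flushes stdout around the delay, so a buffered "done sleeping" line cannot be mistaken for a crash inside threadDelay. Whether it reproduces the segfault on i386/alpine has not been established here, and the 2 second delay is an arbitrary choice.

```haskell
import Control.Concurrent (rtsSupportsBoundThreads, threadDelay)
import System.IO (hFlush, stdout)

main :: IO ()
main = do
  -- True when the program was linked with -threaded.
  putStrLn ("threaded RTS: " ++ show rtsSupportsBoundThreads)
  putStrLn "about to sleep"
  hFlush stdout
  -- threadDelay takes microseconds; this waits roughly two seconds.
  threadDelay (2 * 1000 * 1000)
  putStrLn "done sleeping"
  hFlush stdout
```

Building it once with -threaded and once without, using the same bindist, would show whether the threaded RTS matters. If both variants run cleanly, that points away from threadDelay itself.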

@dfordivam
Contributor

This crash is unrelated to threadDelay. It happens when memory is freed as part of Codec.Archive.Internal.Unpack.Lazy.unpackToDirLazy (see frames 1 and 2 of the backtrace below).
The crash is also reproducible by installing cabal via the TUI: the installation succeeds, but GHCup still crashes at the "Press enter to continue" prompt.

Thread 5 "ghcup:w" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 105095]
__stack_chk_fail () at src/env/__stack_chk_fail.c:26
warning: 26     src/env/__stack_chk_fail.c: No such file or directory
(gdb) bt
#0  __stack_chk_fail () at src/env/__stack_chk_fail.c:26
#1  0x094968f0 in freeHaskellFunctionPtr (ptr=0x4000000c) at rts/adjustor/Nativei386.c:160
#2  0x08550482 in s1svn_info () at src/Codec/Archive/Internal/Unpack/Lazy.hs:69

By adding a delay just after unpacking the tarball, I could reproduce the crash in unpackToDir:

diff --git a/lib/GHCup/Utils/Tar.hs b/lib/GHCup/Utils/Tar.hs
index 4d179740..64564346 100644
--- a/lib/GHCup/Utils/Tar.hs
+++ b/lib/GHCup/Utils/Tar.hs
@@ -14,6 +14,7 @@ Portability : portable
 -}
 module GHCup.Utils.Tar where
 
+import Control.Concurrent
 import           GHCup.Utils.Tar.Types ( ArchiveResult(..) )
 import           GHCup.Errors
 import           GHCup.Prelude
@@ -88,6 +89,10 @@ unpackToDir dfp av = do
 #endif
     | otherwise -> throwE $ UnknownArchive fn
 
+  lift $ logInfo $ "Unpacking Done: Doing Delay"
+  liftIO $ threadDelay (10*1000*1000)
+  lift $ logInfo $ "Unpacking Done: After Delay"
+
 
 -- | Get all files from an archive.
 getArchiveFiles :: (MonadReader env m, HasLog env, MonadIO m, MonadThrow m)
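
The RTS function freeHaskellFunctionPtr in frame 1 is what base's freeHaskellFunPtr calls to release a function pointer created by a foreign import "wrapper". The sketch below exercises that create/call/free cycle in isolation. The names mkCallback and callCallback are illustrative, and whether this small program crashes on i386/alpine with GHC 9.4.8 is untested, so treat it as a possible starting point for a reduced test case rather than a confirmed reproducer.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CInt)
import Foreign.Ptr (FunPtr, freeHaskellFunPtr)

-- Wrap a Haskell function as a C function pointer. The RTS allocates an
-- "adjustor" for it; on i386 that code lives in rts/adjustor/Nativei386.c.
foreign import ccall "wrapper"
  mkCallback :: (CInt -> IO CInt) -> IO (FunPtr (CInt -> IO CInt))

-- Call through a C function pointer from Haskell.
foreign import ccall "dynamic"
  callCallback :: FunPtr (CInt -> IO CInt) -> CInt -> IO CInt

main :: IO ()
main = do
  cb <- mkCallback (\x -> pure (x + 1))
  r <- callCallback cb 41
  print r -- prints 42
  -- Releasing the wrapper ends up in the RTS's freeHaskellFunctionPtr,
  -- the function shown in frame 1 of the backtrace.
  freeHaskellFunPtr cb
  putStrLn "freed callback"
```

If this runs cleanly with the same bindist, the next step would be to look at how the libarchive bindings allocate and free their callbacks around Lazy.hs:69.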

@hasufell
Member Author

I can try to build an i386 binary that uses Haskell tar instead of libarchive, as a workaround.
