T crashes during peak sessions with "fatal error: unexpected signal during runtime execution" #2817

Open
0xVires opened this issue Jun 23, 2023 · 1 comment
Labels: status: triage (this issue has not been evaluated yet)

0xVires commented Jun 23, 2023

My Transcoder (T) crashed with "fatal error: unexpected signal during runtime execution" for the second time within 48 hours. Both crashes happened during peak load (~35 concurrent sessions) on the same T, which is my highest-traffic one. Hardware-wise, the node should be able to handle at least twice this number of sessions.

The start of the error looks like this, followed by an insanely long error message:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7fe7a9402be5]
runtime stack:
runtime.throw({0x579e1a, 0x0})
        /opt/hostedtoolcache/go/1.17.13/x64/src/runtime/panic.go:1198 +0x71
runtime.sigpanic()
        /opt/hostedtoolcache/go/1.17.13/x64/src/runtime/signal_unix.go:719 +0x389
goroutine 7493865 [syscall]:
runtime.cgocall(0x1eab370, 0xc0006877a8)

After that, the T restarts and continues working. Full error logs:
fatal_error_1.txt
fatal_error_2.txt
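
To compare the two crash logs quickly, the `[signal ...]` line at the top of each one can be parsed into its fields. The following is only a minimal sketch: the type `SignalHeader` and the function `ParseSignalHeader` are hypothetical names made up for illustration, and the regular expression assumes the line has the same shape as in the excerpt above. In that excerpt, `addr=0x0` is the faulting address (a null address here), and the goroutine shown in `[syscall]` state under `runtime.cgocall` was inside a call into C code when the signal arrived.

```go
package main

import (
	"fmt"
	"regexp"
)

// SignalHeader holds the fields of the "[signal ...]" line printed at the
// top of a Go runtime fatal signal report. Hypothetical helper type.
type SignalHeader struct {
	Signal string // e.g. "SIGSEGV"
	Code   string // e.g. "0x1"
	Addr   string // faulting address, e.g. "0x0"
	PC     string // program counter at the time of the fault
}

// signalLine assumes the format seen in the excerpt above:
// [signal NAME: description code=0x.. addr=0x.. pc=0x..]
var signalLine = regexp.MustCompile(
	`\[signal (\w+): [^\]]*?code=(0x[0-9a-fA-F]+) addr=(0x[0-9a-fA-F]+) pc=(0x[0-9a-fA-F]+)\]`)

// ParseSignalHeader returns the first signal header found in log, or
// false if no line of that shape is present.
func ParseSignalHeader(log string) (SignalHeader, bool) {
	m := signalLine.FindStringSubmatch(log)
	if m == nil {
		return SignalHeader{}, false
	}
	return SignalHeader{Signal: m[1], Code: m[2], Addr: m[3], PC: m[4]}, true
}

func main() {
	// Excerpt copied from the log above.
	excerpt := "fatal error: unexpected signal during runtime execution\n" +
		"[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7fe7a9402be5]\n"
	if h, ok := ParseSignalHeader(excerpt); ok {
		fmt.Printf("signal=%s code=%s addr=%s pc=%s\n", h.Signal, h.Code, h.Addr, h.PC)
	}
}
```

If both logs report the same signal and faulting address, that suggests (but does not prove) a single underlying fault. The `pc` value can differ between runs even for the same bug, because libraries may be loaded at different addresses each time.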

I'm running the latest Livepeer version (0.5.38) with a split O/T setup and 3 GPUs on the same machine.

github-actions bot added the "status: triage" label (this issue has not been evaluated yet) Jun 23, 2023
leszko (Contributor) commented Jun 23, 2023

I looked at the logs and I think it'll be very hard to trace it back to the root cause. Some ideas for testing/investigation:

  • Check whether available memory was exceeded
  • Try to reproduce it by simulating high load
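
For the memory check, one rough option is to log the host's available RAM while the T is under peak load. The sketch below assumes a Linux host with `/proc/meminfo`; `MemAvailableKB` is a hypothetical helper name, not part of Livepeer. It covers system RAM only and says nothing about GPU memory, which would need a separate tool.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// MemAvailableKB extracts the MemAvailable value (in kB) from text in
// /proc/meminfo format. Hypothetical helper for illustration.
func MemAvailableKB(meminfo string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		// Lines look like "MemAvailable:    8192000 kB".
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "MemAvailable:" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("MemAvailable not found")
}

func main() {
	// Assumption: running on Linux, where /proc/meminfo exists.
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/meminfo:", err)
		return
	}
	kb, err := MemAvailableKB(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("MemAvailable: %d MiB\n", kb/1024)
}
```

Running this on a timer during peak sessions and keeping the output would show whether available memory was close to zero shortly before a crash.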

That said, I think it will be quite time-consuming to analyze, so we may need to park it for now and see whether it happens frequently.
