SIGSEGV in akka-grpc clients > 2.1.1 running in Alpine Linux container seemingly caused by a bug in netty #1613

Open
alexklibisz opened this issue May 11, 2022 · 15 comments

@alexklibisz
Contributor

alexklibisz commented May 11, 2022

Versions used

Any version of akka-grpc > 2.1.1.

To be clear, this doesn't seem to be a bug in akka-grpc itself, rather in a dependency of akka-grpc. See my comment here for more details.

Expected Behavior

Akka-grpc clients should work in an alpine linux container.

Actual Behavior

Akka-grpc client call crashes the JVM with a fatal error.

Relevant logs

When it crashes, the errors look something like this:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000003efe, pid=8, tid=189
#
# JRE version: OpenJDK Runtime Environment (11.0.8+11) (build 11.0.8+11-alpine-r0)
# Java VM: OpenJDK 64-Bit Server VM (11.0.8+11-alpine-r0, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  0x0000000000003efe
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

Reproducible Test Case

I don't have this pulled out of my project yet. I might follow-up with one later.

@raboof
Member

raboof commented May 11, 2022

As a short-term solution you could try setting akka.grpc.client."*".backend = "akka-http".
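
One way to apply this setting without touching application.conf is to layer it over the configuration the application already loads. This is only a minimal sketch, assuming the Typesafe Config library (which Akka already depends on) is on the classpath; the object name `BackendOverride` is hypothetical and not part of akka-grpc.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical helper (not part of akka-grpc): layers the suggested
// backend override on top of the normally loaded configuration.
object BackendOverride {
  // The "*" key has to be quoted because * is a reserved character in HOCON.
  val overrides: Config = ConfigFactory.parseString(
    """
      |akka.grpc.client."*".backend = "akka-http"
      |""".stripMargin)

  // Values in `overrides` win; everything else falls back to the usual
  // application.conf / reference.conf stack.
  def config: Config = overrides.withFallback(ConfigFactory.load())
}
```

The resulting `Config` could then be handed to the actor system, for example `ActorSystem("client", BackendOverride.config)`, so that clients created from that system see the override.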

Do you have a reference to that netty bug?

@alexklibisz
Contributor Author

alexklibisz commented May 11, 2022

To be clear, this does not seem to be a bug in akka-grpc directly, rather a bug in the grpc-netty-shaded dependency for versions > 1.41.0. I'm just noting it here since I discovered it while trying to upgrade to akka-grpc 2.1.2.

My understanding of the situation is this: there was some change in the version of netty used by grpc-netty-shaded 1.42.0 that causes the netty client to crash if it can't find some sort of native-level library or method in a library. I.e., the library/method is not in the JVM, but somewhere in the OS.

This is probably the most comprehensive issue covering it: grpc/grpc-java#9083

It branches out to several other issues.

The most relevant other issues seem to be: grpc/grpc-java#8751 and netty/netty#11701.

Towards the end of that grpc-java issue, one of the maintainers suggests that the bug was fixed in netty and we basically just have to wait for it to make its way through the grpc-java libraries. And then eventually into akka-grpc.

@raboof
Member

raboof commented May 11, 2022

> To be clear, this does not seem to be a bug in akka-grpc directly, rather a bug in the grpc-netty-shaded dependency for versions > 1.41.0. I'm just noting it here since I discovered it while trying to upgrade to akka-grpc 2.1.2.

Yeah, thanks, I think that is helpful.

> Towards the end of that grpc-java issue, one of the maintainers suggests that the bug was fixed in netty and we basically just have to wait for it to make its way through the grpc-java libraries. And then eventually into akka-grpc.

I think netty maintains binary compatibility, so I wonder if you could 'pull up' the netty dependency to 4.1.77 in your project by adding it explicitly and see if that has the desired effect?

@alexklibisz
Contributor Author

> I think netty maintains binary compatibility, so I wonder if you could 'pull up' the netty dependency to 4.1.77 in your project by adding it explicitly and see if that has the desired effect?

AFAIK I would actually need a build of grpc-netty-shaded. For example, below is the dependency tree under akka-grpc. I don't see any direct reference to netty. It seems like it gets bundled up into grpc-netty-shaded. Maybe I'm misinterpreting though.

[Image: dependency tree under akka-grpc]

@alexklibisz
Contributor Author

> As a short-term solution you could try setting akka.grpc.client."*".backend = "akka-http".

I can confirm that this works on akka-grpc 2.1.2. Today is the first I've heard about the akka-http backend. Do you think it's production ready? Is there some discussion/documentation of the tradeoffs I could read up on? On the surface I prefer the idea of using akka-http on the client side instead of netty, but I'm not aware of all the tradeoffs.

@alexklibisz changed the title from "SIGSEGV in akka-grpc clients > 2.1.1 running in Alpine Linux container caused by a bug in netty" to "SIGSEGV in akka-grpc clients > 2.1.1 running in Alpine Linux container seemingly caused by a bug in netty" on May 11, 2022

@raboof
Member

raboof commented May 11, 2022

> It seems like it gets bundled up into grpc-netty-shaded. Maybe I'm misinterpreting though.

Aah, you're right.

> I can confirm that this works on akka-grpc 2.1.2.

Great!

> Today is the first I've heard about the akka-http backend. Do you think it's production ready? Is there some discussion/documentation of the tradeoffs I could read up on? On the surface I prefer the idea of using akka-http on the client side instead of netty, but I'm not aware of all the tradeoffs.

It's less battle-tested compared to the netty implementation, and might behave a bit differently when using custom service discovery mechanisms or when reconnecting after failures. It might also unlock some additional metrics if you use it with Lightbend Telemetry. I agree it would be good to document this backend, and the corresponding trade-offs, but it's not so easy to enumerate them in a way that is helpful to readers :). I think it's production-ready, but make sure to evaluate it in your own staging environment as well, of course ;).

@alexklibisz
Contributor Author

Thanks for the quick responses @raboof. I'll try out the akka-http backend in a lower-volume, lower-consequence app. Otherwise I'll stick with akka-grpc 2.1.1 until the netty fix is released, makes its way through grpc-java, and gets picked up here.

@alexklibisz
Contributor Author

@raboof was this ever fixed?

@johanandren
Member

It's a bit hard for me to tell whether this was actually fixed upstream and, if so, in which version. However, akka-grpc 2.3.1, released two weeks ago, transitively pulls in grpc-netty-shaded 1.53.0 (itself released in February this year), so at least we are on a relatively recent version of the shaded Netty.

@alexklibisz
Contributor Author

Yeah, the upstream libraries are pretty confusing :) I'll close this, assuming for now that it's fixed, and post back if it's not.

@alexklibisz
Contributor Author

alexklibisz commented Jul 12, 2023

Re-opening, as this is still an issue in akka-grpc 2.3.2. I'll try to provide more info. A couple of relevant data points:

  • Scala version does not seem to matter. We've seen this in both 2.12 and 2.13.
  • Changing the base image from Alpine to Ubuntu does seem to fix this issue.
  • We have gotten it working in Alpine in at least one case.

@alexklibisz alexklibisz reopened this Jul 12, 2023
@johanandren
Member

johanandren commented Jul 13, 2023

I found this in the Netty docs which I had not noticed before (https://netty.io/wiki/native-transports.html#using-the-linux-native-transport):

> The official Linux builds are all linked against GLIBC. This means operating systems that use Musl as their libc implementation are not supported by the official builds of the Netty native transports. ...

So even if a specific SIGSEGV is sorted out, it seems you cannot rely on Netty working consistently on Alpine, since Alpine is based on Musl and not glibc.
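
To act on this at startup, an application could probe whether it appears to be running on a Musl-based image before deciding which client backend to use. The following is a rough heuristic sketch, not an established detection method: it assumes that Alpine ships an `/etc/alpine-release` file and that the Musl loader shows up in `/lib` under a name starting with `ld-musl-`. The object and method names are hypothetical.

```scala
import java.io.File

// Hypothetical helper: guesses whether the current container is Musl-based.
object LibcProbe {
  // Pure decision logic, kept separate from file-system access so that it
  // can be exercised without an Alpine container.
  def looksLikeMusl(libFileNames: Seq[String], hasAlpineRelease: Boolean): Boolean =
    hasAlpineRelease || libFileNames.exists(_.startsWith("ld-musl-"))

  // Reads the two hints from the real file system (both paths are assumptions).
  def probe(): Boolean = {
    // File.list() returns null when the directory is missing or unreadable.
    val libNames = Option(new File("/lib").list()).map(_.toSeq).getOrElse(Seq.empty)
    looksLikeMusl(libNames, new File("/etc/alpine-release").exists())
  }
}
```

A caller might use `LibcProbe.probe()` to log a warning or to prefer the akka-http backend on Musl-based images; the heuristic can produce false negatives on other Musl distributions with different layouts.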

@johanandren
Member

It also mentions that you could build your own artifact linked against some other C standard library; not sure if that could help.

@alexklibisz
Contributor Author

Thanks @johanandren. Indeed, I saw that statement of non-support for Musl yesterday, too. It seems new. I'm generally quite confused by how netty relates to grpc-netty-shaded.

In any case, I think I have a new development to share:

I cloned the akka-grpc-quickstart-scala project and set it up to run the GreeterClient.scala App in an Alpine container with JDK 11.

I found that, as written in the demo, the client instantiation blows up with a SIGSEGV, as described in the original report.

However, if I just add an SSLContext, the client instantiates successfully:

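import akka.grpc.GrpcClientSettings
import com.typesafe.sslconfig.ssl.SimpleSSLContextBuilder

// TLSv1.2 context built with empty key-manager and trust-manager lists.
val sslContext = new SimpleSSLContextBuilder("TLSv1.2", Seq.empty, Seq.empty, None).build()
val settings = GrpcClientSettings.connectToServiceAt("server", 8080)
  .withSslContext(sslContext)
val client = GreeterServiceClient(settings)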

I was able to reproduce this in some real applications as well.

I've no idea why this would be the case, but I guess it's new information. LMK if you have any thoughts.

So at a surface level, it seems like the client instantiation takes a different path if an SSLContext is provided, even if the SSLContext is basically empty, as shown above.
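
For anyone who wants to try the same experiment without ssl-config on the classpath, a similarly bare TLSv1.2 context can be built with the JDK alone. This is a sketch under assumptions: passing `null` to `SSLContext.init` selects the JDK's default key managers, trust managers, and `SecureRandom`, which may or may not match what `SimpleSSLContextBuilder` produces for empty sequences, and whether it sidesteps the crash in the same way has not been tested in this thread. The object name is hypothetical.

```scala
import javax.net.ssl.SSLContext

// Hypothetical helper: a TLSv1.2 context initialised entirely with JDK defaults.
object BareSslContext {
  def build(): SSLContext = {
    val ctx = SSLContext.getInstance("TLSv1.2")
    // null arguments tell the JDK to fall back to its default key managers,
    // trust managers, and SecureRandom implementation.
    ctx.init(null, null, null)
    ctx
  }
}
```

The result could be passed to `.withSslContext(...)` in place of the `sslContext` value shown above.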

@alexklibisz
Contributor Author

Just to summarize, the three workarounds seem to be:

  1. Instantiate the client with an SSLContext, as shown in my previous comment on this issue (#1613).
  2. Use the akka-http backend, as @raboof suggested earlier in this issue (#1613).
  3. Install gcompat and set the LD_PRELOAD environment variable, as described in a comment on "JVM crash with grpc-java 1.42.x and alpine docker image" (grpc/grpc-java#8751).
