Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RiakClient shutdown() returns a future that never completes, JVM does not exit #706

Open
philip-doctor opened this issue Jul 25, 2017 · 4 comments

Comments

@philip-doctor
Copy link

philip-doctor commented Jul 25, 2017

RiakClient shutdown() returns a future that never completes, JVM does not exit

Context

I'm writing a simple ETL app, on conclusion my application is not exiting. I am calling shutdown() on my connection and then blocking .get(), but it never returns.

I have attached my debugger to it and stepped through the code, here's what I see happening

Expected Behavior

shutdown completes, program terminates

Actual Behavior

blocks forever on the shutdown future

Possible Fix

I think all you have to do is to catch the IllegalStateExceptions and continue iteration over each node in your list and call shutdown().

Steps to Reproduce

This does not reproduce cleanly, I suspect there's something racey, I'm using Kotlin, but I'm doing nothing fancy here:

    fun getRiak(riakPort: Int, riakHost: String): RiakClient =
        try {
            RiakClient.newClient(riakPort, riakHost)
        } catch (ex: UnknownHostException) {
            logger.error("Unknown Riak Host $riakHost $riakPort $ex")
            throw Exception("Unable to connect to Riak.")
        }
    val riakClient = getRiak(config.riakPort, config.riakHost)
    riakClient.shutdown().get() // This line hangs forever for me most of the time

This is what I think is happening, let's take a dive....
https://github.com/basho/riak-java-client/blob/riak-client-2.1.1/src/main/java/com/basho/riak/client/core/RiakCluster.java#L630

Here we get a list of nodes, what state are those nodes in?

https://github.com/basho/riak-java-client/blob/riak-client-2.1.1/src/main/java/com/basho/riak/client/core/RiakCluster.java#L408

stateCheck(State.CREATED, State.RUNNING, State.SHUTTING_DOWN, State.QUEUING);

Okay cool, so we get that list of nodes and then call shutdown on them over here:

https://github.com/basho/riak-java-client/blob/riak-client-2.1.1/src/main/java/com/basho/riak/client/core/RiakNode.java#L303

That does a state check
stateCheck(State.RUNNING, State.HEALTH_CHECKING);

So if the state is currently State.SHUTTING_DOWN then you get the node in your node list, then it throws an IllegalStateException. What happens then? Well the Cluster Shutdown action is taking place on a thread in the ExecutorService pool, it's never caught so it just kills the thread.

Because of this we never get our countdown latch to 0, because the thread that's supposed to shut everything down in the background got killed.

Because of that we block forever.

I think all you have to do is to catch the IllegalStateExceptions and continue iteration over each node in your list and call shutdown().

Context

I want my program to terminate normally and not leak connection

Your Environment

  • [2.1.1 ] Riak Java Client version
  • [ 1.8 ] Java version
  • [ 1.5.2 TS ] Riak version
  • [ Ubuntu 16.0.4-1 LTS] Operating System / Distribution & Version
@philip-doctor
Copy link
Author

I tried adding

                    try {
                        node.shutdown();
                    } catch (Exception ex) {
                        logger.debug("Node cannot be shutdown at this time, most likely already being shut down {}", ex);
                    }

In RiakCluster.java, it did not fix the problem, so something else is wrong and my analysis is flawed, but I'm still hanging on shutdown about half the time.

@philip-doctor
Copy link
Author

Based on a few more break points in my debugger, this is the line that hangs and never returns

https://github.com/philip-doctor/riak-java-client/blob/develop/src/main/java/com/basho/riak/client/core/RiakCluster.java#L443

It doesn't throw, it just never comes back.

@srgg
Copy link
Contributor

srgg commented Jul 25, 2017

@philip-doctor I'm not sure whether it is your case or not, but here is how we avoid 'stuck on exit' in spark connector https://github.com/basho/spark-riak-connector/blob/master/connector/src/main/scala/com/basho/riak/spark/rdd/connector/SessionCache.scala#L121

@philip-doctor
Copy link
Author

@srgg thanks for the tip, I tried it (kotlin code for reference)

    val builder = RiakNode.Builder()
            .withMinConnections(1)
            .withMaxConnections(1)

        builder.withRemoteAddress(config.riakHost)
        builder.withRemotePort(config.riakPort)
    val node = builder.build()


    val cluster = RiakCluster.builder(node)
            .withBootstrap(Bootstrap()
                    .group(NioEventLoopGroup(0, DefaultThreadFactory(NioEventLoopGroup::class.java, true)))
    .channel(NioSocketChannel::class.java))
    .withExecutor(ScheduledThreadPoolExecutor(2, ThreadFactory() { r ->

        val t: Thread = Executors.defaultThreadFactory().newThread(r)
        t.setDaemon(true)
        t
    }
    ))
    .build()
    cluster.start()

I seem to still hang, I think it's something with this line here going awry but I'm not sure what's exactly going wrong here yet. I'll see if I can take more time to dig.

bootstrap.config().group().shutdownGracefully();

dvimont pushed a commit to dvimont/bigDataBatchDemo that referenced this issue Sep 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants