Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It is impossible to shut down the server safely, concurrently and cleanly #1645

Open
OskiKervinen-MF opened this issue Aug 21, 2023 · 15 comments

Comments

@OskiKervinen-MF
Copy link

Hello,

There seems to be no way to call Server.stop() from another thread without the possibility that the server will never shut down. The root of the problem is that Server.stop() does nothing if the server is not running. Because of the nature of concurrency you can not easily guarantee the server has started if the server runs in a different thread.

Here is code to illustrate the problem. This program hangs forever:

#include "httplib.h"
#include <thread>
#include <iostream>

int main()
{
	std::cout << "Main: Start." << std::endl;

	httplib::Server svr1;
	svr1.Get( "/1", [ & ]( const httplib::Request& /*req*/, httplib::Response& res ) {
		res.set_content( "Hello World!", "text/plain" );
	} );

	int svr1_port = 0;
	auto thread1 = std::thread( [ & ]()
			{
				std::cout << "Server: Initializing..." << std::endl;

				// Simulate doing some slightly time consuming processing to prepare the server.
				std::this_thread::sleep_for( std::chrono::milliseconds( 100 ) );

				svr1_port = svr1.bind_to_any_port( "localhost" );

				std::cout << "Server: Starting the server..." << std::endl;
				svr1.listen_after_bind();
				std::cout << "Server: exiting." << std::endl;
			} );

	std::cout << "Main: Server thread started." << std::endl;

	// Simulate trying to start up and failing, taking some, but not much time.
	std::this_thread::sleep_for( std::chrono::milliseconds( 10 ) );

	svr1.stop();
	std::cout << "Main: Told server to stop." << std::endl;

	thread1.join();
	std::cout << "Main: Done" << std::endl;
}

The real use case is a program that actually used the server, but sometimes there is an error in start-up. The error will cause the program to shut down immediately, before the server is running, causing the stop to be ignored because of the race condition.

All work-arounds boil down to the main thread waiting in a loop until the server thread has initialized the Server. That works, I guess, but it's pretty ugly.

This is essentially the problem mentioned in #1371, but that was closed only with a few work-arounds mentioned. Also commenters in #43 seem to suffer from the same problem.

I think the correct solution would be for Server.stop() to set the server to a "shutting down" state, from which any listen_* calls would return immediately. This would automatically just work. The downside is that this would make a Server object usable only once. You couldn't call start and listen multiple times on the same object. So the change may not be backwards-compatible. Perhaps instead add a separate method named decommission that moves the server to a state whence it can never be started again.

@jimmy-park
Copy link
Contributor

Use Server::wait_until_ready before stopping the server. (#1548)

svr1.wait_until_ready();
svr1.stop();

OskiKervinen-MF added a commit to OskiKervinen-MF/cpp-httplib-fork that referenced this issue Aug 22, 2023
Fixes yhirose#1645.

`Server::stop` does nothing if the server is not running.
This can cause a program to hang it it calls stop before
the server processing thread has had time to initialize.
It would make sense for a server that has been stopped
to refuse to start, but that would be backwards incompatible.
Instead, we introduce a new method: `decommission`.
It sets the server object to stop and refuse to start.
This means it does not matter in which order
concurrent threads end up calling `listen` and `decommission`,
the result is always that the listening thread will exit.
@OskiKervinen-MF
Copy link
Author

Use Server::wait_until_ready before stopping the server. (#1548)

svr1.wait_until_ready();
svr1.stop();

That is handy. But it has two problems:

  1. It wastes time. Not much in practice, but still.
  2. It only works if it is inevitable that the server thread will start the server. If the server thread has any operations that could throw an exception, the server thread may never start the server, causing the main thread to hang.

Number 2 is the big issue here. Yes, this can be worked around by moving all initializations that could have errors to the main thread, and making the server thread only contain the call to listen but that is a pretty significant caveat for users to keep in mind and the refactoring required can be impractical.

Please consider my implementation for the proposed decommission in #1646.

@jimmy-park
Copy link
Contributor

How about this way?

Server svr;
std::thread t;

try {
    t = std::thread { [&] { svr.listen(HOST, PORT); } }
    svr.wait_until_ready();
} catch (...) {}

svr.stop();

if (t.joinable())
    t.join()

@OskiKervinen-MF
Copy link
Author

I was referring to a case where an exception occurs in the server thread. Like this:

#include "httplib.h"
#include <thread>
#include <iostream>

int main()
{
    std::cout << "Starting..." << std::endl;

    httplib::Server svr;
    std::thread t { [&] {
        try {
            throw std::runtime_error("Some thing that happens to go wrong.");
            svr.listen("localhost", 3000 );
        } catch (...) {
            std::cout << "[Thread] Exception. Aborting." << std::endl;
        }
    } };
    std::cout << "Before wait" << std::endl;
    svr.wait_until_ready();
    std::cout << "After wait" << std::endl;

    svr.stop();

    if (t.joinable())
        t.join();

    std::cout << "Reached the end successfully." << std::endl;
    return 0;
}

This program will hang forever.

@yhirose
Copy link
Owner

yhirose commented Aug 28, 2023

How about this?

int main() {
  std::cout << "Starting..." << std::endl;

  httplib::Server svr;
  auto server_started = std::atomic<bool>{false};

  std::thread t{[&] {
    try {
      throw std::runtime_error("Some thing that happens to go wrong.");
      server_started = true;
      svr.listen("localhost", 3000);
    } catch (...) { std::cout << "[Thread] Exception. Aborting." << std::endl; }
  }};

  if (server_started) {
    std::cout << "Before wait" << std::endl;
    svr.wait_until_ready();
    std::cout << "After wait" << std::endl;

    svr.stop();
  }

  if (t.joinable()) t.join();

  std::cout << "Reached the end successfully." << std::endl;
  return 0;
}

@OskiKervinen-MF
Copy link
Author

OskiKervinen-MF commented Aug 28, 2023

Nothing in your example stops the main thread from reading server_started before the server thread writes it, thereby never calling wait_until_ready() or stop(). So it can leave the server thread living forever in the cases where the exception does not happen.

That being said, you are correct that the problem can probably be solved by a sufficiently precise combination of at least two signal booleans. One for the server thread to signal its intent to inevitably start the server (making waiting for it safe), the other for the main thread to signal it already decided to shut down the server (allowing the server thread to not start when the call to stop is in the past). I'm not sure even this would work, I'd have to write the code and perform a thorough analysis.

But if you merge #1646, a simple call to svr.decommission() will always work.
No wait loops, no signal variables, no locking required.
I would argue that removing a need for subtle synchronization almost nobody will get right on the first try (and where failure appears as rare non-deterministic jams) is worth the addition of the new method.

@yhirose
Copy link
Owner

yhirose commented Aug 29, 2023

@OskiKervinen-MF thanks for the comment and identifying the bug in my example. I corrected it, and the following code should work either with throw or not.

int main() {
  std::cout << "Starting..." << std::endl;

  auto ok = std::atomic<bool>{true};

  httplib::Server svr;
  svr.Get("/hi", [&](const httplib::Request &, httplib::Response &res) {
    res.body = "hi...";
  });

  std::thread t{[&] {
    try {
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
      // throw std::runtime_error("Some thing that happens to go wrong.");
      svr.listen("localhost", 3000);
    } catch (...) {
      std::cout << "[Thread] Exception. Aborting." << std::endl;
      ok = false;
    }
  }};

  std::cout << "Before wait" << std::endl;
  while (ok && !svr.is_running())
    ;
  std::cout << "After wait" << std::endl;

  if (ok) {
    std::cout << "Ready to make HTTP requests." << std::endl;
    httplib::Client cli("localhost", 3000);
    auto res = cli.Get("/hi");
    std::cout << res->body << std::endl;

    svr.stop();
  }

  if (t.joinable()) t.join();

  std::cout << "Reached the end successfully." << std::endl;
  return 0;
}

Regarding svr.decommission(), how can you apply the method in this example? Can it make the example any simpler?

@OskiKervinen-MF
Copy link
Author

Okay, I'm impressed. You got it working with a single signal boolean.
I didn't think that was possible. But I think it is worth noting that it took
two people analyzing the problem over several iterations to get there.
With decommission(), it simplifies to:

int main(int argc, char** argv) {
  std::cout << "Starting..." << std::endl;

  httplib::Server svr;
  svr.Get("/hi", [&](const httplib::Request &, httplib::Response &res) {
    res.body = "hi...";
  });

  std::thread t{[&] {
    try {
      std::this_thread::sleep_for(std::chrono::milliseconds(1000));
      std::cout << "Starting server." << std::endl;
      if( argc > 1 )
        throw std::runtime_error("Some thing that happens to go wrong.");
      svr.listen("localhost", 3000);
    } catch (...) {
      std::cout << "[Thread] Exception. Aborting." << std::endl;
    }
  }};

  std::cout << "Press enter to stop." << std::endl;
  std::cin.get();
  std::cout << "Stopping server" << std::endl;
  svr.decommission();
  t.join();

  std::cout << "Reached the end successfully." << std::endl;
  return 0;
}

For a server you start once and stop once, this gives you an entirely worry-free way to shut it down. After this discussion, I think I can solve my specific problem without the change, so it's no longer vital for us, but I still think it would be useful for users of the library. In the end, it's your call. Thanks for the discussion in any case.

@yhirose
Copy link
Owner

yhirose commented Aug 29, 2023

Thanks for the update. I'll close this issue.

By the way, I tested the code with your branch, but I didn't get "Reached the end successfully.". Also I still don't fully understand how decommission should be used, and hard me to explain it on README....

[topic/1645-decommission-server] ~/Projects/cpp-httplib-fork/example$ make repro && ./repro
make: `repro' is up to date.
Starting...
Press enter to stop.
Starting server.
^C

[topic/1645-decommission-server] ~/Projects/cpp-httplib-fork/example$ make repro && ./repro 1
make: `repro' is up to date.
Starting...
Press enter to stop.
Starting server.
[Thread] Exception. Aborting.
^C

@yhirose yhirose closed this as completed Aug 29, 2023
@OskiKervinen-MF
Copy link
Author

Did you press enter to stop? I was trying to simulate the arbitrary timing of the shutdown by letting it depend on user input.

How to use decommission? You use it like stop. If this library was just being published, stop() should do what decommission does. But with the current design there could be users that start and stop the same server object multiple times and that won't work with decommission. So fixing stop to be safe would be a breaking change. Therefore the safe stoppage needs its own method. In other words, you use decommission when you are done with a server object.

@yhirose yhirose reopened this Aug 30, 2023
@yhirose
Copy link
Owner

yhirose commented Aug 30, 2023

@OskiKervinen-MF that was my silly mistake... Yes, I confirmed that your pull request worked. I'll put my comment regarding the decommission method in #1646. Thanks!

@Nitecon
Copy link

Nitecon commented Dec 11, 2023

I'm running into a similar problem now and it does not seem like the server is currently honoring stop(). When I call stop() on the svr it just hangs until systemd finally kills that process. Is this in the latest version?

@yhirose
Copy link
Owner

yhirose commented Dec 11, 2023

@Nitecon at least I know, all the unit tests for client stop and server stop in the latest version are working file on GitHub Action CI.
https://github.com/yhirose/cpp-httplib/actions

@OskiKervinen-MF
Copy link
Author

@Nitecon You can make stop work by calling it with the very specific pattern yhirose shows in a comment above. They did not merge the convenient method to do this, because they considered it enough that's it's possible to work around the problem.

@yhirose Well, the the tests work because there is no test for this case. If you replace decommission with the current stop in the test #1646 tried to add, it will deadlock. That case is not supported without the work around presented in the comment I linked.

@Nitecon
Copy link

Nitecon commented Dec 12, 2023

The thing that tends to show up for me is this kills a single thread from a stop perspective but not all threads int he pool so it's not really a "stop" for the server which should effectively handle all of the child threads appropriately by killing them. I'm still testing to make sure it's not anything else on my side though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants