
Allow for more flexible control of connection backlog and number of connections accepted "at once" #3062

Open
ghost wants to merge 1 commit into master
Conversation


@ghost ghost commented Sep 8, 2021

This PR proposes a way to gain more flexibility in a) controlling the backlog size for the listening socket(s) a Tornado application ends up using, and b) controlling how many connections are accepted during one iteration of Tornado's I/O loop, something that is currently tied to the default backlog size (hence both changes in a single commit).

Tornado applications can admittedly already control the backlog size applied to their listening socket(s). They may try to do so by a) importing and altering the value of the _DEFAULT_BACKLOG module variable from netutil (which in practice has no effect, because default function arguments are assigned when a function is created, not when it is invoked), or b) initializing their listening sockets differently from what the example in the user guide illustrates, e.g. invoking bind themselves.

The second option has two drawbacks. First, it requires changing the "simple" application initialization code to a more involved version just to be able to specify one's own backlog. Second, having changed the code thus, the application has to resolve and bind each socket itself -- something Tornado would otherwise have done for it "for free" with its very useful bind_sockets. Of course bind_sockets is also part of the API and may be called directly, but that still turns the initialization code into an even more involved variant.
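For illustration, the "more involved" initialization looks roughly like this today (a sketch assuming the documented bind_sockets/HTTPServer API; the handler, port, and backlog value are placeholders, not part of this PR):

```python
import asyncio

import tornado.web
from tornado.httpserver import HTTPServer
from tornado.netutil import bind_sockets


class MainHandler(tornado.web.RequestHandler):  # placeholder handler
    def get(self):
        self.write("Hello, world")


async def main():
    app = tornado.web.Application([(r"/", MainHandler)])
    # Instead of the one-line `app.listen(8888)` from the user guide, resolve and
    # bind the sockets ourselves so an explicit backlog can be passed through.
    sockets = bind_sockets(8888, backlog=32)  # 32 is an arbitrary example value
    server = HTTPServer(app)
    server.add_sockets(sockets)
    await asyncio.Event().wait()


if __name__ == "__main__":
    asyncio.run(main())
```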

The first option does not have those drawbacks, of course -- the application could keep the "simple" initialization from the user guide and merely set the imported _DEFAULT_BACKLOG to its preferred value -- but, as noted above, that assignment has no effect today, because the default is frozen when netutil's functions are defined.

More importantly, however, neither option makes it possible to have Tornado omit the backlog argument to listen altogether. These proposed changes partly aim at lifting that limitation. A future version of Python is free to change how the "reasonable default" for listen is chosen, and Tornado, a framework of growing adoption across an increasingly varied set of scenarios, may do well to allow applications to defer to that Python default instead of insisting on a "magic" number of its own.

So this PR proposes, first and foremost, changing the default backlog size to None, allowing Tornado to omit the backlog argument to listen entirely and thus defer to a backlog decided by Python. Specifying an integer value, of course, still means the value is passed to listen, as before.
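A minimal sketch of the intended behaviour (illustrative only, not the PR's exact diff):

```python
import socket
from typing import Optional


def _listen(sock: socket.socket, backlog: Optional[int] = None) -> None:
    # Pass the backlog through when the application chose one; otherwise call
    # listen() with no argument and defer to Python's/the OS's default.
    if backlog is None:
        sock.listen()
    else:
        sock.listen(backlog)
```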

Additionally, because the default backlog size may now be None, the number of connections Tornado attempts to accept at one time can no longer simply follow the default backlog size as before. We use this opportunity to add a second module variable controlling that number; when it is None, it defers to the default backlog size if the latter is non-zero, or ultimately to 128 (a final fallback for lack of a better magic value).
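The described fallback could look roughly like this (a sketch; CONNECTIONS_PER_SWEEP and the helper name are illustrative, not necessarily the PR's identifiers):

```python
from typing import Optional

DEFAULT_BACKLOG: Optional[int] = None        # None -> omit the backlog argument to listen()
CONNECTIONS_PER_SWEEP: Optional[int] = None  # None/0 -> derive from the backlog, else 128


def _connections_to_accept() -> int:
    if CONNECTIONS_PER_SWEEP:   # an explicit, non-zero setting wins
        return CONNECTIONS_PER_SWEEP
    if DEFAULT_BACKLOG:         # otherwise follow a non-zero default backlog
        return DEFAULT_BACKLOG
    return 128                  # final fallback "magic" value
```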

There is a prior conversation touching on this topic.

This commit consists of the following three changes:

1. Control of the default backlog throughout the lifetime of a Tornado
application, through the now exported (no `_` prefix) module variable.
Previously, by way of how Python initializes function argument defaults,
the default was read once during module initialization (when the functions
are created), which "froze" the value: there was no way to affect the
default during the application's lifetime -- typically as configuration is
loaded, for example. To allow an application to configure the default
backlog once it has its preferred value available, the default must
instead be evaluated immediately before `listen` is called (see the sketch
after this list).

2. Deferring to Python's own choice of backlog, by accepting `None` as a
valid value. For Python 3.6, for example, that implies the minimum of 128
and SOMAXCONN:
https://github.com/python/cpython/blob/3.6/Modules/socketmodule.c#L3026

3. The number of connections Tornado attempts to `accept` on each
iteration of its I/O loop was tied to the default backlog value. Because
this commit allows `None` or zero (both valid for `listen`) for the
latter, the heuristic had to be amended, since neither zero nor `None`
is an acceptable number of connections to accept. This is solved by
adding a separate variable that controls the number, falling back through
a non-zero default backlog to 128.
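To make change 1 concrete, here is a minimal standalone sketch (not Tornado's actual code) of why a default captured in a function signature cannot be reconfigured later, while reading the module variable at call time can:

```python
_DEFAULT_BACKLOG = 128


def bind_frozen(backlog=_DEFAULT_BACKLOG):
    # The default was evaluated once, when this function object was created.
    return backlog


def bind_late(backlog=None):
    # The module variable is read at call time, so later reconfiguration works.
    return _DEFAULT_BACKLOG if backlog is None else backlog


_DEFAULT_BACKLOG = 16     # e.g. set while the application loads its configuration
print(bind_frozen())      # 128 -- still the "frozen" value
print(bind_late())        # 16  -- picks up the configured value
```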
DEFAULT_BACKLOG = None

# Number of connections to try and accept in one sweep of an I/O loop;
# if falsy (e.g. `None` or zero) the number will be decided by falling
# back to the default backlog (if non-zero) or, ultimately, 128.
Member
These comments imply that monkey-patching these constants is intended to be a public interface for configuration. I don't think that's a good idea (or at least I'd want to see some more justification for why someone might want to change this so we could think about the most appropriate interface).

I'd rather just hard-code the number of iterations in accept_handler, and get rid of the tornado-side _DEFAULT_BACKLOG (so we call listen() with an argument if a backlog is given, and otherwise use no arguments so that the default is determined by lower levels).


I am currently debugging a case with our usage of tornado where we do want to change this to a much smaller value.

What we have seen is that in multiprocessing pre-fork setups, where we have a large number of workers sharing one socket, we see very poor tail latencies under high load. We have narrowed it down to huge amounts of queuing within processes that accidentally accept far too high a share of the incoming connections. I.e. while we might have on average 3 async requests in flight per worker, when we inspect the outliers with high tail latencies hitting timeouts and look at their io_loops, we see that 30 connections were all accepted at once on that worker, and the 30th request, queued up at the end in that worker, has massively increased latency.

We are currently looking at ways to prevent this poor load balancing, and at least one element of it might be to limit the number of connections accepted at a time, so that if a small queue of pending connections has built up under high load, it is not entirely taken by a single worker but instead distributed in smaller chunks over multiple workers.

Member

Interesting. That does sound like a good reason to make the accept-calls-per-loop value configurable (or maybe just to hard-code it to a small value, though I suspect we'd have a hard time agreeing on what that value should be). So that variable is valuable even aside from its use here for the case when the accept backlog is unspecified/None.

I still don't like monkey-patching module-level constants as the configuration interface, but it seems like whatever interface we come up with should be used for both variables.

In the meantime, have you tried SO_REUSEPORT (and binding the socket after the fork)? My understanding is that the kernel treats this differently from the case of a socket opened before a fork, in a way that generally does a better job at balancing the load across the processes. I don't have any experience with it at scale, though.
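For reference, one way to try that suggestion (a sketch assuming Tornado's bind_sockets with reuse_port, run in each worker process after the fork; make_app and the port are placeholders):

```python
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from tornado.netutil import bind_sockets


def run_worker() -> None:
    # Executed in each worker *after* forking: every process binds its own
    # SO_REUSEPORT socket instead of inheriting a single pre-fork socket,
    # letting the kernel balance incoming connections across the listeners.
    sockets = bind_sockets(8888, reuse_port=True)
    server = HTTPServer(make_app())   # make_app() is a placeholder application factory
    server.add_sockets(sockets)
    IOLoop.current().start()
```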
