
Add ability to prioritize termination to gen_* behaviors #8371

Open
wants to merge 1 commit into base: master

Conversation

Maria-12648430
Contributor

@Maria-12648430 Maria-12648430 commented Apr 11, 2024

This PR adds the ability for gen_server, gen_statem, gen_event and supervisor implementations to handle terminating events ('EXIT' from the parent when trapping exits, or manual shutdown via gen_*:stop/1,3) before any other messages in the message queue at that time. This can be enabled by a new start option {prioritize_termination, boolean()} (default false).

(There are currently no tests or any documentation for this feature. I first wanted to see if there is any interest in it before investing more time.)

When enabled, the respective behavior will perform a selective receive for {'EXIT', Parent, _Reason} or {system, _From, {terminate, _Reason}} messages before the general receive for other messages which will be handled in order. If a parent exit or system terminate message is thus received, it will be handled accordingly by calling the implementations' terminate function and ignoring any other messages remaining in the message queue.
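Schematically, the prioritized loop amounts to a zero-timeout selective receive tried before the ordinary in-order receive. A minimal sketch of the idea (module and function names are illustrative placeholders, not the actual patch code):

```erlang
-module(prio_term_sketch).
-export([loop/2]).

%% Sketch: first check (with timeout 0) for termination events only,
%% then fall back to the ordinary in-order receive. handle/2 and
%% terminate/2 stand in for the behavior callbacks.
loop(Parent, State) ->
    receive
        {'EXIT', Parent, Reason} ->
            terminate(Reason, State);
        {system, _From, {terminate, Reason}} ->
            terminate(Reason, State)
    after 0 ->
        receive
            {'EXIT', Parent, Reason} ->
                terminate(Reason, State);
            Msg ->
                loop(Parent, handle(Msg, State))
        end
    end.

handle(_Msg, State) -> State.        %% placeholder for the callback module
terminate(_Reason, State) -> State.  %% placeholder for Mod:terminate/2
```

Note that the selective receive finds a termination event even if it sits behind other messages in the queue, which is exactly the point: remaining ordinary messages are then ignored.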

When disabled, all messages are handled strictly in order as in the current implementation, which means that parent exit or manual termination messages will only be handled after all other messages before them have been handled.

The price you pay for enabling this feature is that of a selective receive in each iteration of the receive loop, of course. As we all know, there is no free beer 😜


Use cases (at least mine) are services that perform long-running tasks on varying demand. The (message) queue of tasks given to such a service may become rather long at busy times. In the current implementation (or when prioritize_termination is disabled, for that matter), such a service will only shut down after it has performed all the queued-up tasks. It is not unlikely that it reaches the shutdown timeout and is killed by the parent supervisor, thereby preventing any possibly necessary cleanup. Picking a sensible shutdown timeout for such children is notoriously difficult, with infinity being the only safe option.

Another use case is the use of dynamic children which are slow to (re-)start in a simple_one_for_one supervisor. Here again, a long queue of start_child requests, and/or restarts if the children are transient, may build up. If the supervisor is to be shut down, it can only do so when it has performed all the starts and restarts that were enqueued before.


As a side note, adding this feature to supervisors unfortunately means that we have to add a different, additional semantic to supervisor:start_link/3: currently there are start_link/2 (with arguments Module and Args) and start_link/3 (with arguments SupName, Module and Args). In order to be able to specify options, start_link/3 must additionally serve as "start_link/2 plus options".

Co-authored-by: Jan Uhlig <juhlig@hnc-agency.org>
Contributor

github-actions bot commented Apr 11, 2024

CT Test Results

    2 files     94 suites   34m 59s ⏱️
2 043 tests 1 995 ✅ 48 💤 0 ❌
2 352 runs  2 302 ✅ 50 💤 0 ❌

Results for commit a597c76.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.

Artifacts

// Erlang/OTP Github Action Bot

@max-au
Contributor

max-au commented Apr 13, 2024

This looks like a very limited version of the prepend send - an ability to send a message into the head of the message queue instead of the tail. @rickard-green I recall you (or me? it's been a while) prototyping an implementation that relied on non-message signals to deliver such a message.

In my opinion, I'd rather have that much more generic and useful instrument (although questionable from the message delivery guarantee point of view) than the implementation in this PR. To give a few reasons: I'm not sure about the performance implications of prioritize_termination (selective receive itself is not always easy to reason about). Also, I think the listed use cases can be implemented without adding a new flag:

  • killing a long-running process can be done with an exit(Pid, kill)
  • same for the slowed down supervisor

@Maria-12648430
Contributor Author

One of the points of this PR is specifically to not kill the process in the middle of whatever it is doing but to let it finish a current task (but only the current task), and perform its shutdown procedure.

Let's assume a gen_server implementation that traps exits, in order to be able to perform a shutdown procedure in terminate. Let's further assume that the messages this server receives are tasks, and that performing such a task takes ~1s. There is no telling at any point in time how many such tasks are waiting in the server's message queue.

When the server's supervisor tells it to shut down, the exit message will be added after all task messages in the message queue, and will be noticed and processed only after all of them have been processed. If there are 10 tasks, this will be in ~10s; if there are 100 tasks, this will be in ~100s. There is no way to determine a shutdown timeout which will allow the server to shut down orderly. At lazy times (few or no tasks) the shutdown you selected may suffice, at busy times (many tasks) it may not. This means there are only two possible timeouts which will result in reliable, consistent behavior: infinity (always process everything that is left, for the sake of the shutdown procedure being performed, at the price that this may take a very long time) or brutal_kill (always stop instantly, at the price of leaving half-finished tasks and the shutdown procedure never being performed).

Put differently, you may have a pretty good estimate of how long a single task may take in the worst case, but this does not help you decide on a timeout since you have no idea how many of those tasks there are waiting.

This PR offers a way between the above extremes: by prioritizing the exit (or system terminate, for that matter) messages, it allows the server to finish the task it is currently doing and then perform its own shutdown procedure. It is thus possible to set a good shutdown value by taking into account the ~processing time of a single task and the time the server's shutdown procedure takes, plus some leeway.
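For the scenario above, a bounded shutdown value then becomes viable; a hypothetical child spec might look like this (the numbers and the way the option is passed through to start_link are illustrative assumptions, not taken from the patch):

```erlang
%% Hypothetical child spec: ~1s worst case for the current task,
%% plus time for the terminate/2 cleanup, plus some leeway.
#{id       => task_server,
  start    => {task_server, start_link, [[{prioritize_termination, true}]]},
  restart  => permanent,
  shutdown => 5000,   %% a finite timeout, instead of infinity or brutal_kill
  type     => worker}
```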

That said, killing supervisors is IMO a dangerous thing to do, and to this end the docs urge the shutdown time for children which are supervisors to be always set to infinity.


The prepend_send feature you mentioned could be used for achieving this by sending exit messages to the head of the message queue of a process. But for one, it does not exist yet ;) For another, I would argue that this is way harder to reason about than a selective receive. For yet another, I don't think it would be a really good fit for the purpose this PR is intended for. Most of all, it leaves the concerned process out of the decision in what order to process things; the sender alone decides, without the recipient having any say in it. That may have its benefits, but it makes design more complicated, as the decisions that affect one process are made in another.

Also, it is not clear to me how prepending or not should be handled in the case of something like exits caused by links. That is, unless you want to make it so that exit signals/messages should always be prepended. This would be a breaking change.

@IngelaAndin IngelaAndin added the team:PS Assigned to OTP team PS label Apr 15, 2024
@rickard-green
Contributor

rickard-green commented Apr 16, 2024

@Maria-12648430

I can see how this feature could be useful in scenarios like the one you describe.

@max-au

This looks like a very limited version of the prepend send - an ability to send a message into the head of the message queue instead of the tail. @rickard-green I recall you (or me? it's been a while) prototyping an implementation that relied on non-message signals to deliver such a message.

Yes. I don't think I've prototyped anything, though, but I'm not sure. Every now and then I stumble over situations where such a feature would have been helpful.

@Maria-12648430

Also, it is not clear to me how prepending or not should be handled in the case of something like exits caused by links.

It could be configurable. For example, prepending all 'EXIT' messages if the process has configured its message queue that way.

@Maria-12648430

I would argue that this is way harder to reason about than a selective receive

That depends. If a prepend to message queue feature is introduced, it needs to be documented in such a way that it is easy to understand what will be prepended and what will be appended.

@Maria-12648430

The (message) queue of tasks given to such a service may become rather long at busy times.

A problem with the selective receive approach is that every fetch of a message needs to scan the whole message queue which might become very expensive in scenarios like this. This is also, from my point of view, the biggest downside of selective receive as such. It is very easy to make the receive operation in certain scenarios very expensive.

By using a prepend send feature, it would more or less be as efficient as the code is today.

@michalmuskala
Contributor

michalmuskala commented Apr 16, 2024

I think there's another way to solve the particular scenario at hand that should be considered.

In particular, the reason for the process to start trapping exits and delaying shutdowns is to execute cleanup with terminate. Another way to get "fast" termination behaviour would be some other mechanism that has the process execute the terminate callback without turning all exit signals into exit messages that arrive in the common mailbox.

@IngelaAndin
Contributor

In some ways I can see it as useful, but I think it sounds like a way to try to handle a lack of flow control, and I am unsure if that is the way to go. I do not really like breaking the message order guarantee. But if we should have something like that, I think I would prefer a new special exit like softkill, or something that could be used as a shutdown reason instead of kill, perhaps falling back on kill anyway if it still takes too long. Kill is sort of the last resort that you hope will never happen, because it may cause problems, but hopefully fewer problems than a hang.

@max-au
Contributor

max-au commented Apr 16, 2024

I would argue that this is way harder to reason about than a selective receive.

I find it easier to reason about. In fact, we already have this mechanism (internally; the name is "non-message signals"). They are always processed prior to processing any message signals; process_info requests, for example. Selective receive can be much harder to explain, especially when presented with various optimisations (like the "selective receive marker").

The mechanism for out-of-band delivery has indeed been requested by almost every ERTS user. Some of these requests were only to mitigate design flaws (not in Erlang/OTP, but in software written by others; for example, various "process pools" that would want to leverage "prepend send" for out-of-band "worker process" status reports). But there are legitimate use cases for priority processing, including the one you explained.

@rickard-green

Yes. I don't think I've prototyped anything, though, but I'm not sure.

I guess it was me, but I do remember discussing it with someone from the VM team. The latest iteration was sending a "non-message signal" that was converted to a message when moving from middle to inner queue. Hacky, and also clearing selective receive markers, but working.

I'm 100% sure that implementing any out-of-band mechanism will result in a (hacky) implementation of prepend_send. Hence, I'd rather do that in a documented way than observe attempts to do so via terminate callbacks calling back into gen:enter_loop.

@juhlig
Contributor

juhlig commented Apr 17, 2024

When you say prepend_send, you mean pushing at the head of the message queue? I don't think this is a very good idea 🤔

As a contrived example just for illustration, imagine a supervisor with non-temporary children that are somewhat slow to start. Imagine that, while it is starting a child, it receives an exit (e.g. from its parent supervisor) which gets pushed in at the head of the message queue. Then a child of the supervisor dies, and its exit also gets pushed in at the head, now in front of the exit from the parent supervisor. Thus, the supervisor will restart the dead child. And while it is doing that, more children die and their exits also get pushed in at the head, always before the parent exit. Etc etc.

Instead of pushing messages in at the head, it would probably be a better idea to have a second, "fast-lane" message queue in which exit messages are added in order of arrival.

@RaimoNiskanen
Contributor

Since we have selective receives now, and since all the gen_* behaviours deliberately don't use selective receive to not fall into the trap of scanning a long message queue, I think this is a bad idea.

A server should never be designed to have a long message queue. Queueing can in many cases be done within the server instead.

That said, a model that, for example, makes some things like 'EXIT' signals bypass the message queue, or other kinds of out-of-band data, can surely be worth investigating. Because sometimes message queues get long despite your best efforts...

@Maria-12648430
Contributor Author

Ok guys, you got me convinced 😜 Let's have that non-message-signals-bypassing-the-message-queue thing then (which is out of my league 😅).

@Maria-12648430
Contributor Author

Oh, and I think @juhlig is right in saying that just prepending them to the message queue would be wrong.

@rickard-green
Contributor

rickard-green commented Apr 17, 2024

@michalmuskala

I think there's another way to solve the particular scenario at hand that should be considered.

In particular, the reason for the process to start trapping exits and delaying shutdowns is to execute cleanup with terminate. Another way to get "fast" termination behaviour would be some other mechanism that has the process execute the terminate callback without turning all exit signals into exit messages that arrive in the common mailbox.

Do you have a suggestion for how that mechanism would look? Today you either terminate on the exit signal if trap exit is disabled, or react on this information when receiving the EXIT message if trap exit is enabled. One way of doing this could be to prepend EXIT messages if a "prepend exit messages" feature is enabled, but I guess you were thinking of something else.

@max-au

Yes. I don't think I've prototyped anything, though, but I'm not sure.

I guess it was me, but I do remember discussing it with someone from the VM team. The latest iteration was sending a "non-message signal" that was converted to a message when moving from middle to inner queue.

It was you and I who discussed it, but I don't think I prototyped any code for it. It would still preserve the signal delivery order guarantee, but would modify how messages are added to the message queue. That is, the order of messages in the message queue would not necessarily reflect the delivery order.

@juhlig

Instead of pushing messages in at the head, it would probably a better idea to have a second "fast-lane" message queue in which exit messages get added in order of arrival.

Yes, I agree: pushing them to the head, but keeping the pushed messages in order of arrival, is probably better. It makes the implementation a bit more complex, but should be doable without significant loss of performance.

@IngelaAndin IngelaAndin assigned bmk and unassigned bmk Apr 23, 2024
@IngelaAndin IngelaAndin added team:PO Assigned to OTP team PO and removed team:PS Assigned to OTP team PS labels Apr 23, 2024
@michalmuskala
Contributor

michalmuskala commented Apr 24, 2024

Do you have a suggestion for how that mechanism would look? Today you either terminate on the exit signal if trap exit is disabled, or react on this information when receiving the EXIT message if trap exit is enabled. One way of doing this could be to prepend EXIT messages if a "prepend exit messages" feature is enabled, but I guess you were thinking of something else.

These are just some very loose ideas; they would need some proper design. One option would be for a process to register some sort of termination handler that is always called when it receives an exit signal, regardless of whether the process is trapping exits or not; sort of a "destructor".
The obvious problem is how to handle State so it's correctly passed around to such a callback, and I don't have a good answer there.
Another option would be to introduce some concrete API for handling out-of-band exit signals (and a corresponding process flag to turn that on), so that in the main gen_server loop we'd have something like:

erlang:handle_exit_signals(fun(ExitSignal) ->
    ...
end),
receive
    RegularMessage -> ...
end

And all gen_* behaviours could have this feature enabled automatically if the callback module defines a terminate callback.

@max-au
Contributor

max-au commented Apr 27, 2024

That is, the order of messages in the message queue would not necessarily reflect the delivery order.

Yes. The only alternative I see is having multiple message queues (e.g. in Go they are called "channels", and a process can receive from multiple channels). In that case the receiving process has control over receiving messages, and may choose to process the "priority" message queue first before getting to the "normal" one.

@michalmuskala
Contributor

michalmuskala commented Apr 28, 2024

I believe detaching "mailboxes" from processes and making them a first-class object would generally be a beneficial idea. In particular, it would make implementing a pool of workers trivial: multiple processes reading from one mailbox. There are likely other patterns that could emerge from such a change, though this seems like a major change to the system to implement such a feature.

That said, this is already somewhat of a thing with process aliases on the sender side - however, we lack the corresponding receiver side abstraction.

@lin72h

lin72h commented Apr 29, 2024

I believe detaching "mailboxes" from processes and making them a first-class object would generally be a beneficial idea

Sounds like Swift's Actor (which is only a container with a hidden, in-order message queue) and the actual processing element (Task).

@essen
Contributor

essen commented Apr 29, 2024

Historically modules like gen_server2 have had prioritisation implemented in this way:

  • gen_server2 periodically receives all messages and adds them to an internal queue in its state
    • it also receives when its internal queue is empty
  • whenever a message would normally be received, it takes one from the internal queue
    • it can apply priorities or other heuristics when doing so

The downside to that is that the normal functions for inspecting processes no longer work, and that messages are forced to stay on the heap (even if you use the off_heap flag), amongst others.

I think there are two main cases regarding that:

  • Process needs to handle a specific message faster than others (to terminate quickly, or run a rarely requested task)
  • Process wants to prioritise a certain type of task

The second one can usually be refactored in some way that defers low priority stuff for a while, or be done over multiple processes. I don't think this one really needs solving or at least not at the level of the VM.

The first one doesn't really have a good answer today. An often seen case is calling sys:get_state or equivalent on a process that has a large backlog, or is not processing messages as fast as it normally should (but is not stuck). This mirrors early termination since it is in the realm of system messages. The "non-message signals" mentioned above seem to be the closest to that. But then implementation details are less clear to me.

Let's assume the process opts in to this OOB delivery. It has to receive all OOB messages out of order, meaning that if we send both sys:get_state and exit messages OOB, it has to handle them before normal messages. If one process wants to benefit from early sys:get_state only, but another from both messages early, we have to have a separate flag for exit messages. Is there another type of message where this would be a problem? I assume application-controlled messages don't need special handling.

@Maria-12648430
Contributor Author

@essen hm, now that you say that... AFAIK, all system messages are really just normal messages, sent via gen:call. So, with an approach that only OOBs non-message signals like exits, system messages could not be prioritized over other "application controlled" messages.

@juhlig
Contributor

juhlig commented May 22, 2024

This PR has fallen somewhat quiet, so I'll liven it up a bit 😅

Having read the comment of @essen and the follow-up by @Maria-12648430, I have to agree that just prioritizing non-message signals over message signals (optionally, of course) does not really cut it. In the context of OTP behaviors, system messages resulting from proc_lib/sys (which are just normal messages to the runtime) need to be prioritized as well.

But even if we forget about system messages for a moment and just concern ourselves with prioritizing non-message signals, this would IMO not be an optimal solution.
For example, a supervisor will likely want to prioritize exits from its parent supervisor higher than exits from its own children. With simple prioritization of non-message signals, just as it is now, if a burst of child exits occurs followed by an exit of the parent supervisor, it will first (try to) restart all the exited children (which may or may not take a while), only to shut them all down again (which also may or may not take a while) when it finally gets to processing the parent exit. Following up in this line, if the supervisor happens to be a named process, a restart of the failed parent supervisor will fail for as long as the named supervisor process is alive and busy doing meaningless child restarts and shutdowns.

So, in conclusion, instead of simply prioritizing non-message signals over message signals, what I think is needed is a general ability to prioritize some messages over others. For the supervisor as used in my example, it would be better to prioritize both parent exits (a non-message signal) and system (normal) messages over both child exits (non-message signals) and anything else (which_children etc).
Furthermore, it should be in the hands of the receiving process to decide which messages it deems important ("a system message or an exit from my parent is more important than exits from my children or whatever else"), not in the hands of the sending process ("my exit is important to my children!", "I'm sending a system message, this is important to the receiver!").

A possible (I'm not saying feasible) solution would be a dispatching or rating function, which decides upon message arrival how important it actually is. Like: parent exit or system message --> "whoa, very important, process ASAP"; child exit or whatever else --> "happens every day, process unless there is something more important (parent exit, system message, ...) to do".
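Such a rating function could, purely hypothetically, be a plain classifier over incoming messages. No such API exists in OTP; the module below only illustrates the idea, with the module name, function name and priority atoms all being made up:

```erlang
-module(msg_rating).
-export([rate/2]).

%% Hypothetical per-process classifier: given the parent pid and an
%% incoming message, decide how urgently it should be processed.
rate(Parent, {'EXIT', Parent, _Reason})  -> urgent;  %% exit from the parent
rate(_Parent, {system, _From, _Req})     -> urgent;  %% system message
rate(_Parent, {'EXIT', _Child, _Reason}) -> normal;  %% exit from a child
rate(_Parent, _Other)                    -> normal.  %% which_children etc.
```

A runtime (or behavior) could then call such a function upon message arrival and enqueue the message into a "fast-lane" or normal queue accordingly.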


10 participants