Potential Deadlock in Concurrent Mode #150

Open
sklose opened this issue Jul 14, 2020 · 3 comments · Fixed by #160, #162 or #180

@sklose
Collaborator

sklose commented Jul 14, 2020

It is currently possible for a Cookie Cutter application to deadlock when running in Concurrent or RPC mode:

  1. the input queue is full
  2. the message processor takes out one item, processes it and hits a sequence conflict
  3. the message processor optimistically processes more messages and fills up the output queue
  4. the open slots in the input queue are filled up again by one of the input sources
  5. the message processor tries to re-enqueue the failed message in the input queue, but is stuck because no new slot in the input queue is going to open up
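
The steps above can be reproduced with a toy model. This is a minimal Python sketch, not Cookie Cutter code: `SharedCapacityQueue`, `try_enqueue` and `simulate_deadlock` are hypothetical names, and a non-blocking enqueue that returns `False` stands in for the blocking call that would hang in a real service.

```python
from collections import deque


class SharedCapacityQueue:
    """Toy bounded queue whose capacity is shared by all messages (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def try_enqueue(self, item):
        # A blocking enqueue would wait here; the sketch reports failure instead.
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def dequeue(self):
        return self.items.popleft()


def simulate_deadlock(capacity=3):
    """Walk through steps 1-5 and return whether the re-enqueue succeeds."""
    input_queue = SharedCapacityQueue(capacity)
    # 1. the input queue is full
    for i in range(capacity):
        input_queue.try_enqueue(f"msg-{i}")
    # 2. the processor takes out one item and hits a sequence conflict
    failed = input_queue.dequeue()
    # 3. the processor fills the output queue (not modelled here)
    # 4. an input source fills the slot that just opened up
    input_queue.try_enqueue("msg-new")
    # 5. the failed message cannot be re-enqueued; the processor is the only
    #    consumer of this queue, so no slot will ever open up while it waits
    return input_queue.try_enqueue(failed)
```

In this model `simulate_deadlock()` returns `False` for any positive capacity, which corresponds to the processor being stuck in step 5.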

Proposed solution to the problem:

Instead of applying the capacity to all messages in the queue, we should apply it per priority level. Because messages with sequence conflicts are re-enqueued with higher priority, they will no longer get stuck when an input source has filled up the queue.
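
A rough illustration of per-priority capacity, written as a Python sketch rather than the actual Cookie Cutter queue. `PerPriorityQueue` and its parameters are hypothetical, and two priority levels are assumed (0 for new input, 1 for re-enqueued messages).

```python
from collections import deque


class PerPriorityQueue:
    """Toy queue that applies the capacity to each priority level separately."""

    def __init__(self, capacity_per_level, levels=2):
        self.capacity = capacity_per_level
        # Index 0 is the lowest priority; a higher index means higher priority.
        self.levels = [deque() for _ in range(levels)]

    def try_enqueue(self, item, priority=0):
        level = self.levels[priority]
        # Only the target level's own length counts against the capacity.
        if len(level) >= self.capacity:
            return False
        level.append(item)
        return True

    def dequeue(self):
        # Serve the highest non-empty priority level first.
        for level in reversed(self.levels):
            if level:
                return level.popleft()
        raise IndexError("queue is empty")
```

With `capacity_per_level=2`, two normal messages fill level 0 and a third is refused, but a message re-enqueued with `priority=1` is still accepted and is the next one returned by `dequeue()`.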

The only way to still get a deadlock would be if the output queue is larger than the input queue and the batch that needs to be re-queued exceeds the remaining capacity of the input queue on the high priority level. This will not happen out of the box, though, because by default the input and output queues have the same capacity. However, it is something the end user will need to be aware of when tweaking a service (we can reject that configuration in ApplicationBuilder and throw an error to prevent this scenario).
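
The configuration check could look roughly like the following. This is a Python sketch under assumptions: `validate_queue_capacities` and its parameter names are hypothetical and do not reflect the ApplicationBuilder API.

```python
def validate_queue_capacities(input_capacity, output_capacity):
    """Reject configurations where the output queue is larger than the input queue."""
    if output_capacity > input_capacity:
        # A re-queued batch could exceed the high-priority capacity of the
        # input queue, so fail fast when the service is configured.
        raise ValueError(
            f"output queue capacity ({output_capacity}) must not exceed "
            f"input queue capacity ({input_capacity})"
        )
```

Equal capacities, which the issue describes as the default, pass the check; a larger output queue raises at configuration time instead of deadlocking at runtime.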

This fix should be merged back to version 1.2.

@sklose sklose added the M-core This issue is related to the core module label Jul 14, 2020
@sklose sklose modified the milestones: 1.3, 1.2 Jul 14, 2020
@k-gupta
Contributor

k-gupta commented Jul 14, 2020

For the case where the output queue size is greater than the input queue size, could you throw an error or warning so the user knows the risk?

@cross311
Collaborator

I don't think this fixed it, or I have a different issue. I am running into a case where I get a sequence conflict, it retries all the ones and skips, and then it just stops.

@sklose
Collaborator Author

sklose commented Aug 27, 2020

We have seen a few instances where the input queue capacity was limited to 100 but the queue size grew all the way to 300.

@sklose sklose modified the milestones: 1.3, 1.4 Oct 13, 2020
@plameniv plameniv added the P2 Priority 2 label Jul 27, 2021