Enhancement to Retry Mechanism for Batch Processing #1651
Labels: dead letter queue, feature, pro
In the current implementation, the #retrying? method is used to check whether a batch is being retried. The user wants to forward only the problematic messages from a batch to the Dead Letter Queue (DLQ) after a number of retries, while processing the valid ones normally. This requires identifying the current retry count in Karafka's consumer context.
Existing Solution:
The current retry count can be obtained using coordinator.pause_tracker.attempt. The first attempt is not considered a failure, but each retry will bump this counter.
Problem:
Karafka currently treats the first message of a batch as the broken one when messages are marked as consumed only after the entire batch has been processed. Hence, it cannot identify problematic messages in the middle of a batch.
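For reference, reading the retry count described under Existing Solution could look roughly like the sketch below. The PauseTracker and Coordinator structs are stand-ins so the snippet runs without the karafka gem; in a real consumer the coordinator comes from the framework. MAX_ATTEMPTS and retries_exhausted? are hypothetical names, and the exact starting value of the counter should be checked against the Karafka version in use.

```ruby
# Stand-ins for the framework objects (assumption: only `attempt` matters here).
PauseTracker = Struct.new(:attempt)
Coordinator = Struct.new(:pause_tracker)

# Hypothetical threshold, not a Karafka setting.
MAX_ATTEMPTS = 3

# Returns true once the batch has been attempted at least max_attempts times.
# Assumes `attempt` grows by one on every retry, as described in this issue.
def retries_exhausted?(coordinator, max_attempts = MAX_ATTEMPTS)
  coordinator.pause_tracker.attempt >= max_attempts
end

fresh = Coordinator.new(PauseTracker.new(1))
worn = Coordinator.new(PauseTracker.new(3))

retries_exhausted?(fresh) # => false
retries_exhausted?(worn)  # => true
```

Note that this counter is tracked for the batch as a whole, which is why it cannot by itself point at a specific message inside the batch.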
Proposed Enhancement:
Allow users to mark specific messages for retry, or to mark messages as consumed in a virtual manner. Karafka should then retry only the marked messages before moving them to the DLQ, and should provide the same guarantees as it does with Virtual Partitions (VPs).
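To make the requested behavior concrete, here is a plain Ruby sketch of the idea, independent of Karafka. Both split_batch and route_failed are hypothetical helpers invented for illustration and are not part of any Karafka API; integers stand in for messages.

```ruby
# Hypothetical helper: runs the block for each message and separates the
# messages that processed cleanly from those that raised.
def split_batch(messages)
  succeeded = []
  failed = []

  messages.each do |message|
    begin
      yield message
      succeeded << message
    rescue StandardError
      failed << message
    end
  end

  [succeeded, failed]
end

# Hypothetical helper: while attempts remain, only the failed messages are
# retried; once attempts run out, only the failed messages go to the DLQ.
def route_failed(failed, attempt:, max_attempts:)
  if attempt >= max_attempts
    { to_retry: [], to_dlq: failed }
  else
    { to_retry: failed, to_dlq: [] }
  end
end

batch = [1, 2, 3, 4]
succeeded, failed = split_batch(batch) do |message|
  raise ArgumentError, "bad payload" if message == 3
end

succeeded                                         # => [1, 2, 4]
route_failed(failed, attempt: 1, max_attempts: 3) # => {:to_retry=>[3], :to_dlq=>[]}
route_failed(failed, attempt: 3, max_attempts: 3) # => {:to_retry=>[], :to_dlq=>[3]}
```

The sketch deliberately ignores offset management and ordering, which are the hard parts of bringing this into the core.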
Additional Notes:
The feature is partially implemented for virtual partitions, where processing is unordered and, on error, the virtual partitions collapse and already consumed messages are skipped. Bringing this behavior to the core is feasible, but it might require further consideration of ordering and partitioning.
Status:
To be considered for future implementation. No ETA provided yet.
Something like that. I don't think the name is good, though, because it too closely resembles the VPs' virtual offset management.
Conceptually, it should operate in a similar fashion to VPs, with one message per VP and a final state with filtering.