SQS Transport R-DUPE Warning Causes Messages To Never Be Made Visible #5159
Comments
Thanks for pointing this out. When using this type of fault handling, it's recommended to set a delay before redelivery:
sqs.RethrowFaultedMessages();
sqs.ThrowOnSkippedMessages();
sqs.RedeliverVisibilityTimeout = 5;
As you've done. The value can actually be lower (it works with 1 second), but some sort of delay is still recommended. That said, it's worth investigating to ensure the lock context isn't being improperly renewed.
namespace MassTransit.AmazonSqsTransport.Tests;

using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using NUnit.Framework;
using Testing;


[TestFixture]
public class ReleaseLockContext_Specs
{
    [Test]
    public async Task Should_release_subsequent_lock_contexts()
    {
        var services = new ServiceCollection();

        await using var provider = services
            .AddMassTransitTestHarness(x =>
            {
                x.AddConsumer<TestLockContextConsumer>();

                x.AddConfigureEndpointsCallback((context, name, cfg) =>
                {
                    if (cfg is IAmazonSqsReceiveEndpointConfigurator sqs)
                    {
                        sqs.RethrowFaultedMessages();
                        sqs.ThrowOnSkippedMessages();
                        // sqs.RedeliverVisibilityTimeout = 1;
                    }
                });

                x.UsingAmazonSqs((context, cfg) =>
                {
                    cfg.LocalstackHost();
                    cfg.ConfigureEndpoints(context);
                });
            })
            .BuildServiceProvider(true);

        var harness = provider.GetTestHarness();

        await harness.Start();

        try
        {
            await harness.Bus.Publish(new TestLockContextMessage() { Id = "567" });

            Assert.That(await harness.Published.Any<TestLockContextRedeliveredMessage>());
        }
        finally
        {
            await harness.Stop();
        }
    }
}


public class TestLockContextConsumer :
    IConsumer<TestLockContextMessage>
{
    readonly ILogger<TestLockContextConsumer> _logger;

    public TestLockContextConsumer(ILogger<TestLockContextConsumer> logger)
    {
        _logger = logger;
    }

    public async Task Consume(ConsumeContext<TestLockContextMessage> context)
    {
        await Task.Delay(TimeSpan.FromSeconds(1));

        var redelivered = context.ReceiveContext.Redelivered ? "redelivered" : "";
        _logger.LogInformation($"Got Message {context.Message.Id} {redelivered}");

        if (context.ReceiveContext.Redelivered)
        {
            await context.Publish(new TestLockContextRedeliveredMessage() { Id = context.Message.Id });
            return;
        }

        throw new Exception("This is intentional");
    }
}


public record TestLockContextMessage
{
    public string Id { get; init; }
}


public record TestLockContextRedeliveredMessage
{
    public string Id { get; init; }
}
This unit test reproduces it. You may have to run it a few times before the message is redelivered and marked as a duplicate.
We originally tried RedeliverVisibilityTimeout = 1; but still very occasionally saw this issue in production workloads. Unfortunately, when it happens to even a single message, the "Approximate Age Of Oldest Message" metric rises continuously, which eventually requires someone to look into the root cause.
Yeah, the task is never canceled, so in theory it will keep renewing up to the 12-hour limit. The trick is how to fix it without other breaking side effects.
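To make the failure mode above concrete, here is a minimal sketch of a visibility renewal loop that is tied to a cancellation token. It is not MassTransit's actual implementation: FakeMessage, VisibilityRenewalSketch, and RenewUntilCancelledAsync are hypothetical names, and the queue is simulated in memory rather than calling SQS. It only illustrates that if the token is never cancelled after a fault, the loop keeps extending visibility and the message cannot reappear in the queue.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for a received message; not an SQS or MassTransit type.
public sealed class FakeMessage
{
    int _renewals;

    public int Renewals => Volatile.Read(ref _renewals);

    // Simulates extending the message's visibility timeout.
    public void ExtendVisibility() => Interlocked.Increment(ref _renewals);
}

public static class VisibilityRenewalSketch
{
    // Extends visibility at a fixed interval until the token is cancelled.
    // If nobody cancels the token, this loop never ends and the message stays invisible.
    public static async Task RenewUntilCancelledAsync(FakeMessage message, TimeSpan interval, CancellationToken token)
    {
        try
        {
            while (true)
            {
                await Task.Delay(interval, token);
                message.ExtendVisibility();
            }
        }
        catch (OperationCanceledException)
        {
            // Expected: the handler completed or faulted and the renewal was stopped.
        }
    }

    public static async Task Main()
    {
        var message = new FakeMessage();
        using var cts = new CancellationTokenSource();

        var renewal = RenewUntilCancelledAsync(message, TimeSpan.FromMilliseconds(20), cts.Token);

        try
        {
            // Simulated consumer that always faults.
            await Task.Delay(100);
            throw new InvalidOperationException("This is intentional");
        }
        catch (InvalidOperationException)
        {
            // Swallowed only so the sketch runs to completion.
        }
        finally
        {
            // The important step: stop renewing on every exit path, including faults.
            cts.Cancel();
            await renewal;
        }

        var renewalsAtStop = message.Renewals;
        await Task.Delay(100);

        Console.WriteLine(message.Renewals == renewalsAtStop
            ? "Renewal stopped; the message can become visible again."
            : "Renewal still running.");
    }
}
```

Removing the cts.Cancel() call from the finally block reproduces the symptom in miniature: the renewal task never completes, which is the situation that a real fix would need to avoid without breaking other paths that rely on the lock being held.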
Contact Details
No response
Version
8.x
On which operating system(s) are you experiencing the issue?
Windows
Using which broker(s) did you encounter the issue?
Amazon SQS
What are the steps required to reproduce the issue?
What is the expected behavior?
We should see the message get redelivered by the queue and handled by the consumer multiple times because it always faults.
What actually happened?
An unexpected R-DUPE warning is emitted in the logs and the message is only handled once. More importantly, the message never becomes visible in the queue again: the "Number Of Messages Not Visible" SQS queue metric stays at 1 and the "Approximate Age Of Oldest Message" SQS queue metric grows continuously. This persists until the program is terminated, at which point the message becomes visible.
MassTransit 8.0.13 and below have the expected behavior. I believe that when this R-DUPE issue occurs, the task which updates message visibility is never stopped. This seems likely related to this set of changes: a3d66e5
Currently we are working around this issue by adding endpointConfigurator.RedeliverVisibilityTimeout = 5; which seems to avoid the R-DUPE issue at least most of the time.
Related log output, including any exceptions
Link to repository that demonstrates/reproduces the issue
https://github.com/VoX/masstransit-r-dupe-issue