Reject submissions with error if they cannot be written to MongoDB #837

jnm · 2022-07-20T17:55:03Z

MongoDB (and Pymongo) now enable retryable writes by default. From https://github.com/mongodb/specifications/blob/master/source/retryable-writes/retryable-writes.rst#why-are-write-operations-only-retried-once:

The spec concerns itself with retrying write operations that encounter a retryable error (i.e. no response due to network error or a response indicating that the node is no longer a primary). A retryable error may be classified as either a transient error (e.g. dropped connection, replica set failover) or persistent outage. In the case of a transient error, the driver will mark the server as "unknown" per the SDAM spec. A subsequent retry attempt will allow the driver to rediscover the primary within the designated server selection timeout period (30 seconds by default). If server selection times out during this retry attempt, we can reasonably assume that there is a persistent outage. In the case of a persistent outage, multiple retry attempts are fruitless and would waste time. See How To Write Resilient MongoDB Applications for additional discussion on this strategy.

Given that, let's not worry about adding an additional level of retries within synchronous request processing. Instead, we should return a failure response if Pymongo reports that a submission has not been written successfully to MongoDB.
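To make the proposed behavior concrete, here is a minimal sketch, not KoBoCAT's actual code. `handle_submission`, `FakeTransaction`, and `write_to_mongo` are hypothetical names, and the 503 status code is an assumption; the real view may need a different code. The only Pymongo name used is `pymongo.errors.PyMongoError`, the base class of Pymongo's exceptions, with a stand-in so the sketch runs where Pymongo is not installed.

```python
try:
    from pymongo.errors import PyMongoError
except ImportError:
    # Stand-in so this sketch runs without Pymongo installed.
    class PyMongoError(Exception):
        pass


class FakeTransaction:
    """Hypothetical stand-in for a PostgreSQL transaction."""

    def __init__(self):
        self.rows = []
        self.committed = False

    def insert(self, row):
        self.rows.append(row)

    def commit(self):
        self.committed = True

    def rollback(self):
        self.rows.clear()


def handle_submission(submission, transaction, write_to_mongo):
    """Return (http_status, message) for one synchronous submission request."""
    transaction.insert(submission)
    try:
        # No application-level retry here: with retryable writes enabled,
        # the driver has already retried once if the error was retryable.
        write_to_mongo(submission)
    except PyMongoError as exc:
        # Undo the PostgreSQL write so the two stores cannot silently diverge.
        transaction.rollback()
        return 503, f"Submission not saved, please resend: {exc}"
    transaction.commit()
    return 201, "Submission saved"
```

The point of the sketch is that the client only sees success when both stores have the submission, so a failed Mongo write prompts the client to upload again rather than leaving the record hidden.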

I believe that the philosophy behind the old design, which returns success once the submission is written to PostgreSQL—regardless of what happens with MongoDB—was to maximize data safety in the face of hostile field environments. A submission that stores successfully in Postgres but not in Mongo might never reappear to the server: the phone holding it could be run over by a truck before there's an opportunity to upload again. However, the risk of data being silently hidden from essentially all exports and views until someone runs ./manage.py sync_mongo --remongo likely outweighs the risk of the original submission being destroyed before it can be uploaded again after a transient MongoDB error.

Old description

This is necessary to support writing when replica sets are used (see #830).

AutoReconnect is:

Raised when a connection to the database is lost and an attempt to auto-reconnect will be made.

In order to auto-reconnect you must handle this exception, recognizing that the operation which caused it has not necessarily succeeded. Future operations will attempt to open a new connection to the database (and will continue to raise this exception until the first successful connection is made).

(NotPrimaryError is a subclass of AutoReconnect: https://pymongo.readthedocs.io/en/stable/api/pymongo/errors.html#pymongo.errors.NotPrimaryError.)

It's apparently the responsibility of our application code to retry queries whenever Pymongo raises these errors. Currently, we don't do that:

1. Pymongo loses its connection or finds itself connected to a server that doesn't accept writes;
2. A submission comes in, and KoBoCAT tries to write via Pymongo;
3. Pymongo raises an exception;
4. KoBoCAT gives up on Mongo but returns success because the submission has been stored in PostgreSQL.

We need to change (4) so that KoBoCAT retries the Pymongo operation. I don't know what the best practice is, though. How many times should we retry? Should we delay between retries, and how much can we afford to delay given that we are responding to a synchronous HTTP request? Should we simply fail the request, rolling back the Postgres transaction and returning a failure code to the client?
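For illustration only, the retry questions can be framed as a retry loop bounded both by an attempt count and by a time budget. This is a rough sketch under assumptions, not a recommendation: `retry_within_budget` and all of its parameters and defaults are hypothetical, and in KoBoCAT `retry_on` would presumably be `pymongo.errors.AutoReconnect`.

```python
import time


def retry_within_budget(operation, retry_on, max_attempts=3, delay=0.5,
                        budget=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call operation(), retrying on retry_on until attempts or time run out.

    budget is the total number of seconds the synchronous request can
    afford to spend waiting; all defaults are illustrative guesses.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    deadline = clock() + budget
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            out_of_attempts = attempt == max_attempts
            out_of_time = clock() + delay > deadline
            if out_of_attempts or out_of_time:
                # Give up and let the caller fail the request.
                raise
            sleep(delay)
```

Whatever values are chosen, the worst case adds roughly `budget` seconds to the HTTP response, which is the cost the questions above are weighing against simply failing the request.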


jnm added the backend label Jul 20, 2022

jnm changed the title ~~Tolerate AutoReconnect and subclasses from Pymongo~~ Reject submissions with error if they cannot be written to MongoDB Jul 20, 2022

jnm mentioned this issue Jul 21, 2022

Add support for Mongo Cluster #838

Open

bufke self-assigned this Aug 8, 2022

bufke mentioned this issue Aug 8, 2022

Reject submissions with error if they cannot be written to MongoDB #840

Merged
