
Resumable Uploads: Describe interaction with Digest Fields #2748

Open

Conversation

Acconut (Member) commented Mar 4, 2024

This PR adds a section describing a possible interaction between resumable uploads and digest fields. I am not yet entirely sure whether this style is appropriate in such a draft at all. The intention is to help readers understand how digests could help in certain use cases and how implementations should handle the interaction between resumable uploads and digests.

Closes #2408

LPardue (Contributor) commented Mar 10, 2024

The digest spec is very loose with rules for what to do on failure and punts that up to other specs.

The text here seems technically correct but raises a few questions, mainly to do with reacting to failure. For example, if an append fails an integrity check on the server side, how does it signal back to the client and what are the expectations on the client to recover?

The PR currently opens more questions than it answers, which might lead to hard-to-troubleshoot interop problems. So a bit more thought and discussion is needed. For example, can we survey how S3, Google Drive, etc. expect integrity failures to be handled?

Acconut (Member, Author) commented Mar 16, 2024

For example, if an append fails an integrity check on the server side, how does it signal back to the client and what are the expectations on the client to recover?

I agree, these points should be mentioned. The draft could recommend that servers respond with a 400 status code. As far as I know, there is no status code specifically to indicate a failed integrity check, and RFC 9530 does not make any recommendation (which might be on purpose, I assume). On the other hand, a 400 makes it challenging for the client to discover that the underlying cause was a failed integrity check.

I am unsure what the best reaction of the client would be if it discovers a failed integrity check. It could retransmit the upload part in the hope that the integrity failure was caused by a fault in the transmission (e.g. a bit flip), although it is questionable how often this occurs in practice. In most situations, the client would bubble up the error to the end user, as an integrity check failure often points to implementation bugs in my experience. Would you have any other recommendation?

For example, can we survey how S3, Google Drive, etc. expect integrity failures to be handled?

I looked into S3, and the client can supply a checksum in the request headers or trailers (see https://docs.aws.amazon.com/AmazonS3/latest/userguide/checking-object-integrity.html). If the request content's checksum calculated by the server does not match the provided value, it will reject the request with a 400 response code (see BadDigest at https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html).

LPardue (Contributor) commented Mar 16, 2024

For example, if an append fails an integrity check on the server side, how does it signal back to the client and what are the expectations on the client to recover?

I agree, these points should be mentioned. The draft could recommend that servers respond with a 400 status code. As far as I know, there is no status code specifically to indicate a failed integrity check, and RFC 9530 does not make any recommendation (which might be on purpose, I assume). On the other hand, a 400 makes it challenging for the client to discover that the underlying cause was a failed integrity check.

We could use problem details in the response; see https://www.rfc-editor.org/rfc/rfc9457.html. This actually seems aligned with how S3 responds with BadDigest in your example, except it would be a standard approach that we can cite. If going that route, we might want to consider defining problem type extensions specific to digests, such as returning the calculated digest value that failed the comparison.
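To make the idea concrete, here is a minimal sketch of what such an RFC 9457 problem-details body could look like for a digest mismatch. The problem type URI and the extension members (`provided-digest`, `calculated-digest`) are purely illustrative assumptions, not defined by any spec.

```python
import json

# Hypothetical RFC 9457 problem-details body for a digest mismatch.
# The type URI and extension members are illustrative, not standardized.
def digest_mismatch_problem(provided: str, calculated: str) -> dict:
    return {
        "type": "https://example.com/problems/digest-mismatch",  # hypothetical
        "title": "Content-Digest verification failed",
        "status": 400,
        "provided-digest": provided,
        "calculated-digest": calculated,
    }

body = digest_mismatch_problem(
    "sha-256=:RK/0qy18MlBSVnWgjwz6lZEWjP/lF5HF9bvEF8FabDg=:",
    "sha-256=:d435Qo+nKZ+gLcUHn7GQtQ72hiBVAgqoLsZnZPiTGPk=:",
)
# would be served with Content-Type: application/problem+json
print(json.dumps(body, indent=2))
```

Such a response would let a client distinguish a digest failure from other 400 causes without a dedicated status code.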

I am unsure what the best reaction of the client would be if it discovers a failed integrity check. It could retransmit the upload part in the hope that the integrity failure was caused by a fault in the transmission (e.g. a bit flip), although it is questionable how often this occurs in practice. In most situations, the client would bubble up the error to the end user, as an integrity check failure often points to implementation bugs in my experience. Would you have any other recommendation?

If the client is speaking "directly" to an upload server, I suspect a content-digest for an append has a low risk of failing thanks to transport protocol integrity (e.g. TLS or QUIC). However, any cases of intermediation increase the number of processing steps and transport hops, and so the risks increase. Allowing an upload server sitting behind N levels of intermediation to detect some form of corruption and signal that back seems of potential value to me.

Regarding bubbling up of errors: the upload append section currently states:

The client SHOULD NOT automatically retry if a client error status code between 400 and 499 (inclusive) is received.

It's an interesting question whether a user would want to be pestered with error dialog boxes to retry a failed upload append due to a digest failure, or whether their user agent could attempt, say, three retries automatically before giving up and flagging a problem.

If the client knows the integrity digest of the content from an upload creation ({{upload-creation}}) or upload appending ({{upload-appending}}) request, it MAY include the `Content-Digest` header field in the request. Once the content has been received, the server can compute the integrity digest of the received content and compare it to the provided digest. If the digests don't match, the server SHOULD consider the transfer failed and not append the content to the upload resource. This way, the integrity of an individual request can be protected.
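A minimal sketch of the server-side check described above, assuming the simple `sha-256=:<base64>:` structured-field form of `Content-Digest` (a real implementation would use a full structured-fields parser and support multiple algorithms):

```python
import base64
import hashlib

# Sketch: verify a request's Content-Digest before appending the content.
# Parsing is simplified to the single "sha-256=:<base64>:" dictionary member.
def verify_content_digest(header_value: str, content: bytes) -> bool:
    algo, _, value = header_value.partition("=")
    if algo.strip() != "sha-256":
        raise ValueError("unsupported digest algorithm")
    expected = base64.b64decode(value.strip().strip(":"))
    return hashlib.sha256(content).digest() == expected

content = b"hello world"
digest = base64.b64encode(hashlib.sha256(content).digest()).decode()
header = f"sha-256=:{digest}:"
assert verify_content_digest(header, content)           # match: append content
assert not verify_content_digest(header, b"corrupted")  # mismatch: reject, don't append
```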

If the client does not know the integrity digest of the content from an upload creation ({{upload-creation}}) or upload appending ({{upload-appending}}) request, it MAY request the integrity digest from the server by including the `Want-Content-Digest` header field. While transferring the content, the client can compute the integrity digest of the transferred content and compare it to the digest provided by the server in the response. If the digests don't match, the client SHOULD consider the transfer failed and cancel the upload ({{upload-cancellation}}).
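The client side of this flow could be sketched as follows; the function and variable names are illustrative stand-ins for real request I/O, assuming the same simplified `sha-256=:<base64>:` digest form:

```python
import base64
import hashlib

# Sketch: after sending Want-Content-Digest, the client hashes each chunk as
# it is transferred and compares its local digest to the server's response.
def client_side_check(chunks, server_digest: str) -> bool:
    h = hashlib.sha256()
    for chunk in chunks:  # hash state updated as each chunk is sent
        h.update(chunk)
    local = f"sha-256=:{base64.b64encode(h.digest()).decode()}:"
    # on mismatch, the client SHOULD consider the transfer failed and cancel
    return local == server_digest

chunks = [b"abc", b"def"]
digest = base64.b64encode(hashlib.sha256(b"abcdef").digest()).decode()
assert client_side_check(chunks, f"sha-256=:{digest}:")
```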

LPardue (Contributor) commented on the diff:

Is there a case for the situation where the server doesn't want to verify the repr-digest itself but can calculate one and make it available for the client to obtain? For example, over a series of upload appends it combines ranges (content) into a representation. Then, once the upload is complete, the server can calculate the complete repr-digest and return that, allowing the client to perform a comparison of the local and remote views.

Acconut (Member, Author) replied:

Then, once the upload is complete, the server can calculate the complete repr-digest and return that, allowing the client to perform a comparison of the local and remote views.

Yes, I could imagine that very well. A client may calculate the checksum while uploading the data and thus cannot include it in the request header fields. In addition, comparing the complete repr-digest has the benefit of verifying that the messages from the upload append requests were properly combined into the final representation, whereas comparing the repr-digest of a single upload append request only verifies the integrity of that message. If there is a bug in the server-side combining of upload appends, it would be caught by the comparison of the complete repr-digest.

Overall, I could imagine that Repr-Digest is more helpful than Content-Digest for resumable uploads. If an upload append (or an upload creation) uses Content-Digest to secure the integrity of the message and the request transmission is interrupted in the middle, the server has likely received just a partial message content, and its digest won't match the digest of the entire message content that the client included in Content-Digest. A server may then reject the partial message due to the failed integrity validation, forcing the client to re-transmit the data.

The use of Repr-Digest helps in this case because the upload can be interrupted and resumed multiple times while the server still accepts partial messages. Only once the upload is finished is the Repr-Digest of the entire representation compared (either by the client or the server).
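The property described above can be illustrated with a small sketch: a single running hash spans all append requests, so individual appends can be interrupted and resumed without invalidating the digest, and only the final value is compared. The data and structure here are illustrative.

```python
import base64
import hashlib

# One hash state spans all upload-append requests; only bytes that were
# actually persisted feed the running hash, so interrupted appends can simply
# be resumed from the server's offset.
h = hashlib.sha256()
for append in (b"part one, ", b"part two, ", b"part three"):
    h.update(append)
repr_digest = f"sha-256=:{base64.b64encode(h.digest()).decode()}:"

# equal to hashing the complete representation in one pass, which is what the
# other side computes for the final comparison
whole = hashlib.sha256(b"part one, part two, part three").digest()
assert repr_digest == f"sha-256=:{base64.b64encode(whole).decode()}:"
```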

Acconut (Member, Author) commented Mar 18, 2024

We could use problem details in the response; see https://www.rfc-editor.org/rfc/rfc9457.html. This actually seems aligned with how S3 responds with BadDigest in your example, except it would be a standard approach that we can cite. If going that route, we might want to consider defining problem type extensions specific to digests, such as returning the calculated digest value that failed the comparison.

Problem details would be a great fit here and be better suited than a status code to indicate mismatching digests. I am interested in exploring possible problem types for digest fields in general, which can also be used for use cases like resumable uploads.

I could also imagine that we define specific problem types for resumable uploads, for example to indicate details about mismatching offsets.

Regarding bubbling up of errors: the upload append section currently states:

The client SHOULD NOT automatically retry if a client error status code between 400 and 499 (inclusive) is received.

It's an interesting question whether a user would want to be pestered with error dialog boxes to retry a failed upload append due to a digest failure, or whether their user agent could attempt, say, three retries automatically before giving up and flagging a problem.

The recommendation to not retry on 4xx was mainly motivated by 404 and 403 responses. In these cases, retrying without modifying the request will likely not help because the upload resource doesn't exist at all or is not accessible to the end user. However, there are also some 4xx status codes where a retry of the same request could be beneficial, such as 408 Request Timeout, 423 Locked, and 429 Too Many Requests. This matches the experience from a few tus implementations, where we ended up with the client only retrying on 423 and 429, while also allowing the end user to override this behavior. I think we should update the text in this draft to include these considerations.
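The retry policy sketched in this comment could look roughly like the following; the retryable-code set, attempt cap, and function name are illustrative choices, not text from the draft:

```python
# Illustrative retry policy: retry on transient 4xx codes (408, 423, 429) and
# on 5xx, but not on codes like 403/404 where an identical retry cannot succeed.
RETRYABLE_CLIENT_ERRORS = {408, 423, 429}

def should_retry(status: int, attempts: int, max_attempts: int = 3) -> bool:
    if attempts >= max_attempts:
        return False  # give up and surface the error to the end user
    if 500 <= status <= 599:
        return True   # server errors are generally safe to retry
    return status in RETRYABLE_CLIENT_ERRORS

assert should_retry(429, attempts=0)       # transient: retry
assert not should_retry(404, attempts=0)   # resource missing: don't retry
assert not should_retry(429, attempts=3)   # retry budget exhausted
```

An end-user override, as mentioned for the tus clients, would simply replace or extend `RETRYABLE_CLIENT_ERRORS`.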
