Implement retries in vmexport download when server returns unexpected status #11911
Signed-off-by: Alvaro Romero <alromero@redhat.com>
/test all
982d428 to a44cff0
/test all
/retest
/cc @awels
@@ -475,6 +475,11 @@ func archiveHandler(mountPoint string) http.Handler {
w.WriteHeader(http.StatusBadRequest)
return
}
if hasPermissions := checkDirectoryPermissions(mountPoint); !hasPermissions {
w.WriteHeader(http.StatusForbidden)
I think 5xx error codes are more appropriate in this case. 4xx errors indicate an issue with the client. See: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#4xx_client_errors
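To illustrate the suggestion, here is a minimal sketch of a handler that answers with a 5xx status when the export directory is unreadable. The names archiveHandlerSketch, permissionChecker and statusFor are hypothetical, and the check is injected as a function because the body of checkDirectoryPermissions is not shown in this diff. The choice of 500 specifically is an assumption; the review only asks for a 5xx code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// permissionChecker is a hypothetical stand-in for the PR's
// checkDirectoryPermissions helper, whose body is not shown in the diff.
type permissionChecker func(mountPoint string) bool

// archiveHandlerSketch reports a server-side failure (500) when the export
// directory cannot be read: the client sent a valid request, so a 4xx
// status would blame the wrong side.
func archiveHandlerSketch(mountPoint string, check permissionChecker) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !check(mountPoint) {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// statusFor sends one GET request through the handler and returns the
// status code that was written.
func statusFor(check permissionChecker) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/archive", nil)
	archiveHandlerSketch("/mnt/export", check).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(func(string) bool { return false })) // 500
	fmt.Println(statusFor(func(string) bool { return true }))  // 200
}
```

A 5xx response also fits the retry idea discussed in this PR, since a client can reasonably treat a server-side error as transient.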
pkg/virtctl/vmexport/vmexport.go
Outdated
@@ -144,6 +144,8 @@ var (
manifestOutputFormat string
)

const downloadRetries = 2
I think a --retry arg similar to curl (retry x times on transient error) would be better
Okay, would we want to retry every time the download fails, or only when the server returns a bad status (as originally intended)? I prefer the second, but since we'll include a flag to explicitly request retries, maybe it makes sense to retry when other kinds of errors occur as well.
Yeah I think it only makes sense to retry when the server returns bad status.
Still using 1 as default to mitigate transient server errors
I think default retries should be 0. What do you think, @alicefr?
I can use 0 in vmexport if you prefer, but for the memory dump integration, I think we should at least use 1 to mitigate the bug.
I agree with Mike the default should be 0
Makes sense, left the higher default retries in the memory dump integration and justified it with a comment.
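The behavior agreed on in this thread (retry only on a bad status, with the retry count supplied by the caller) could look roughly like the sketch below. It is not the PR's code: downloadWithRetries and flakyServer are hypothetical names, and treating anything other than 200 OK as an unexpected status is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// downloadWithRetries performs one GET plus up to `retries` further attempts.
// It retries only when the server answers with a status other than 200 OK;
// transport errors are returned immediately without retrying.
func downloadWithRetries(client *http.Client, url string, retries int) ([]byte, int, error) {
	attempts := 0
	for {
		attempts++
		resp, err := client.Get(url)
		if err != nil {
			return nil, attempts, err // not a bad status: do not retry
		}
		if resp.StatusCode == http.StatusOK {
			body, readErr := io.ReadAll(resp.Body)
			resp.Body.Close()
			return body, attempts, readErr
		}
		resp.Body.Close()
		if attempts > retries {
			return nil, attempts, fmt.Errorf("unexpected status %d after %d attempt(s)", resp.StatusCode, attempts)
		}
	}
}

// flakyServer is a test double that answers the first `failures` requests
// with 500 and every later request with a small payload.
func flakyServer(failures int) *httptest.Server {
	var seen int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&seen, 1) <= int32(failures) {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		fmt.Fprint(w, "disk image bytes")
	}))
}

func main() {
	srv := flakyServer(1)
	defer srv.Close()
	// One retry is enough to get past a single transient failure.
	body, attempts, err := downloadWithRetries(srv.Client(), srv.URL, 1)
	fmt.Println(string(body), attempts, err)
}
```

With retries set to 0, the default the reviewers settled on for vmexport, the same call would fail after the first 500. A caller such as the memory dump integration would pass a higher value.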
This commit implements a retry mechanism in the vmexport download command so that, when the export server returns an unexpected status, the download is retried. Signed-off-by: Alvaro Romero <alromero@redhat.com>
Signed-off-by: Alvaro Romero <alromero@redhat.com>
584e0f6 to 1b599f5
/retest
return false
}

for _, item := range contents {
I guess we cannot have subdirectories here?
Right, with the current implementation we won't have any subdirectories here. This was discussed with the team and we decided there was no need to check permissions recursively.
This commit adds a flag that lets users specify the number of retries the command will attempt before giving up. Signed-off-by: Alvaro Romero <alromero@redhat.com>
1b599f5 to 5e35169
/test pull-kubevirt-e2e-arm64
What this PR does
An issue with the hotplug pod termination/export pod creation caused the memory dump to lack sufficient permissions to be manipulated by the server. This uncommon bug is usually solved automatically when restarting the exporter pod. A simple retry mechanism in vmexport download should help to mitigate this bug.
This Pull Request introduces the following two changes:
Fixes # https://issues.redhat.com/browse/CNV-39141
Special notes for your reviewer
I tried handling this differently by exiting the server and then restarting the pod from the controller, but that led to raciness and inconsistencies with the virtualmachineexport resource. The current solution seemed worse at first, but it doesn't require messing with the controllers/export resources and is overall cleaner and easier to implement.
Release note