Limit shard snapshot work-in-progress #108739

Open
DaveCTurner opened this issue May 16, 2024 · 1 comment
Labels
>bug :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team

Comments

@DaveCTurner
Contributor

The pause-on-shutdown mechanism introduced in #101717 in fact resets every affected shard-level snapshot so that it restarts from the beginning on its new node, discarding that shard's work-in-progress uploads. Today we make no attempt to limit the amount of work-in-progress that might be discarded on a node shutdown, since we interleave the uploads of the files from every shard that is being snapshotted. This means the discarded work can be very substantial (in one case we observed it set the overall snapshot progress back by over 10TiB).

We should find some way to limit this WIP, focussing on completing individual shard snapshots sooner, to reduce the effects of a shutdown mid-snapshot.
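
To illustrate why interleaving matters here, the following is a toy model only, not the Elasticsearch implementation. It assumes a single uploader that sends whole files one after another, and that a shutdown discards every byte already uploaded for any shard that has not finished. The shard names, file sizes, and function names (`interleaved_order`, `shard_at_a_time_order`, `discarded_bytes`) are all hypothetical.

```python
from itertools import zip_longest


def interleaved_order(shards):
    """Upload order that takes one file from each shard in turn (toy model of interleaving)."""
    per_shard = [[(name, size) for size in sizes] for name, sizes in shards.items()]
    order = []
    for row in zip_longest(*per_shard):
        # Shorter shards run out first; skip their padding entries.
        order.extend(item for item in row if item is not None)
    return order


def shard_at_a_time_order(shards):
    """Upload order that finishes every file of one shard before starting the next."""
    return [(name, size) for name, sizes in shards.items() for size in sizes]


def discarded_bytes(shards, order, files_uploaded):
    """Bytes uploaded for shards that are still incomplete after `files_uploaded` files.

    Assumption: a shutdown at this point throws away all progress on incomplete shards.
    """
    uploaded = {name: 0 for name in shards}
    for name, size in order[:files_uploaded]:
        uploaded[name] += size
    return sum(done for name, done in uploaded.items() if done < sum(shards[name]))


if __name__ == "__main__":
    # Three hypothetical shards with four equally sized files each (sizes in arbitrary units).
    shards = {"s0": [10] * 4, "s1": [10] * 4, "s2": [10] * 4}
    shutdown_after = 8  # the node shuts down once 8 files have been uploaded

    for label, order in (
        ("interleaved", interleaved_order(shards)),
        ("shard-at-a-time", shard_at_a_time_order(shards)),
    ):
        print(label, discarded_bytes(shards, order, shutdown_after))
```

In this model, a shutdown after 8 of the 12 files discards 80 units under interleaving, because no shard has finished, but 0 units when shards are completed one at a time. A real fix would have to account for concurrent uploads and differing shard sizes, which this sketch ignores.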

@DaveCTurner DaveCTurner added >bug :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels May 16, 2024
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label May 16, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)
