
Extending location awareness for general join/leave support #1001

Open
martinsumner opened this issue Mar 16, 2023 · 2 comments

@martinsumner
Contributor

martinsumner commented Mar 16, 2023

Background information:

When joining a node, the following algorithms are attempted:

1 - A basic attempt to satisfy wants (the vnodes required by the joining node) by asking, node by node, which vnodes can be passed on without breaking target_n_val (the claim_v2 algorithm).
2 - If Step 1 is unsuccessful, attempt to stripe all vnodes across all nodes (the sequential_claim algorithm).
3 - If Step 2 creates tail violations (i.e. if 0 < RingSize rem NodeCount < TargetNVal), resolve through the solve_tail_violations algorithm.
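The tail condition in Step 3 can be illustrated with a minimal Python sketch. It assumes a ring is modelled as a list of node names indexed by partition, and uses a naive round-robin stripe as a stand-in for sequential_claim; the helper names (`naive_stripe`, `tail_violation_expected`, `n_val_violations`) are hypothetical and the real riak_core code differs. It shows why the condition matters: when RingSize rem NodeCount is non-zero but smaller than TargetNVal, the windows that wrap around the end of the ring repeat a node.

```python
def tail_violation_expected(ring_size: int, node_count: int, target_n_val: int) -> bool:
    """The condition from the issue: 0 < RingSize rem NodeCount < TargetNVal."""
    return 0 < ring_size % node_count < target_n_val


def naive_stripe(ring_size: int, nodes: list[str]) -> list[str]:
    """Hypothetical stand-in for striping: partition i goes to nodes[i mod N]."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def n_val_violations(ring: list[str], target_n_val: int) -> list[int]:
    """Start index of every window of target_n_val consecutive partitions
    (wrapping around the ring) in which some node appears more than once."""
    size = len(ring)
    bad = []
    for start in range(size):
        window = [ring[(start + k) % size] for k in range(target_n_val)]
        if len(set(window)) < target_n_val:
            bad.append(start)
    return bad


if __name__ == "__main__":
    nodes = [f"n{i}" for i in range(6)]
    # 64 rem 6 = 4, which is not < 4: the stripe wraps cleanly.
    print(tail_violation_expected(64, 6, 4), n_val_violations(naive_stripe(64, nodes), 4))
    # 32 rem 6 = 2: the windows crossing the end of the ring repeat n0 or n1.
    print(tail_violation_expected(32, 6, 4), n_val_violations(naive_stripe(32, nodes), 4))
```

With six nodes and a target_n_val of 4, the 64-partition stripe has no violations, while the 32-partition stripe has three wrapping windows (starting at partitions 29, 30 and 31) that repeat a node, which is the situation solve_tail_violations is there to repair.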

When leaving a node, the following algorithms are attempted:

1 - A basic attempt to perform a simple_transfer (vnodes are passed in turn to nodes that would not break target_n_val).
2 - Use sequential_claim as in join.
3 - Use the solve_tail_violations extension to sequential_claim, as in join.
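A simple transfer on leave can be sketched in the same hypothetical ring model. This is an illustration under assumptions, not riak_core's simple_transfer: each partition of the leaving node is offered, in ring order, to the first remaining node (sorted by name) that does not already own a partition within target_n_val - 1 positions on either side. If any partition cannot be placed, the sketch returns None, standing in for the fallback to sequential_claim. The names `safe_to_own` and `simple_transfer_sketch` are hypothetical.

```python
from typing import Optional


def safe_to_own(ring: list[str], idx: int, node: str, target_n_val: int) -> bool:
    """True if `node` owns no partition within target_n_val - 1 positions
    of idx, looking in both directions and wrapping around the ring."""
    size = len(ring)
    for offset in range(1, target_n_val):
        if ring[(idx + offset) % size] == node or ring[(idx - offset) % size] == node:
            return False
    return True


def simple_transfer_sketch(ring: list[str], leaving: str, target_n_val: int) -> Optional[list[str]]:
    """Greedy, hypothetical simple transfer: hand each partition owned by
    `leaving` to the first remaining node that keeps target_n_val intact.
    Returns the new ring, or None if some partition cannot be placed."""
    ring = list(ring)  # do not mutate the caller's ring
    candidates = sorted(set(ring) - {leaving})
    for idx, owner in enumerate(ring):
        if owner != leaving:
            continue
        for node in candidates:
            if safe_to_own(ring, idx, node, target_n_val):
                ring[idx] = node
                break
        else:
            return None  # no safe owner: fall back to sequential_claim
    return ring


if __name__ == "__main__":
    ring = [f"n{i % 5}" for i in range(20)]
    print(simple_transfer_sketch(ring, "n4", 3))
    print(simple_transfer_sketch(ring, "n4", 2))
```

With a 20-partition ring striped across n0..n4 and a target_n_val of 3, removing n4 returns None under this sketch, because every surviving node already owns a partition within two positions of each n4 partition. With a target_n_val of 2, the same leave succeeds and each n4 partition passes to n1.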

Ideally, Step 1 should succeed in both cases, as Step 2 inevitably leads to a full cluster reorganisation (and hence a large volume of transfers).

As part of #967 location awareness was added to the sequential_claim algorithm (Step 2).

This issue documents an ongoing investigation into three problems:

  • Under what conditions does the sequential_claim algorithm (both with and without the solve_tail_violations algorithm) provide a location-safe cluster?
  • Can the claim_v2 (Step 1 for join) and simple_transfer (Step 1 for leave) algorithms be extended to be location aware?
  • Can the claim_v2 and simple_transfer algorithms be extended to reduce the scenarios in which cluster changes fall back to sequential_claim?
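To make "location safe" concrete, one possible check is sketched below. It assumes, for illustration only, that a ring is location safe when every window of target_n_val consecutive partitions (wrapping around the ring) is owned by nodes in distinct locations; the function `location_violations` and the node-to-location map are hypothetical. It shows that the order in which nodes are striped matters, not just how many locations there are.

```python
def location_violations(ring: list[str], location_of: dict[str, str], target_n_val: int) -> list[int]:
    """Start index of every window of target_n_val consecutive partitions
    (wrapping) whose owners do not all sit in distinct locations."""
    size = len(ring)
    bad = []
    for start in range(size):
        locations = {location_of[ring[(start + k) % size]] for k in range(target_n_val)}
        if len(locations) < target_n_val:
            bad.append(start)
    return bad


if __name__ == "__main__":
    ring = [f"n{i % 4}" for i in range(8)]  # n0 n1 n2 n3 n0 n1 n2 n3
    grouped = {"n0": "A", "n1": "A", "n2": "B", "n3": "B"}
    alternating = {"n0": "A", "n1": "B", "n2": "A", "n3": "B"}
    print(location_violations(ring, grouped, 2))      # adjacent nodes share a location
    print(location_violations(ring, alternating, 2))  # locations alternate around the ring
```

Both placements are node safe for a target_n_val of 2, yet only the alternating one is location safe: the grouped placement violates at partitions 0, 2, 4 and 6.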
@martinsumner
Contributor Author

The initial hypothesis to be tested is that sequential_claim and solve_tail_violations will consistently work if:

  • 32 <= RingSize <= 2048;
  • 6 <= count(Nodes) <= 64;
  • RingSize > count(Nodes);
  • if max(count(NodesPerLocation)) = M, then there must be at least target_n_val locations where count(NodesPerLocation) is M.

To explain the last condition, if there are L locations, where L > TargetNVal, then at least TargetNVal locations must have M nodes, and the remaining (L - TargetNVal) locations must have =< M nodes.
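The pre-conditions can be expressed as a small predicate. This is a sketch under assumptions: `meets_preconditions` and its arguments are hypothetical, target_n_val is passed in rather than read from configuration, and the ring-size comparison is read as RingSize > count(Nodes), the reading consistent with the stated ranges.

```python
def meets_preconditions(ring_size: int, nodes_per_location: dict[str, int], target_n_val: int) -> bool:
    """Check the test pre-conditions. `nodes_per_location` maps a
    (hypothetical) location name to the number of nodes in it."""
    node_count = sum(nodes_per_location.values())
    if not 32 <= ring_size <= 2048:
        return False
    if not 6 <= node_count <= 64:
        return False
    # Assumption: the ring-size condition is read as RingSize > count(Nodes).
    if ring_size <= node_count:
        return False
    m = max(nodes_per_location.values())
    # At least target_n_val locations must hold the maximum node count M;
    # every other location has =< M nodes by definition of the maximum.
    locations_at_max = sum(1 for count in nodes_per_location.values() if count == m)
    return locations_at_max >= target_n_val


if __name__ == "__main__":
    even = {f"loc{i}": 2 for i in range(5)}                # 10 nodes, 5 locations of 2
    uneven = {"loc0": 3, "loc1": 3, "loc2": 2, "loc3": 2}  # 10 nodes, 2 locations at M = 3
    print(meets_preconditions(256, even, 4))
    print(meets_preconditions(256, uneven, 4))
```

Assuming a target_n_val of 4, the even layout passes and the uneven layout fails, because only two locations hold the maximum of three nodes. The even layout matches the 10-node, 5-location counter-example reported in the next comment, so passing this check alone does not guarantee a location-safe result.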

@martinsumner
Contributor Author

The hypothesis above is incorrect. Even with these pre-conditions, sequential_claim can still fail to support target_n_val.

e.g. RS 128, or RS 256 with 10 nodes split evenly across 5 locations.
