Replication reference with node_confirms #1857

Open
martinsumner opened this issue Apr 19, 2023 · 1 comment

@martinsumner

When replicating objects using nextgenrepl, the sink cluster will issue fetch requests to the source cluster. Each fetch request will read from the real-time queue any item ready for replication. The item will be one of:

  • An actual object which has recently been PUT;
  • A reference to an object which exists in the store;
  • A delete reference.

The second case will commonly occur when a repl_keys_range aae_fold has been run (but also when the real-time queue has grown during a busy period).

In the case of an object reference being read from the queue, a standard GET request will be used to return the actual object to the sink.
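
The handling of the three item types can be sketched roughly as below. This is only an illustration: the module name, the tuple shapes and the GetFun argument are all hypothetical and are not taken from riak_kv, where the queue entries may be encoded quite differently.

```erlang
-module(repl_queue_item_sketch).
-export([resolve/2]).

%% Hypothetical item shapes for the three cases listed in the issue; the
%% real entries on the real-time queue may be encoded differently.
-type item() :: {object, term()}
              | {reference, binary(), binary(), term()}
              | {delete_reference, binary(), binary(), term()}.

%% GetFun is a stand-in for the GET made against the source cluster
%% when only a reference is on the queue.
-spec resolve(item(), fun((binary(), binary(), term()) -> term())) -> term().
resolve({object, Obj}, _GetFun) ->
    %% The object itself was queued, so no read is needed.
    {ok, Obj};
resolve({reference, Bucket, Key, Clock}, GetFun) ->
    %% Only a reference was queued, so the object must be read first.
    GetFun(Bucket, Key, Clock);
resolve({delete_reference, Bucket, Key, Clock}, _GetFun) ->
    %% A delete carries no object body to read.
    {deleted, {Bucket, Key}, Clock}.
```

Only the middle clause costs a read on the source cluster, which is why the GET options discussed next matter for range folds that queue many references.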

That GET request will have some specific options to improve performance:

Options = [deletedvclock, {pr, 1}, {r, 1}, {notfound_ok, false}],

These options allow the object to be returned to the client as soon as an object matching the expected vector clock has been returned. As this might happen after only 1 read, the R/PR settings override any bucket property requiring a higher R value.

However, if the bucket has node_confirms set to a value greater than 1, the response will fail validation at this stage.
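
A small model of how the mismatch might arise is sketched below. The function names are hypothetical, and the validation rule (reject when r is lower than node_confirms) is an assumption made for illustration; the issue does not state the exact check that riak_kv applies.

```erlang
-module(effective_quorum_sketch).
-export([effective/2, validate/1]).

%% Request options win over bucket properties; anything the request does
%% not set falls back to the bucket value. The final default of 1 is
%% arbitrary and only keeps the sketch total.
effective(ReqOpts, BucketProps) ->
    [{K, proplists:get_value(K, ReqOpts,
                             proplists:get_value(K, BucketProps, 1))}
     || K <- [r, pr, node_confirms]].

%% ASSUMPTION: validation rejects a request whose r is lower than
%% node_confirms. The real rule in riak_kv may differ.
validate(Effective) ->
    R = proplists:get_value(r, Effective),
    NC = proplists:get_value(node_confirms, Effective),
    case R >= NC of
        true -> ok;
        false -> {error, {r_below_node_confirms, R, NC}}
    end.
```

With the replication options and a bucket that sets node_confirms to 2, r and pr are forced to 1 while node_confirms is still inherited from the bucket, so under the assumed rule the request is rejected:

```erlang
Opts = [deletedvclock, {pr, 1}, {r, 1}, {notfound_ok, false}],
Props = [{r, 2}, {pr, 0}, {node_confirms, 2}],
Effective = effective_quorum_sketch:effective(Opts, Props),
effective_quorum_sketch:validate(Effective).
```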

@martinsumner

It is possible for replication to create pressure on the cluster that goes unhandled, especially when using repl range folds. The standard mechanism for tuning this is to adjust the number of snk_workers fetching and pushing.

An alternative would be to allow the r value on fetch, and the w value on push, to be overridden. This will slow replication, but by requiring more than one vnode to confirm before completing the operation there will be a natural brake on snk_workers when vnodes start developing backlogs.
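
As a rough sketch of that proposal, the option lists could be built from configurable values rather than fixed at 1. The module and function names here are hypothetical, and how such values would be configured is not decided in this issue; the fetch list simply mirrors the Options term quoted earlier with r made variable.

```erlang
-module(repl_options_sketch).
-export([fetch_options/1, push_options/1]).

%% Same shape as the existing fetch options, but with r supplied by the
%% caller instead of being fixed at 1 (hypothetical helper).
fetch_options(R) when is_integer(R), R >= 1 ->
    [deletedvclock, {pr, 1}, {r, R}, {notfound_ok, false}].

%% Hypothetical equivalent for the push side; the other options used on
%% push are not described in this issue, so only w is shown.
push_options(W) when is_integer(W), W >= 1 ->
    [{w, W}].
```

Calling fetch_options(1) reproduces the current behaviour, while fetch_options(2) would make each fetch wait on a second vnode, so a backlogged vnode slows the worker issuing the request.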
