Cluster #3593
base: main
Hey 👋 We investigated this solution and wanted to look at what we could get and fix with a raft-based implementation.
- The Raft library could fix it, as we are able to use whatever we want as the network connection protocol. In the basic example, they use an HTTP connection with reqwest.
- If you are talking about the public and private IPs in a cluster, it should be possible to have two interfaces with Raft.
- It should be OK with Raft. I hope. We must investigate that.
- We must make sure we can express that a node is in such a state, e.g., returning that a node is non-healthy from the `/health` route.
- It should be hard to trigger now that the index size is dynamic.
- If you mean that the commit is only made at the index level, not the index-scheduler level (e.g., index swap), we can improve it later. If you mean that the API keys are not supported, it is just a matter of time and effort.
- It can be stored on disk in the future.
- We must refuse direct API calls when we are not the leader.
- OpenRaft is async and uses tokio. We could maybe even share the same tokio runtime as the one we use with Actix 😍
- It should be OK with OpenRaft. That's a matter of network and disk operations.
- We must ensure that it is possible with OpenRaft to define the tasks we can process.
- It should be possible, but we must look at what we can do about it.
- We could look at the number of tasks left to process to catch up with the leader and change our healthiness based on this number.
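As a toy illustration of that last point, a node's healthiness could be derived from how far it lags behind the leader; the threshold and names below are made up, not from the PR:

```rust
// Hypothetical healthiness check based on task lag behind the leader;
// the constant is an arbitrary example, not a value from the PR.
const MAX_TASK_LAG: u64 = 1_000;

/// A node is considered healthy as long as it is fewer than
/// `MAX_TASK_LAG` tasks behind the leader.
fn is_healthy(leader_processed: u64, locally_processed: u64) -> bool {
    leader_processed.saturating_sub(locally_processed) <= MAX_TASK_LAG
}
```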
How does it work?
In this first implementation, we went with a leader/follower approach with a pre-selected leader that can't change.
The followers only follow the orders of the leader but still allow reads.
The leader is in charge of replicating all the writes to the followers and to itself.
Processing a task
The leader will send the tasks to process to the followers.
Then, after indexing everything but right before committing the changes to disk, it'll wait for the state of the followers.
At the same time, the followers get the batch to process from the leader and also wait before committing.
Depending on the consistency rule, the leader might tell them to commit right away or later; the messages exchanged could look like the sketch below.
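A hypothetical shape for these messages (none of these types come from the PR):

```rust
// Hypothetical message shapes for the leader/follower handshake described
// above; illustrative only, the PR's real types are likely different.

/// Orders sent by the leader to its followers.
enum LeaderMsg {
    /// A batch of tasks the follower must index, stopping right before commit.
    StartBatch { batch_id: u64, tasks: Vec<Task> },
    /// Permission to commit the batch it previously indexed.
    Commit { batch_id: u64 },
}

/// Answers sent back by a follower.
enum FollowerMsg {
    /// Sent once the batch is indexed, right before committing to disk.
    ReadyToCommit { batch_id: u64 },
}

/// Placeholder so the sketch is self-contained.
struct Task;
```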
If the consistency has been set to:
- `one`: the leader will tell everyone to commit without waiting for any followers.
- `two`: the leader will wait for one follower to be ready to commit before telling everyone to commit and moving on.
- `quorum`: the leader will wait until more than half of the cluster is ready to commit.
- `all`: the leader will wait until all the followers are ready to commit.

Not implemented yet: if a follower doesn't get the same result as the leader, it should either:
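A minimal sketch of the leader's commit decision under these rules, assuming a `Consistency` enum mirroring the values above (all names are illustrative):

```rust
// Hypothetical sketch of the leader's commit decision; none of these
// names are taken from the PR.
enum Consistency {
    One,
    Two,
    Quorum,
    All,
}

/// Returns `true` once enough followers are ready for the leader to
/// broadcast the commit order. `ready` is the number of followers that
/// reported "ready to commit"; `cluster_size` counts the leader too.
fn can_commit(consistency: &Consistency, ready: usize, cluster_size: usize) -> bool {
    match consistency {
        // Commit without waiting for any follower.
        Consistency::One => true,
        // Leader + one follower in sync.
        Consistency::Two => ready >= 1,
        // More than half of the cluster (leader included) is ready.
        Consistency::Quorum => ready + 1 > cluster_size / 2,
        // Every follower is ready.
        Consistency::All => ready == cluster_size.saturating_sub(1),
    }
}
```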
Joining the cluster
When a node joins the cluster, it won't be active straight away.
The leader will accept the connection from the follower, but it'll wait until the current task has been processed.
In between two tasks, all the pending followers will « officially » join the cluster (we say they become active).
To share its state with the new followers, the leader will create a dump and send it to them so they can catch up to the current state of the cluster.
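A minimal sketch of that activation flow, assuming the leader queues pending connections and flushes them between two tasks; every name here is made up:

```rust
// Hypothetical sketch of follower activation; none of these types come
// from the PR.
use std::collections::VecDeque;

struct Follower {
    active: bool,
}

struct Leader {
    followers: Vec<Follower>,
    // Connections accepted but not yet active.
    pending: VecDeque<Follower>,
}

impl Leader {
    /// Called when a node opens a connection: it is queued, not activated,
    /// so it never joins in the middle of a task.
    fn accept(&mut self, follower: Follower) {
        self.pending.push_back(follower);
    }

    /// Called between two tasks: every pending follower receives a dump of
    /// the leader's current state and « officially » becomes active.
    fn activate_pending(&mut self) {
        while let Some(mut follower) = self.pending.pop_front() {
            let dump = self.create_dump();
            self.send_dump(&follower, &dump);
            follower.active = true;
            self.followers.push(follower);
        }
    }

    fn create_dump(&self) -> Vec<u8> {
        Vec::new() // placeholder: the real leader serializes its state here
    }

    fn send_dump(&self, _follower: &Follower, _dump: &[u8]) {
        // placeholder: the real leader streams the dump over the connection
    }
}
```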
The leader and followers must share the same master key.
If that's not the case, the follower won't be able to join the cluster.
Also: the connections between the leader and followers are encrypted with ChaCha20 and the master key; thus, it's recommended to have a secure, autogenerated master key of at least 32 bytes.
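For illustration, a message could be encrypted with the RustCrypto `chacha20poly1305` crate and the 32-byte master key like this; whether the PR uses this exact crate and AEAD variant is an assumption:

```rust
// Hypothetical encryption of one cluster message with ChaCha20-Poly1305
// and the shared 32-byte master key (the key size explains the "at least
// 32 bytes" recommendation above).
use chacha20poly1305::{
    aead::{Aead, AeadCore, KeyInit, OsRng},
    ChaCha20Poly1305, Key,
};

/// Encrypts one message and returns `nonce || ciphertext`, ready to be
/// sent over the wire.
fn encrypt_message(master_key: &[u8; 32], plaintext: &[u8]) -> Vec<u8> {
    let cipher = ChaCha20Poly1305::new(Key::from_slice(master_key));
    // A fresh random nonce per message, shipped alongside the ciphertext.
    let nonce = ChaCha20Poly1305::generate_nonce(&mut OsRng);
    let mut out = nonce.to_vec();
    out.extend(cipher.encrypt(&nonce, plaintext).expect("encryption failed"));
    out
}
```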
Synchronizing the API key
The leader forwards the API key operations to every follower, and each follower applies them ASAP without any further synchronization.
What new API pieces have been introduced:
- The `--experimental-enable-ha <EXPERIMENTAL_ENABLE_HA>` flag has been introduced. Its values are either `leader` or `follower`.
- The `--leader <LEADER>` flag has been introduced. It lets you specify the address of the leader, and it's mandatory if you're a follower.
- The `--consistency <CONSISTENCY>` flag has been introduced to configure the consistency rules. Its possible values are:
  - `one` => the leader progresses as fast as possible
  - `two` => the leader + one node are in sync
  - `quorum` => the majority of the cluster stays synchronized
  - `all` => the whole cluster stays in sync
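For illustration, these options could be declared with clap, which Meilisearch already uses for its CLI; the field names and default below are assumptions, not the PR's actual code:

```rust
// Hypothetical clap declaration of the new flags; illustrative only.
use clap::{Parser, ValueEnum};

#[derive(Debug, Clone, ValueEnum)]
enum Consistency {
    One,
    Two,
    Quorum,
    All,
}

#[derive(Parser)]
struct Opt {
    /// Enables the experimental HA mode; either `leader` or `follower`.
    #[arg(long)]
    experimental_enable_ha: Option<String>,

    /// Address of the leader; mandatory when running as a follower.
    #[arg(long)]
    leader: Option<String>,

    /// Consistency rule the leader applies before committing.
    #[arg(long, value_enum, default_value = "one")]
    consistency: Consistency,
}

fn main() {
    let opt = Opt::parse();
    println!("running with consistency {:?}", opt.consistency);
}
```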
What is utterly broken/ugly currently and should be rewritten / handled correctly
The connections don't have the `keepalive` option enabled; thus, the connections are probably going to die often.
Below are tamo's notes; don't try to understand anything.