Automatically configure CRDB replication #482

Open
BenjaminPelletier opened this issue Feb 26, 2021 · 0 comments
Labels
dss (Relating to one of the DSS implementations), feature (Issue would improve software), P1 (High priority)

Comments

@BenjaminPelletier
Member

CockroachDB stores data in ranges. Each range is a Raft consensus group composed of a certain number of replicas, each replica on a different node. The default replica count is 5 for system ranges (ranges that contain data about the CockroachDB database cluster) and 3 for data ranges. A range is functional if and only if a strict majority of its replicas are available ("quorum is met"). If a majority of any range's replicas permanently fail, that range is permanently lost unless a forensic recovery technique is applied to restore data from one of the minority replicas.
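The quorum rule above reduces to simple arithmetic. The following is a minimal sketch; the function names `quorum_size` and `tolerated_failures` are illustrative and not part of CockroachDB or the DSS codebase.

```python
def quorum_size(replicas: int) -> int:
    """Smallest strict majority of `replicas`."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Most replicas that can be unavailable while quorum is still met."""
    return replicas - quorum_size(replicas)


# Replica counts discussed in this issue: data default, system default,
# and the increased count proposed further down.
for count in (3, 5, 7):
    print(f"replicas={count} quorum={quorum_size(count)} "
          f"tolerated_failures={tolerated_failures(count)}")
```

With the default of 3 replicas for data ranges, only a single replica may be unavailable at any time.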

The default deployment configuration for a DSS pool is a minimum of 3 DSS instances, each DSS instance with 3 CRDB nodes (though this may change). We want to guarantee survival of the pool and all data in the pool when a minority of DSS instances are lost. The default configuration of 3 replicas that may be freely assigned to any node in the cluster does not achieve this objective, because 2 or 3 of those replicas may reside on the same DSS instance, in which case losing that DSS instance causes a loss of quorum.
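To make the risk concrete, the sketch below enumerates every way 3 replicas could be placed on distinct nodes in a 3-instance, 3-nodes-per-instance pool, and counts the placements that lose quorum when a single DSS instance goes down. It assumes placement is unconstrained and treats every placement as possible; it does not model how CockroachDB's allocator actually chooses nodes.

```python
from itertools import combinations

INSTANCES = 3           # DSS instances in the pool
NODES_PER_INSTANCE = 3  # CRDB nodes per DSS instance
REPLICAS = 3            # default replica count for data ranges

# Each node is identified by (instance index, node index within the instance).
nodes = [(i, n) for i in range(INSTANCES) for n in range(NODES_PER_INSTANCE)]


def survives_loss_of(placement, lost_instance):
    """True if a strict majority of replicas remains after one instance is lost."""
    remaining = sum(1 for instance, _ in placement if instance != lost_instance)
    return remaining > len(placement) / 2


def is_vulnerable(placement):
    """True if losing some single instance breaks quorum for this placement."""
    return any(not survives_loss_of(placement, i) for i in range(INSTANCES))


placements = list(combinations(nodes, REPLICAS))
vulnerable = [p for p in placements if is_vulnerable(p)]
print(f"{len(vulnerable)} of {len(placements)} placements lose quorum "
      "when one DSS instance is lost")
```

Only the placements with exactly one replica per DSS instance survive the loss of any single instance.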

If we do not configure CRDB to ensure that every DSS instance receives a replica of every range, then we must increase the number of replicas for all ranges (system and data) to 7: in the worst case, the lost DSS instance holds 3 replicas of a range, and 4 of 7 must remain for quorum. Or, more generally, to survive the loss of a minority of DSS instances with N DSS instances, the number of replicas for all ranges must be set to 2 * 3 * floor(N / 2) + 1, where 3 is the number of CRDB nodes per DSS instance and, for odd N, floor(N / 2) is the largest minority of DSS instances. Alternatively, we could configure CRDB replication so that every DSS instance stores exactly one replica of each range.
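Both options can be sketched in a few lines. The first two functions evaluate the formula above and check it against the worst case, in which the failed DSS instances hold as many replicas as possible. The last function builds a CockroachDB `CONFIGURE ZONE` statement with per-replica constraints for the one-replica-per-instance option. The locality key `dss_instance` and the instance names are hypothetical; the real values depend on the `--locality` flags the nodes are started with, and zones that carry their own configuration would need the same change. Treat this as a sketch under those assumptions, not a tested deployment procedure.

```python
import json


def replicas_needed(instances, nodes_per_instance=3):
    """Replica count from the formula above: 2 * 3 * floor(N / 2) + 1."""
    return 2 * nodes_per_instance * (instances // 2) + 1


def survives_worst_case(replicas, instances, nodes_per_instance=3):
    """True if quorum holds after floor(N / 2) instances fail and the
    failed nodes held as many replicas as they possibly could."""
    lost = min(replicas, (instances // 2) * nodes_per_instance)
    return replicas - lost > replicas / 2


def one_replica_per_instance_sql(localities, key="dss_instance"):
    """Build a zone configuration statement that pins one replica to each
    locality. `key` is a hypothetical locality key."""
    constraints = {f"+{key}={name}": 1 for name in localities}
    return (
        "ALTER RANGE default CONFIGURE ZONE USING "
        f"num_replicas = {len(localities)}, "
        f"constraints = '{json.dumps(constraints)}';"
    )


for n in (3, 5):
    r = replicas_needed(n)
    print(f"N={n}: replicas={r}, "
          f"survives={survives_worst_case(r, n)}, "
          f"one fewer survives={survives_worst_case(r - 1, n)}")

print(one_replica_per_instance_sql(["a", "b", "c"]))
```

For odd N the formula gives the smallest replica count that survives the worst case: one replica fewer fails the check. The constrained option keeps the replica count at N, at the cost of maintaining the zone configuration whenever a DSS instance joins or leaves the pool.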

BenjaminPelletier added the P1 and feature labels on Feb 26, 2021
BenjaminPelletier added the dss label on Sep 21, 2022