Proposed change: Use a single config file for all purposes, all nodes #242

Open
KenBirman opened this issue Jul 9, 2022 · 0 comments
KenBirman commented Jul 9, 2022

I find it awkward that right now, Derecho (and hence also, Cascade) wants a different config file for every distinct node in the system. It seems to me that because config files have sections that can hold lists, we can easily move to a model in which the client config information is side by side with the server config information.

Then we can split our servers into two sets: those useful as contacts when seeking the leader, and those that are not in this list. The contact servers would be the only ones considered in our "who is the leader?" view protocol -- this would be a tiny check for Edward to add.
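To make the idea concrete, here is a minimal sketch of one shared file that lists contact servers, other servers, and clients side by side, with each node working out its own role from its IP address. The section names, key names, and addresses are hypothetical, chosen only for illustration; this is not Derecho's actual config schema.

```python
import configparser

# Hypothetical layout: section and key names are illustrative, not Derecho's.
SHARED_CONFIG = """
[servers]
contacts = 192.168.1.10, 192.168.1.11, 192.168.1.12
others = 192.168.1.20, 192.168.1.21

[clients]
members = 192.168.1.100, 192.168.1.101
"""


def parse_list(value):
    """Split a comma-separated config value into a list of trimmed items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def role_of(config_text, my_ip):
    """Return this node's role, looked up by IP in the single shared config."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if my_ip in parse_list(parser["servers"]["contacts"]):
        return "contact-server"  # takes part in the "who is the leader?" check
    if my_ip in parse_list(parser["servers"]["others"]):
        return "server"
    if my_ip in parse_list(parser["clients"]["members"]):
        return "client"
    return "unknown"
```

Every node reads the same text and only the lookup key (its own address) differs, which is what removes the need for per-node files.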

So now we have the case where three potential leaders try to start up simultaneously. This is not entirely trivial but turns out to be solvable in two ways. If they start on different nodes, there is an old self-stabilization ring protocol Danny Dolev once showed us; it seems to be classic for the self-stabilization community. So in effect, the K servers contend to be leader by forming a ring, and then the lowest rank (by node IP address) can be the initial leader.
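The final tie-break (lowest IP address wins) can be sketched as below. This is only the ranking rule, not the self-stabilizing ring protocol itself, and it assumes every contender already knows the full list of contenders and that all addresses are in the same family (IPv4 here). The function name is made up for the example.

```python
import ipaddress


def initial_leader(contenders):
    """Return the contender with the numerically lowest IP address.

    Addresses are compared as numbers, not as strings, so "10.0.0.9"
    ranks below "10.0.0.100".
    """
    return min(contenders, key=ipaddress.ip_address)
```

Comparing parsed addresses rather than raw strings matters: a plain string comparison would rank "10.0.0.100" ahead of "10.0.0.9".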

In fact if there is a file system handy, we can do this even more easily: each node, on restarting, appends its IP address to a standard file in the global file system that lists nodes in Derecho. The first will need to create the file. The leader will end up being first in the file, the others are listed second, third, etc. Once the first view is published, Edward would also delete the file. Next time the system needs to restart, we go through this cycle again. [I guess we should also consider the case of a crash during restart... for that, we can add a rule that if the file is more than a minute old, we delete it and create a new version -- restart never takes more than a few seconds. And if we aren't trying to be fault-tolerant at initial boot, which is a complex question -- but do we want a system to risk thrashing because something is horribly wrong and initial boot isn't working? -- we can just say that whoever wins the race to do the file create gets to be the leader, no need to append IP addresses at all. So a file system is strong enough to disambiguate when multiple potential leaders turn up all at once. And when would we ever run in a setting with no shared file system available?]
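A rough sketch of the simpler variant (whoever wins the file-create race is the leader, with the one-minute staleness rule) might look like this. It assumes the shared file system honors exclusive create, which is not guaranteed on every network file system, and the stale-file cleanup shown here is not itself race-free. The function name and path are hypothetical.

```python
import os
import time

STALE_AFTER_SECONDS = 60  # the "more than a minute old" rule from the text


def try_claim_leadership(path, my_ip, now=None):
    """Return True if this node created the restart file and so is the leader."""
    now = time.time() if now is None else now
    # Discard a leftover file from an earlier restart attempt that crashed.
    try:
        if now - os.path.getmtime(path) > STALE_AFTER_SECONDS:
            os.remove(path)
    except FileNotFoundError:
        pass  # no file yet, or another node already removed it
    try:
        # O_EXCL makes the create fail if the file already exists,
        # so at most one contender succeeds.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False  # someone else won the race and is the leader
    with os.fdopen(fd, "w") as f:
        f.write(my_ip + "\n")
    return True
```

A node that gets `False` back can read the file to learn the leader's address. Deleting the file once the first view is published, as described above, is left out of the sketch.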

If all are on one computer, we have an easier job: the first to grab the leader port number becomes the initial leader, and the others silently understand "port number in use" to imply "I'm not the initial leader, it must be one of the others, but no problem: localhost::portno will reach the leader". So they pause for a second to let the leader get set up, then connect via TCP, and voila!
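The single-machine case can be sketched with plain TCP sockets: try to bind the leader port, and treat "address in use" as "someone else is the leader". This is an illustration only; the function name is invented, and real code would need the pause-and-retry step before connecting.

```python
import errno
import socket


def try_become_leader(port, host="127.0.0.1"):
    """Return a listening socket if we won the leader port, else None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as e:
        sock.close()
        if e.errno == errno.EADDRINUSE:
            return None  # port taken: another process is the initial leader
        raise  # any other failure is a real error, not a lost election
    sock.listen()
    return sock
```

A process that gets `None` back would wait briefly and then connect to the same host and port to reach the leader.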

So there is a well-understood protocol by which, with one config file, we can cover every case that we currently cover with N distinct config files, one per server and potentially one per client.

The argument against using so many config files is simply that they are messy and awkward. Imagine a system running on thousands of server nodes. Do we really want to hand-craft and then maintain thousands of cfg files?

etremel added a commit that referenced this issue Sep 4, 2023
As a first step towards a lower-overhead config management system
(requested in #242), I changed the Conf initialization routine to expect
two files: One for the "group" config that must be the same on all
nodes, and one for a "node" config that contains only the options that
are unique per node. The "group" config is loaded first, then the "node"
config, so any options that appear in both files will take the values
specified in the "node" config file.

I also created a new top-level function, loadExtraFile, that allows a
client to repeat the config-file-loading process for an additional file
besides the two required ones. This should enable Cascade to use a
separate file for its options (i.e. "cascade.cfg") and ask Conf to load
and parse this file at startup.
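The override order described in this commit message (group first, then node, then any extra file) can be illustrated with a small Python sketch. This only mimics the layering semantics and is not Derecho's Conf API; the function name, section name, and option names are all hypothetical, and an INI-style syntax is assumed.

```python
import configparser

# Hypothetical contents; option names are illustrative only.
GROUP_CFG = """
[example]
heartbeat_ms = 100
local_ip = 0.0.0.0
"""

NODE_CFG = """
[example]
local_ip = 192.168.1.10
"""


def load_layered(group_text, node_text, extra_texts=()):
    """Load group config, then node config, then extras; later values win."""
    parser = configparser.ConfigParser()
    parser.read_string(group_text)   # shared by all nodes
    parser.read_string(node_text)    # per-node overrides
    for text in extra_texts:         # e.g. an application-specific file
        parser.read_string(text)
    return parser
```

With the inputs above, `load_layered(GROUP_CFG, NODE_CFG)["example"]["local_ip"]` comes from the node file, while `heartbeat_ms` falls through from the group file.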
etremel added two more commits that referenced this issue Sep 8, 2023 (same commit message as above)