Proposed change: Use a single config file for all purposes, all nodes #242

KenBirman · 2022-07-09T12:37:21Z

I find it awkward that Derecho (and hence Cascade) currently wants a different config file for every distinct node in the system. Because config files have sections that can hold lists, it seems to me that we can easily move to a model in which the client config information sits side by side with the server config information.

Then we can split our servers into two sets: those useful as contacts when seeking the leader, and those that are not in this list. The contact servers would be the only ones considered in our "who is the leader?" view protocol -- this would be a tiny check for Edward to add.
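To make the "one file, two sets of servers" idea concrete, here is a minimal Python sketch. The section names (`SERVERS`, `CLIENTS`) and keys (`all`, `contacts`) are hypothetical and are not Derecho's actual config schema; the only point is that a single shared file can carry both lists, and every node can derive the contact set from it.

```python
import configparser

# Hypothetical shared config: the section and key names are illustrative only.
SHARED_CFG = """
[SERVERS]
all = 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4
contacts = 10.0.0.1, 10.0.0.2

[CLIENTS]
all = 10.0.1.1, 10.0.1.2
"""

def parse_list(value):
    """Split a comma-separated config value into a list of trimmed strings."""
    return [item.strip() for item in value.split(",") if item.strip()]

def load_roles(text):
    """Return (contacts, other_servers, clients) from the shared config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    servers = parse_list(parser["SERVERS"]["all"])
    contacts = parse_list(parser["SERVERS"]["contacts"])
    clients = parse_list(parser["CLIENTS"]["all"])
    # Servers not named as contacts never take part in the leader search.
    others = [s for s in servers if s not in contacts]
    return contacts, others, clients

contacts, others, clients = load_roles(SHARED_CFG)
```

Every node reads the same text, so no per-node file is needed to learn who the contacts are.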

So now we have the case where several potential leaders (say K of them) try to start up simultaneously. This is not entirely trivial but turns out to be solvable in two ways. If they start on different nodes, there is an old self-stabilization ring protocol Danny Dolev once showed us; it seems to be a classic in the self-stabilization community. So in effect, the K servers contend to be leader by forming a ring, and then the lowest-ranked one (by node IP address) can be the initial leader.
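The final tie-break can be sketched in a few lines of Python. This is not the ring protocol itself, only the "lowest rank by IP address" rule applied once the contenders know about each other; the function name is made up for illustration.

```python
import ipaddress

def pick_initial_leader(contenders):
    """Return the contender with the numerically lowest IP address.

    Addresses are compared as numbers, not strings. All addresses are
    assumed to be of the same family; comparing IPv4 with IPv6 objects
    raises TypeError.
    """
    return min(contenders, key=ipaddress.ip_address)

leader = pick_initial_leader(["192.168.1.20", "192.168.1.3"])
```

Comparing parsed addresses matters: as plain strings, "192.168.1.20" sorts before "192.168.1.3", which would give a different leader than the numeric order.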

In fact, if there is a file system handy, we can do this even more easily: each node, on restarting, appends its IP address to a standard file in the global file system that lists the nodes in Derecho. The first node will need to create the file. The leader will end up being first in the file, and the others are listed second, third, etc. Once the first view is published, Edward would also delete the file. The next time the system needs to restart, we go through this cycle again.

[I guess we should also consider the case of a crash during restart. For that, we can add a rule that if the file is more than one minute old, a restarting node deletes it and creates a new version, since restart never takes more than a few seconds. And if we aren't trying to be fault-tolerant at initial boot (a complex question: do we want a system to risk thrashing because something is horribly wrong and initial boot isn't working?), we can just say that whoever wins the race to create the file gets to be the leader, with no need to append IP addresses at all. So a file system is strong enough to disambiguate when multiple potential leaders turn up all at once. And when would we ever run in a setting with no shared file system available?]
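A minimal Python sketch of the simpler "whoever creates the file wins" variant, including the one-minute staleness rule. The function name and file path are hypothetical. It assumes the shared file system honors exclusive create (`O_CREAT | O_EXCL`), which would need to be verified for the file system actually in use, and the stale-file cleanup is itself not race-free, so treat this as an illustration rather than a hardened protocol.

```python
import os
import time

STALE_AFTER_SECONDS = 60  # the "more than one minute old" rule

def try_become_leader(path, my_ip, now=None):
    """Return True if this node won the race to create the restart file."""
    now = time.time() if now is None else now
    # Discard a leftover file from an earlier restart attempt that crashed.
    try:
        if now - os.path.getmtime(path) > STALE_AFTER_SECONDS:
            os.remove(path)
    except FileNotFoundError:
        pass  # no file yet, or another node removed it first
    try:
        # Exclusive create fails if the file exists, so only one creator wins.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(my_ip + "\n")
    return True
```

The winner records its IP address in the file, so the losers can read the first line to learn whom to contact. The append-based variant from the text would instead have every node add its own line and treat the first line as the leader.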

If all are on one computer, we have an easier job: the first to grab the leader port number becomes the initial leader, and the others silently understand "port number in use" to imply "I'm not the initial leader, it must be one of the others, but no problem: localhost::portno will reach the leader". So they pause for a second to let the leader get set up, then connect via TCP, and voila!
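The single-machine case can be sketched with plain sockets in Python. The function names are made up for illustration and this is not Derecho's startup code; it only shows "bind succeeds means leader, address in use means follower", followed by the short pause and TCP connect.

```python
import errno
import socket
import time

def claim_leader_port(port, host="127.0.0.1"):
    """Try to bind the leader port. Returns (is_leader, listening_socket_or_None)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as e:
        sock.close()
        if e.errno == errno.EADDRINUSE:
            # Someone else already holds the port: that process is the leader.
            return False, None
        raise
    sock.listen()
    return True, sock

def connect_to_leader(port, host="127.0.0.1", attempts=5, delay=1.0):
    """Follower side: retry briefly while the leader finishes setting up."""
    for _ in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=delay)
        except ConnectionRefusedError:
            time.sleep(delay)
    raise ConnectionError("leader did not become reachable")
```

Each process calls `claim_leader_port` with the configured port; exactly one gets `True`, and the rest call `connect_to_leader` with the same port.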

So there is a well-understood protocol by which a single config file can cover every case that we currently cover with N distinct config files, one per server and potentially one per client.

The argument against using so many config files is simply that they are messy and awkward. Imagine a system running on thousands of server nodes. Do we really want to hand-craft and then maintain thousands of cfg files?

As a first step towards a lower-overhead config management system (requested in #242), I changed the Conf initialization routine to expect two files: one for the "group" config that must be the same on all nodes, and one for a "node" config that contains only the options that are unique per node. The "group" config is loaded first, then the "node" config, so any options that appear in both files will take the values specified in the "node" config file. I also created a new top-level function, loadExtraFile, that allows a client to repeat the config-file-loading process for an additional file besides the two required ones. This should enable Cascade to use a separate file for its options (e.g. "cascade.cfg") and ask Conf to load and parse this file at startup.
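The layering rule (group first, node second, later values win, optional extra files afterwards) can be illustrated with Python's `configparser`. This is an analogy to the behavior described, not the Conf implementation: `load_conf` and `load_extra_file` are hypothetical Python stand-ins, and the section and option names in the usage note are invented.

```python
import configparser

def load_conf(group_file, node_file):
    """Load the group config, then the node config; node values override."""
    parser = configparser.ConfigParser()
    for path in (group_file, node_file):
        with open(path) as f:
            # A later read_file call overwrites options set by an earlier one.
            parser.read_file(f)
    return parser

def load_extra_file(parser, extra_file):
    """Repeat the loading step for one additional file (e.g. cascade.cfg)."""
    with open(extra_file) as f:
        parser.read_file(f)
    return parser
```

For example, if the group file sets `local_id = 0` under some section and the node file sets `local_id = 3` under the same section, the loaded value is `3`, while options that appear only in the group file keep their group values.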

KenBirman added enhancement derecho new feature labels Jul 9, 2022

KenBirman assigned etremel, songweijia and KenBirman Jul 9, 2022
