Inhibit error messages on normal client disconnect #193

Open
KenBirman opened this issue Jan 21, 2021 · 0 comments

A system such as Derecho or Cascade should normally be completely silent, printing error messages only in extreme situations. But during a demo of Cascade that Weijia ran yesterday, we saw dozens of error messages that Weijia had to keep explaining ("ignore that, it is just because the client disconnected").

Yesterday I posted a separate issue about the least graceful shutdown sequence in the universe -- that one arises if you have a group of 3 or 5 nodes and they all just exit without shutting the system down first. A lot of alarming messages are printed, we try to form new views, etc.

But we also need to make "silent by default" the rule for external clients connected to Derecho, and for top-level clients that exit for some reason as well. These events cause both the TCP channels and the RDMA connections to break, so the issues are sensed by a variety of logic -- some owned by Edward, some by Sagar, some by Weijia. All of this code should print messages only if some form of "verbose" compile-time constant is set to true, and otherwise should be totally silent (or you can perhaps put a message in a log, but absolutely not on the console). In fact, it should be viewed as a bug if the system prints a message that did not absolutely need to be printed.
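To make the "verbose compile-time constant" idea concrete, here is a minimal sketch of what such a gate might look like. The macro `VERBOSE_DISCONNECT`, the constant `verbose_disconnect`, and the function `report_disconnect` are all hypothetical names invented for illustration; none of them is taken from the Derecho or Cascade sources, and the real code would presumably go through whatever logging facility those projects already use.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical compile-time switch (not an existing Derecho or Cascade flag).
// Build with -DVERBOSE_DISCONNECT=1 to echo routine disconnects to the console.
#ifndef VERBOSE_DISCONNECT
#define VERBOSE_DISCONNECT 0
#endif

constexpr bool verbose_disconnect = (VERBOSE_DISCONNECT != 0);

// Records a routine connection error. The message always goes to the log
// stream; it reaches the console stream only in a verbose build.
// Returns true if the message was written to the console.
inline bool report_disconnect(std::ostream& console, std::ostream& log,
                              const std::string& node, int error_code) {
    assert(!node.empty());  // a report without a peer address is a caller bug
    std::ostringstream msg;
    msg << "Error " << error_code << " on connection to node " << node;
    log << msg.str() << '\n';
    if (verbose_disconnect) {
        console << msg.str() << '\n';
        return true;
    }
    return false;
}
```

In a default build, a call such as `report_disconnect(std::cerr, log_file, "123.65.17.221", -17)` would append one line to `log_file` and leave the console untouched. Passing the streams as parameters is only a convenience for this sketch, so that the silent path can be checked without capturing real console output.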

Example: "Garbled log, unable to restart" -- this would be a legitimate message to print. It relates to a genuinely unusual issue.
"Error -17 on connection to node 123.65.17.221" -- this is a "bad" message to print, except when debugging.

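One way to encode the distinction between these two messages is to tag each message with a severity and let only the genuinely unusual ones reach the console. The following is a rough sketch under that assumption; the `Severity` enum and the `emit` function are hypothetical and do not correspond to anything in Derecho or Cascade.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical severity levels, invented for this sketch.
enum class Severity { routine, critical };

// Every message is appended to the log stream. Only critical messages
// are also written to the console stream.
inline void emit(Severity sev, const std::string& msg,
                 std::ostream& console, std::ostream& log) {
    assert(!msg.empty());  // an empty message is a caller bug
    log << msg << '\n';
    if (sev == Severity::critical) {
        console << msg << '\n';
    }
}
```

Under this scheme "Garbled log, unable to restart" would be emitted as `Severity::critical` and appear on the console, while "Error -17 on connection to node 123.65.17.221" would be emitted as `Severity::routine` and appear only in the log.
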
Please view this as something important for our V2.2 release, which presumably will be in the February/March/April timeframe.
