Increase service startup delay for tests that fail the health checks #1583
base: master
The health checks have a backoff of up to 30s in some cases, which should lapse before we perform any validations of the health check results.
@@ -4505,6 +4510,10 @@ func TestNetworkHealthCheck(t *testing.T) {
t.Fatal(err)
}

// Use a longer service startup delay to allow the health check backoffs to complete.
The previous remote command, startCommandForPlatform, won't continue until the health checks finish executing. So there is no need to account for the network check backoffs within these tests.
ops-agent/cmd/ops_agent_windows/run_windows.go
Lines 73 to 82 in 11c19c2
s.log.Info(EngineEventID, "generated configuration files")
s.runHealthChecks()
changes <- svc.Status{State: svc.Running, Accepts: cmdsAccepted}
if err := s.startSubagents(); err != nil {
	s.log.Error(EngineEventID, fmt.Sprintf("failed to start subagents: %v", err))
	// TODO: Ignore failures for partial startup?
}
s.log.Info(EngineEventID, "started subagents")
defer func() {
AFAIU, the flakiness arises (as @jefferbrecht pointed out offline) when querying for the service status within getRecentServiceOutputForPlatform(). We could still add some delays to be sure the status reaches the Windows Event Log or journald.
Description
The health checks can take up to 30s to time out when they fail. We use a service startup delay of 20s, which is adequate in passing scenarios. However, for tests that deliberately fail the health checks, we need to make sure that the full 30s can lapse before validating the health check results.
Related issue
b/321001728
How has this been tested?
Will let presubmits run.