Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: prefix all logs with worker tags #124027

Merged
merged 1 commit into from
May 14, 2024

Conversation

vidit-bhat
Copy link
Contributor

@vidit-bhat vidit-bhat commented May 13, 2024

Previously, many of the runner logs didn't have the worker tags. This was inadequate because we couldn't break up the logs by wX tags to see what each worker was doing.
This PR prefixes worker tags to the logs that were missing them.

Epic: none
Fixes: #114045
Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@vidit-bhat vidit-bhat marked this pull request as ready for review May 13, 2024 08:20
@vidit-bhat vidit-bhat requested a review from a team as a code owner May 13, 2024 08:20
@vidit-bhat vidit-bhat requested review from srosenberg, nameisbhaskar, DarrylWong and renatolabs and removed request for a team May 13, 2024 08:20
@renatolabs
Copy link
Collaborator

This LGTM, and I'm fine as-is, but now that I've taken a closer look, I'm starting to wonder whether we should change our approach.

For instance, looking at the test runner logs from a recent run [1], I see a couple of other entries that don't have the worker tag and would continue to not have that tag even after this PR:

06:06:00 cluster.go:1745: destroying cluster teamcity-15224239-1715492932-14-n9cpu4-geo [tag:] (9 nodes)...
06:06:09 roachprod.go:1421: Destroying cluster teamcity-15224239-1715492932-14-n9cpu4-geo with 9 nodes
[..]
teamcity-15224239-1715492932-17-n1cpu4: stopping and waiting
teamcity-15224239-1715492932-17-n1cpu4: wiping

The main difficulty is that the [w%d] annotation is added via calls to l.PrintfCtx. This means that for it to work properly, two things need to happen: every log call needs to use PrintfCtx instead of Printf; and we need to make sure that the context passed is the context with the worker tag. Both of these are hard to enforce, especially over time.

That said, I'm wondering if we should make this easier for ourselves by not relying on log tags to add the worker annotation. Specifically, we could use the ChildLogger function to create a child logger for each worker. So the final result would be something like

_runner-logs/
    test-runner/
        test_runner.log
        w1.log
        w2.log
        ...
        wN.log

With this approach, test_runner.log would continue to include the combined logs of every worker, while one is able to verify the output of a single worker by checking the corresponding wN.log file. This is similar to what the mixedversion framework does in order for each step to log to its own file [2].

[1] https://teamcity.cockroachdb.com/repository/download/Cockroach_Nightlies_RoachtestNightlyGceBazel/15224239:id/_runner-logs/test_runner-1715492932.log

[2]

// prefixedLogger returns a logger instance off of the given `l`
// parameter, and adds a prefix to everything logged by the retured
// logger.
func prefixedLogger(l *logger.Logger, prefix string) (*logger.Logger, error) {
fileName := strings.ReplaceAll(prefix, " ", "-")
formattedPrefix := fmt.Sprintf("[%s] ", fileName)
return l.ChildLogger(fileName, logger.LogPrefix(formattedPrefix))
}

@vidit-bhat
Copy link
Contributor Author

vidit-bhat commented May 13, 2024

Tested out the changes. The worker log files are being created parallel to the test_runner.log inside the _runner-logs directory.

Also, the entries you mentioned now do have a tag.

[w1] 16:40:14 cluster.go:1745: destroying cluster viditbhat-1715618211-02-n4cpu4 [tag:] (4 nodes)...
[w1] 16:40:19 roachprod.go:1421: Destroying cluster viditbhat-1715618211-02-n4cpu4 with 4 nodes

Copy link
Contributor

@DarrylWong DarrylWong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I played around with it locally and everything works for me. One tiny question but otherwise LGTM.

pkg/cmd/roachtest/cluster.go Show resolved Hide resolved
pkg/cmd/roachtest/test_runner.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@renatolabs renatolabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think we should not return on the worker and fallback to the parent logger as already mentioned, but otherwise LGTM.

pkg/cmd/roachtest/test_runner.go Outdated Show resolved Hide resolved
pkg/cmd/roachtest/test_runner.go Show resolved Hide resolved
@vidit-bhat vidit-bhat force-pushed the roachtest-tag-worker-logs branch 2 times, most recently from 63583c5 to cba5427 Compare May 14, 2024 05:20
Previously, many of the runner logs didn't have the worker tags.
This was inadequate because we couldn't break up the logs by wX
tags to see what each worker is doing.
This PR prefixes the worker tags to logs that were missing them.

Epic: none
Fixes: cockroachdb#114045
Release note: None
@vidit-bhat
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request May 14, 2024
123697: ccl,sql,util: Fix jwt auth and add sensitive error logs r=souravcrl a=souravcrl

ccl,sql,util: Fix jwt auth and add sensitive error logs

We are running into issues with jwt authentication and currently unable to
provide support as we are not logging the error from the http client used in the
authenticator. The PR looks to propagate this obtained error from
`ValidateJWTLogin` http client. The http client now also respects the system
http proxy if set.

Validated the error details when presenting an expired token
```
ERROR: JWT authentication: invalid token
SQLSTATE: 28000
DETAIL: unable to parse token: exp not satisfied Failed running "sql"
```

Validated error on setting wrong proxy params
```
ERROR: JWT authentication: unable to validate token
SQLSTATE: 28000
Failed running "sql"
```
and logged error:
```
I240510 08:31:28.604141 1473 4@util/log/event_log.go:32 ⋮
[T1,Vsystem,n1,client=127.0.0.1:56289,hostssl,user=‹sourav.sarangi›] 3
={"Timestamp":1715329888604122000,"EventType":"client_authentication_failed","InstanceID":1,"Network":"tcp","RemoteAddress":"‹127.0.0.1:56289›","SessionID":"17ce136f2a8ecd480000000000000001","Transport":"hostssl","User":"‹sourav.sarangi›","SystemIdentity":"‹sourav.sarangi›","Reason":"CREDENTIALS_INVALID","Detail":"JWT
authentication: unable to validate token\nunable to fetch jwks: Get
\"https://accounts.google.com/.well-known/openid-configuration\": proxyconnect
tcp: dial tcp [::1]:3129: connect: connection refused","Method":"jwt_token"}
```

Verified access logs after setting up squid proxy and passing env HTTP_PROXY and
HTTPS_PROXY params
```
1715103871.761    144 ::1 TCP_TUNNEL/200 5708 CONNECT accounts.google.com:443 -
HIER_DIRECT/74.125.200.84 -
1715103871.836     73 ::1 TCP_TUNNEL/200 5964 CONNECT www.googleapis.com:443 -
HIER_DIRECT/142.250.182.10 -
```

fixes #123575, CRDB-38386, CRDB-38408
Epic None

Release note: Noneccl,sql,util: Fix jwt auth and add sensitive error logs

We are running into issues with jwt authentication and currently unable to
provide support as we are not logging the error from the http client used in the
authenticator. The PR looks to propagate this obtained error from
`ValidateJWTLogin` http client. The http client now also respects the system
http proxy if set.

Validated the error details when presenting an expired token
```
ERROR: JWT authentication: invalid token
SQLSTATE: 28000
DETAIL: unable to parse token: exp not satisfied Failed running "sql"
```

Validated error on setting wrong proxy params
```
ERROR: JWT authentication: unable to validate token
SQLSTATE: 28000
Failed running "sql"
```
and logged error:
```
I240510 08:31:28.604141 1473 4@util/log/event_log.go:32 ⋮
[T1,Vsystem,n1,client=127.0.0.1:56289,hostssl,user=‹sourav.sarangi›] 3
={"Timestamp":1715329888604122000,"EventType":"client_authentication_failed","InstanceID":1,"Network":"tcp","RemoteAddress":"‹127.0.0.1:56289›","SessionID":"17ce136f2a8ecd480000000000000001","Transport":"hostssl","User":"‹sourav.sarangi›","SystemIdentity":"‹sourav.sarangi›","Reason":"CREDENTIALS_INVALID","Detail":"JWT
authentication: unable to validate token\nunable to fetch jwks: Get
\"https://accounts.google.com/.well-known/openid-configuration\": proxyconnect
tcp: dial tcp [::1]:3129: connect: connection refused","Method":"jwt_token"}
```

Verified access logs after setting up squid proxy and passing env HTTP_PROXY and
HTTPS_PROXY params
```
1715103871.761    144 ::1 TCP_TUNNEL/200 5708 CONNECT accounts.google.com:443 -
HIER_DIRECT/74.125.200.84 -
1715103871.836     73 ::1 TCP_TUNNEL/200 5964 CONNECT www.googleapis.com:443 -
HIER_DIRECT/142.250.182.10 -
```

fixes #123575, CRDB-38386, CRDB-38408
Epic None

Release note: None

124027: roachtest: prefix all logs with worker tags r=vidit-bhat a=vidit-bhat

Previously, many of the runner logs didn't have the worker tags. This was inadequate because we couldn't break up the logs by `wX` tags to see what each worker was doing.
This PR prefixes worker tags to the logs that were missing them.

Epic: none
Fixes: #114045
Release note: None

Co-authored-by: Sourav Sarangi <sourav.sarangi@cockroachlabs.com>
Co-authored-by: Vidit Bhat <vidit.bhat@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented May 14, 2024

This PR was included in a batch that was canceled, it will be automatically retried

@craig craig bot merged commit 1e9eabb into cockroachdb:master May 14, 2024
22 checks passed
@vidit-bhat
Copy link
Contributor Author

blathers backport 24.1 23.2

Copy link

blathers-crl bot commented May 14, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 0a2f34e to blathers/backport-release-24.1-124027: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.1 failed. See errors above.


error creating merge commit from 0a2f34e to blathers/backport-release-23.2-124027: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.2 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: prefix all logs with worker tags
5 participants