log,logstream: structured log emission & consumption mechanism #124058
Open
abarganier wants to merge 4 commits into cockroachdb:master from abarganier:structured-log-frmwk
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.
The eventagg package is (currently) a proof of concept ("POC") that aims to provide an easy-to-use library that standardizes the way in which we aggregate Observability event data in CRDB. The goal is to eventually emit that data as "exhaust" from CRDB, which downstream systems can consume to build Observability features that do not rely on CRDB's own availability to aid in debugging & investigations.

This commit contains the first bits of work to scaffold such a library. It focuses on creating the core building block of the eventagg package, the MapReduceAggregator (a common library used to perform map/reduce-like aggregations). It also provides an unused toy example showing how MapReduceAggregator could be used to implement a SQL Stats-like feature.

Since this feature is currently experimental, it's gated by the `COCKROACH_ENABLE_STRUCTURED_EVENTS` environment variable, which is disabled by default.

Release note: none
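To make the shape of such a library concrete, here is a minimal sketch of a map/reduce-style aggregator in the spirit of the MapReduceAggregator described above. All names here (`Aggregator`, `Add`, `Flush`, `stmtEvent`) are hypothetical stand-ins, not the actual CRDB API.

```go
// A minimal sketch of a map/reduce-style event aggregator. The "map" step
// derives a grouping key from each event; the "reduce" step folds the event
// into that key's running aggregate. Names are illustrative only.
package main

import "fmt"

// Aggregator groups events by key and merges them into a running value.
type Aggregator[E any, K comparable, V any] struct {
	keyFn   func(E) K    // map: derive the grouping key from an event
	mergeFn func(V, E) V // reduce: fold an event into the aggregate
	buckets map[K]V
}

func New[E any, K comparable, V any](keyFn func(E) K, mergeFn func(V, E) V) *Aggregator[E, K, V] {
	return &Aggregator[E, K, V]{keyFn: keyFn, mergeFn: mergeFn, buckets: map[K]V{}}
}

// Add folds a single event into the current aggregation window.
func (a *Aggregator[E, K, V]) Add(e E) {
	k := a.keyFn(e)
	a.buckets[k] = a.mergeFn(a.buckets[k], e)
}

// Flush returns the aggregated buckets and resets the window.
func (a *Aggregator[E, K, V]) Flush() map[K]V {
	out := a.buckets
	a.buckets = map[K]V{}
	return out
}

// stmtEvent mimics a SQL Stats-like event: a statement fingerprint plus latency.
type stmtEvent struct {
	fingerprint string
	latencyMs   int
}

func main() {
	agg := New(
		func(e stmtEvent) string { return e.fingerprint },
		func(total int, e stmtEvent) int { return total + e.latencyMs },
	)
	agg.Add(stmtEvent{"SELECT * FROM t", 10})
	agg.Add(stmtEvent{"SELECT * FROM t", 5})
	agg.Add(stmtEvent{"INSERT INTO t", 2})
	fmt.Println(agg.Flush()) // map[INSERT INTO t:2 SELECT * FROM t:15]
}
```

A real implementation would additionally need to be concurrency-safe and tied to a flush trigger, which the next commit addresses.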
This patch introduces the FlushTrigger interface, which can be used by the MapReduceAggregator to determine when it's time to flush the current aggregation. Along with the interface, an initial implementation called `WindowedFlush` is provided. `WindowedFlush` aligns event aggregations to truncated time intervals given a user-provided time window. For example, if a window of 5 minutes was given, `WindowedFlush` would enforce the following window boundaries:

- [12:00:00, 12:05:00)
- [12:05:00, 12:10:00)
- [12:10:00, 12:15:00)
- etc.

This is a first-pass implementation of the flush mechanism used in the eventagg package. As needs evolve, the interface and/or implementation is subject to change. For the purposes of prototyping, though, this meets our needs.

Release note: none
This patch introduces log.Structured to the log package API. It aims to serve as a prototype for the log facility we will use for exporting "exhaust" from CRDB in the form of JSON objects. The intention is that this exhaust can be consumed externally and be sufficient to build features around.

This iteration has some serious limitations, the main one being that it is not redaction-aware. The `log.StructuredEvent` API exists alongside it for now. Both implementations are quite similar, so they should probably be reconciled and/or combined, but this is left as a TODO to avoid slowing down the prototyping process. For now, it's sufficient for prototyping.

The patch also introduces a new logging channel explicitly for the new `log.Structured` API, called `STRUCTURED_EVENTS`. The ability to segment these types of logs from the rest of the logs is what motivates this separate channel. The name works for now, but we should consider whether there's a better name available.

A following patch will focus on internal consumption of these events.

Release note: none
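As a rough illustration of the kind of emission described above, the sketch below marshals an arbitrary payload to JSON and tags it with a channel name. Only the `STRUCTURED_EVENTS` channel name comes from the patch; the `logStructured` function shape is hypothetical, and a real implementation would route through the log sink configuration and, as noted, still needs to become redaction-aware.

```go
// Hypothetical sketch of a log.Structured-style call: serialize a payload
// as JSON and emit it on a dedicated logging channel.
package main

import (
	"encoding/json"
	"fmt"
)

// logStructured marshals payload to JSON and returns the line that would be
// emitted on the named channel. Illustrative only, not the CRDB API.
func logStructured(channel string, payload any) (string, error) {
	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("[%s] %s", channel, b), nil
}

func main() {
	line, err := logStructured("STRUCTURED_EVENTS", map[string]any{
		"event":       "stmt_stats",
		"fingerprint": "SELECT * FROM t",
		"count":       3,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```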
…ernally

This patch expands upon the new structured logging facilities, adding a mechanism to consume emitted structured logs internally. This is done primarily via the new pkg/obs/logstream package, which handles buffering, routing, & processing of events logged via log.Structured.

It can be used in conjunction with the eventagg package and the KVProcessor interface to provide users of the eventagg package a way to consume streams of events flushed from their aggregations. This enables engineers to use the aggregated data flushed from their aggregations to build features internal to CRDB, powered by the same data that could be consumed externally via the STRUCTURED_EVENTS log channel.

The provided log config can be updated to make use of this new channel. For example:

```
sinks:
  file-groups:
    structured-events:
      channels: [STRUCTURED_EVENTS]
```

The changes aim to complete the eventagg pipeline/ecosystem, which now allows engineers to use common facilities to define aggregations, log the aggregated results, and consume the logged events internally as input data.

Finally, it completes the toy StmtStats example by defining a processor for the aggregated events that are logged.

Release note: none
abarganier force-pushed the structured-log-frmwk branch from 2a31293 to b4256ba on May 16, 2024 at 18:50
Note: please only consider the final two commits. The first two commits are being reviewed separately in #119416
This PR introduces `log.Structured` to the log package API. It aims to serve as a prototype for the log facility we will use for exporting "exhaust" from CRDB in the form of JSON objects. The intention is that this exhaust can be consumed externally and be sufficient to build features around.

This iteration has some serious limitations, the main one being that it is not redaction-aware. The `log.StructuredEvent` API exists alongside it for now. Both implementations are quite similar, so they should probably be reconciled and/or combined, but this is left as a TODO to avoid slowing down the prototyping process.

The patch also introduces a new logging channel explicitly for the new `log.Structured` API, called `STRUCTURED_EVENTS`. The ability to segment these types of logs from the rest of the logs is what motivates this separate channel. The name works for now, but we should consider whether there's a better name available.
The PR also expands upon the new structured logging facilities, adding a
mechanism to consume emitted structured logs internally.
This is done primarily via the new pkg/obs/logstream package, which
handles buffering, routing, & processing of events logged via
log.Structured.
It can be used in conjunction with the eventagg package and the KVProcessor interface to provide users of the eventagg package a way to consume streams of events flushed from their aggregations. This enables engineers to use that aggregated data to build features internal to CRDB, powered by the same data that could be consumed externally via the STRUCTURED_EVENTS log channel.
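A toy sketch of that internal-consumption wiring: processors register for an event type, and a router fans each decoded structured event out to them. The `Processor`/`Router` names here are hypothetical stand-ins for the pkg/obs/logstream and KVProcessor APIs, and this version omits the buffering the real package handles.

```go
// Sketch of logstream-style internal consumption: registered processors
// receive every decoded structured event matching their event type.
package main

import "fmt"

// Processor consumes a stream of decoded structured events.
type Processor interface {
	Process(event map[string]any)
}

// Router routes each event to the processors registered for its "event" field.
type Router struct {
	procs map[string][]Processor
}

func NewRouter() *Router { return &Router{procs: map[string][]Processor{}} }

// Register subscribes p to events of the given type.
func (r *Router) Register(eventType string, p Processor) {
	r.procs[eventType] = append(r.procs[eventType], p)
}

// Route fans an event out to all processors registered for its type.
func (r *Router) Route(event map[string]any) {
	t, _ := event["event"].(string)
	for _, p := range r.procs[t] {
		p.Process(event)
	}
}

// stmtStatsProcessor counts statements, mimicking the toy StmtStats example.
type stmtStatsProcessor struct{ seen int }

func (s *stmtStatsProcessor) Process(event map[string]any) { s.seen++ }

func main() {
	r := NewRouter()
	p := &stmtStatsProcessor{}
	r.Register("stmt_stats", p)
	r.Route(map[string]any{"event": "stmt_stats", "fingerprint": "SELECT 1"})
	r.Route(map[string]any{"event": "other"}) // no processor registered; dropped
	fmt.Println(p.seen)                       // 1
}
```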
The provided log config can be updated to make use of this new channel.
For example:
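The sink configuration from the commit message above, a file group capturing the new channel:

```
sinks:
  file-groups:
    structured-events:
      channels: [STRUCTURED_EVENTS]
```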
The changes aim to complete the eventagg pipeline/ecosystem, which now
allows engineers to use common facilities to define aggregations, log
the aggregated results, and consume the logged events internally as
input data.
Finally, it completes the toy StmtStats example by defining a processor
for the aggregated events that are logged.
The below diagram outlines the end-to-end architecture of the system:
Release note: none
Epic: CRDB-35919