log,logstream: structured log emission & consumption mechanism #124058

Open · wants to merge 4 commits into base: master
Conversation

abarganier
Member

Note: please only consider the final two commits. The first two commits are being reviewed separately in #119416


This PR introduces log.Structured to the log package API. It aims to
serve as a prototype for the log facility we will use for exporting
"exhaust" from CRDB in the form of JSON objects. The intention is that
this exhaust can be consumed externally and be sufficient to build
features around.
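The PR body does not spell out the `log.Structured` signature, so the sketch below only illustrates the shape of the idea: an arbitrary event struct is serialized to JSON and emitted on a dedicated channel. The event type, field names, and `logStructured` helper are hypothetical stand-ins, not the actual API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StmtStatsEvent is a hypothetical structured event payload; the real
// event types live elsewhere in the PR.
type StmtStatsEvent struct {
	FingerprintID string `json:"fingerprint_id"`
	ExecCount     int64  `json:"exec_count"`
}

// logStructured stands in for log.Structured: it serializes the event to
// JSON and renders it as a line on the (simulated) STRUCTURED_EVENTS channel.
func logStructured(event any) string {
	payload, err := json.Marshal(event)
	if err != nil {
		panic(err)
	}
	return fmt.Sprintf("STRUCTURED_EVENTS %s", payload)
}

func main() {
	fmt.Println(logStructured(StmtStatsEvent{FingerprintID: "abc123", ExecCount: 42}))
}
```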

This iteration has some serious limitations, the main one being that it
is not redaction-aware. The `log.StructuredEvent` API exists alongside it
for now. The two implementations are quite similar and should eventually
be reconciled and/or combined, but that work is left as a TODO to avoid
slowing down the prototyping process.

The patch also introduces a new logging channel, STRUCTURED_EVENTS,
dedicated to the new `log.Structured` API. The separate channel is
motivated by the ability to segment these logs from the rest of the
logs. The name works for now, but we should consider whether a better
one is available.

The PR also expands upon the new structured logging facilities, adding a
mechanism to consume emitted structured logs internally.

This is done primarily via the new pkg/obs/logstream package, which
handles buffering, routing, & processing of events logged via
log.Structured.

It can be used in conjunction with the eventagg package and the
KVProcessor interface to provide users of the eventagg package a way to
consume streams of events flushed from their aggregations. This enables
engineers to build features internal to CRDB powered by the same data
that can be consumed externally via the STRUCTURED_EVENTS log channel.

The provided log config can be updated to make use of this new channel.
For example:

```
sinks:
  file-groups:
    structured-events:
      channels: [STRUCTURED_EVENTS]
```

The changes aim to complete the eventagg pipeline/ecosystem, which now
allows engineers to use common facilities to define aggregations, log
the aggregated results, and consume the logged events internally as
input data.

Finally, it completes the toy StmtStats example by defining a processor
for the aggregated events that are logged.

The diagram below outlines the end-to-end architecture of the system:

[Architecture diagram (screenshot dated 2024-04-26)]

Release note: none

Epic: CRDB-35919

@abarganier abarganier requested review from a team as code owners May 13, 2024 18:03

blathers-crl bot commented May 13, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

The eventagg package is (currently) a proof of concept ("POC") that aims
to provide an easy-to-use library that standardizes the way we aggregate
Observability event data in CRDB. The goal is to eventually emit that
data as "exhaust" from CRDB, so that downstream systems can consume it
to build Observability features that aid in debugging & investigations
without relying on CRDB's own availability.

This commit contains the first bits of work to scaffold such a library.
It focuses on creating the core building block of the eventagg package,
the MapReduceAggregator (a common library used to perform map/reduce-like
aggregations).

It also provides an unused toy example showing how MapReduceAggregator
could be used to implement a SQL Stats-like feature.
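As a rough, self-contained sketch of what a map/reduce-style aggregation building block can look like (the generic signatures below are illustrative, not eventagg's actual API): each event is mapped to a key, then reduced into a per-key accumulator, until the aggregation is flushed.

```go
package main

import "fmt"

// MapReduceAggregator is a simplified stand-in for the eventagg building
// block described above: events are keyed (the "map" step) and folded into
// a per-key accumulator (the "reduce" step) until the state is flushed.
type MapReduceAggregator[E any, K comparable, A any] struct {
	keyFn    func(E) K
	reduceFn func(A, E) A
	buckets  map[K]A
}

func NewMapReduceAggregator[E any, K comparable, A any](
	keyFn func(E) K, reduceFn func(A, E) A,
) *MapReduceAggregator[E, K, A] {
	return &MapReduceAggregator[E, K, A]{
		keyFn:    keyFn,
		reduceFn: reduceFn,
		buckets:  make(map[K]A),
	}
}

// Add routes an event to its bucket and folds it into the accumulator.
func (m *MapReduceAggregator[E, K, A]) Add(e E) {
	k := m.keyFn(e)
	m.buckets[k] = m.reduceFn(m.buckets[k], e)
}

// Flush returns the current aggregation and resets the state.
func (m *MapReduceAggregator[E, K, A]) Flush() map[K]A {
	out := m.buckets
	m.buckets = make(map[K]A)
	return out
}

// stmtExec and stmtStats model a toy SQL Stats-like aggregation.
type stmtExec struct {
	fingerprint string
	latencyMS   float64
}

type stmtStats struct {
	count        int
	totalLatency float64
}

func main() {
	agg := NewMapReduceAggregator(
		func(e stmtExec) string { return e.fingerprint },
		func(acc stmtStats, e stmtExec) stmtStats {
			return stmtStats{count: acc.count + 1, totalLatency: acc.totalLatency + e.latencyMS}
		},
	)
	agg.Add(stmtExec{"SELECT 1", 2.5})
	agg.Add(stmtExec{"SELECT 1", 3.5})
	fmt.Println(agg.Flush())
}
```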

Since this feature is currently experimental, it's gated by the
`COCKROACH_ENABLE_STRUCTURED_EVENTS` environment variable, which is
disabled by default.

Release note: none

This patch introduces the FlushTrigger interface, which can be used by
the MapReduceAggregator to determine when it's time to flush the current
aggregation.

Along with the interface, an initial implementation is provided called
`WindowedFlush`. `WindowedFlush` aligns event aggregations to truncated
time intervals given a user-provided time window.

For example, if a window of 5 minutes was given, the `WindowedFlush`
would enforce the following window boundaries:

- [12:00:00, 12:05:00)
- [12:05:00, 12:10:00)
- [12:10:00, 12:15:00)
- etc.

This is a first-pass implementation of the flush mechanism used in the
eventagg package. As needs evolve, the interface and/or implementation
is subject to change. For the purposes of prototyping, though, this
meets our needs.

Release note: none