Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RulesGroup doesn't follow Prometheus spec #932

Open
optix2000 opened this issue Apr 11, 2024 · 6 comments
Open

RulesGroup doesn't follow Prometheus spec #932

optix2000 opened this issue Apr 11, 2024 · 6 comments
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@optix2000
Copy link

According to the docs:

RuleGroup declares rules in the Prometheus format: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/

However in Prometheus rule_group's, interval is optional. For the RuleGroup, interval is required, which is different from what's expected.

ClusterRules.monitoring.googleapis.com "foo" is invalid: [spec.groups[0].interval: Required value]
@optix2000
Copy link
Author

This means we cannot copy rules 1:1 into the ClusterRules CRD.

@bwplotka
Copy link
Collaborator

bwplotka commented Apr 12, 2024

Hi, thanks for reporting!

This is intentional as there is no global interval setting for rule evaluations (aka global.evaluation_interval), because that option only exists in Prometheus. In GMP it's a separate component that evaluates rules.

This means we cannot copy rules 1:1 into the ClusterRules CRD.

That's true, this means you have to manually/via sed/script add interval explicitly.

Just to understand the consequences of this decision -- this is one-off annoyance though, right? Or is it causing more problems for you?

@optix2000
Copy link
Author

This is mostly a one-off annoyance, but causes confusion as we cannot point people directly to the Prometheus docs.

Given all the components are managed by GCP, it feels like it should just use the value of global.evaluation_interval configured in alertmanager, or the default of 1m.

@bwplotka
Copy link
Collaborator

Got it, thanks.

I assume you can point people directly to the Prometheus docs, but you have to mention the nuance to specify that one field explicitly, right?

The point is that evaluation interval is also useful to specify explicit. It matters, depending on the rule. Do you think a global ruleEvaluator flag (e.g. in OperatorConfig) would be good enough here?

global.evaluation_interval configured in alertmanager, or the default of 1m.

There is no such setting in AM, probably you mean Prometheus - which is not relevant in our stack as we don't use Prometheus for rule evals?

Default to 1m is an option, yes, maybe not too bad.

@optix2000
Copy link
Author

I assume you can point people directly to the Prometheus docs, but you have to mention the nuance to specify that one field explicitly, right?

That's exactly what we do, but not everyone reads the nuance, especially if they have used prometheus/alertmanager before.

There is no such setting in AM, probably you mean Prometheus - which is not relevant in our stack as we don't use Prometheus for rule evals?

Ah got it.

Default to 1m is an option, yes, maybe not too bad.

This would be my preference, as it's the Prom default

@bwplotka
Copy link
Collaborator

bwplotka commented Apr 25, 2024

Thanks. Moving this to prioritization queue (OSS help wanted too).

Acceptance Criteria

@bwplotka bwplotka added enhancement New feature or request help wanted Extra attention is needed labels Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants