
Persisting silences in alertmanager #685

Open
marwanad opened this issue Nov 28, 2023 · 9 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@marwanad

In the managed alertmanager, the alertmanager-data volume is an emptyDir, which means that configured silences and notification state won't persist across pod restarts. Is there a way to configure a PVC for the data directory with the managed alertmanager?

@bwplotka
Collaborator

That's correct, thanks for raising this.

Alertmanager is a StatefulSet, but with a best-effort emptyDir volume, which does not guarantee any persistence. In a self-deployed setup that's possible to change, since you can modify the Alertmanager resource yourself, but not in managed GMP.
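
To illustrate, a minimal sketch of what that looks like today; the exact manifest differs, and the mount path here is assumed:

```yaml
# Simplified sketch of the managed Alertmanager StatefulSet's data volume
# (illustrative names; not the exact manifest shipped by the operator).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: alertmanager
spec:
  serviceName: alertmanager
  template:
    spec:
      containers:
        - name: alertmanager
          image: prom/alertmanager   # placeholder image
          volumeMounts:
            - name: alertmanager-data
              mountPath: /alertmanager-data   # silences and notification log live here (path assumed)
      volumes:
        - name: alertmanager-data
          emptyDir: {}   # best-effort: contents are lost whenever the pod is recreated
```

Persisting silences would essentially mean swapping that emptyDir for a PersistentVolumeClaim (e.g. via volumeClaimTemplates), which is what this issue asks the operator to expose.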

We could discuss this feature as a team if you want; it feels like something we could consider, but at a lower priority. Help is also wanted to contribute this feature, which might get it done faster.

Just curious, what's your use case for the managed alertmanager? Would our recent cloud feature in preview, PromQL for Cloud Monitoring Alerting, help?

bwplotka added the enhancement and help wanted labels on Nov 29, 2023
@marwanad
Author

@bwplotka thanks for the response! At the time, I think there was no way to disable the deployment of the managed alertmanager through the GMP operator, so we ended up using it instead of running duplicate deployments.

So it's basically the same use case as for an unmanaged alertmanager: at the time we couldn't define PromQL rules in Cloud Monitoring, and we needed more control over the Slack, PagerDuty, and other notification channel configs. The preview feature looks interesting and covers a subset of our use case, but we'll still need alertmanager for generic webhook channels.

@lyanco
Collaborator

lyanco commented Nov 30, 2023

Note that Cloud Alerting PromQL does support generic webhook channels: https://cloud.google.com/monitoring/support/notification-options#webhooks

@taldejoh

We are facing the same problem. All of our silences are gone on pod restart, and we need to recreate them manually.
In the last two weeks it has happened twice.
So this improvement would be very helpful for us as well!

@bwplotka
Collaborator

Sorry for the lag; it's on our radar again, and we are brainstorming how to enable persistent volumes here.

Interestingly, there is a very nasty "persistent" workaround for silences in the meantime: prometheus/alertmanager#1673 (comment) (thanks @TheSpiritXIII for finding it!)

@bwplotka
Collaborator

bwplotka commented Apr 8, 2024

Just a quick question for users who care about this feature: which managed collection (this operator) deployment model do you use?

1️⃣ the one available on GKE (fully managed). If that's the case, how do you submit the silences?
2️⃣ self-deployed operator (via kubectl). If that's the case, what stops you from manually adjusting the Alertmanager StatefulSet YAML for your needs and re-applying it? The operator will manage that one just fine (as long as you keep the labels, namespace, and name the same); see the sketch below.
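
For 2️⃣, a rough sketch of the kind of edit meant here, assuming you back the existing data volume with a per-replica PVC; names, storage class, and size are placeholders rather than values from the operator's manifest:

```yaml
# Hypothetical change to the self-deployed Alertmanager StatefulSet:
# back the alertmanager-data volumeMount with a PVC instead of an emptyDir.
spec:
  volumeClaimTemplates:
    - metadata:
        name: alertmanager-data        # must match the container's volumeMount name
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard     # placeholder
        resources:
          requests:
            storage: 1Gi               # placeholder
```

The matching emptyDir entry under spec.template.spec.volumes would need to be dropped, and since volumeClaimTemplates can't be changed on an existing StatefulSet, this typically means deleting and recreating it with the same name, namespace, and labels so the operator keeps recognizing it.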

cc @m3adow @marwanad @taldejoh

@marwanad
Author

marwanad commented Apr 8, 2024

@bwplotka appreciate the updates on this :)

We were using option 1, setting the silences by port-forwarding to the running alertmanager instance and adding them through the UI or submitting them with amtool.

We've since switched to a self-deployed alertmanager instance to get more control over this, setting the alertmanagers field in the operator config to point to our self-managed instance.
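
For anyone following along, a sketch of that configuration, assuming the alertmanagers endpoints live under rules.alerting in the OperatorConfig; field placement should be checked against the operator docs, and the service name, namespace, and port below are examples:

```yaml
# Example OperatorConfig pointing rule evaluation at a self-managed Alertmanager.
apiVersion: monitoring.googleapis.com/v1
kind: OperatorConfig
metadata:
  namespace: gmp-public
  name: config
rules:
  alerting:
    alertmanagers:
      - name: my-alertmanager    # example Service name
        namespace: monitoring    # example namespace
        port: 9093
```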

@m3adow

m3adow commented Apr 9, 2024

We're using option 1 as well. We're currently in the process of migrating from kube-prometheus-stack to GMP, and we want to have as much of the "GM" as possible. 😄
Right now, we're also using port-forwarding and the UI to silence alerts. As the alerts are sent to Teams channels, we don't have an option to silence the alerts later on in the alerting chain.

@bwplotka
Collaborator

bwplotka commented Apr 9, 2024

Epic, thanks for the clarifications!
