False-positive "Container has the same readiness and liveness probe" if failureThreshold differs #339

Open
AndiDog opened this issue Jan 4, 2021 · 10 comments

Comments

@AndiDog

AndiDog commented Jan 4, 2021

Which version of kube-score are you using?

kube-score version: 1.10.0, commit: 95faa2a, built: 2020-11-07T14:17:50Z

What did you do?

```yaml
readinessProbe:
  tcpSocket:
    port: http
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  tcpSocket:
    port: http
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 12
```

in a Deployment.

What did you expect to see?

No error for this part, because a differing failureThreshold is a perfectly valid way to differentiate the liveness probe, for example to kill a container only if it hasn't been ready for an extended period of time.

What did you see instead?

[CRITICAL] Pod Probes
[...]
Container has the same readiness and liveness probe
@zegl
Owner

zegl commented Jan 5, 2021

Hey,

As I see it, this is by design, to avoid cascading failures. See README_PROBES.md for more information about why this check exists.

If you don't agree with this, and have examples of why you think that kube-score's recommendations can be harmful, please post them here as a comment and we can have a chat about it.

@AndiDog
Author

AndiDog commented Jan 5, 2021

Indeed, I'd argue my production example above should be treated as fine. failureThreshold: 3 in the readiness probe makes the pod passive as soon as it cannot receive traffic (in this case, if the port isn't open yet/anymore), while the liveness probe with failureThreshold: 12 takes care of killing the container if it couldn't become ready again within a longer time window. This has proven reasonable for applications that are generally expected to never get stuck, but should still have some sane probe pattern attached given that no regular failure scenario has been observed. If a pod ever does get stuck (e.g. in a deadlock or similar), the affected containers get restarted after some time; this resolves the case where 1 out of (typically) 3 pods in a deployment would otherwise remain stuck indefinitely.

In my example, I am talking about a simple FastCGI server container which only exposes that one port for probing. Forcing the use of a "different" probe can lead to worse outcomes – for example, a developer could think "if I need to use a different type of probe, let's test whether the external database is reachable", which in turn creates a much higher risk of cascading failures.
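To put rough numbers on that time window (approximate, since timeoutSeconds and the duration of the check itself also play a role):

```yaml
# Approximate timing implied by the probes in the issue description:
#   readiness: pod marked unready after ~ 3 failures × 5 s period ≈ 15 s
#   liveness:  container restarted after ~ 12 failures × 5 s period ≈ 60 s
```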

@zegl
Owner

zegl commented May 7, 2021

OK, I just saw that you submitted a PR that changes the behaviour of Pod Probes, and I'm not convinced.

The reasoning behind kube-score trying to make sure that different probes are used is to make sure that the user understands what they are doing, and to avoid a misconfiguration that leads to unnecessary downtime.

Having two probes that differ only in initialDelaySeconds, periodSeconds, or failureThreshold effectively means that the only difference in behaviour is how the probe deals with time, and this is not what you want to do.

A key difference between a Readiness and a Liveness Probe is that the Readiness Probe is used when safely draining connections from a Pod, to make sure that it's unregistered from all Services/Endpoints/LBs/etc. before the service is stopped. It's fairly common to implement a 120 s (or longer) draining period after receiving SIGTERM, during which the Readiness Probe fails but the service still happily keeps serving incoming requests, until all clients have reacted to the Pod shutting down (by polling the DNS records, or similar). This draining, however, cannot happen on the Liveness Probe, as restarting the container in the middle of draining connections would defeat its purpose.
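A rough sketch of such a drain setup is shown below, as a Pod spec fragment. The endpoint paths, the 120 s sleep, and the grace period are illustrative assumptions, not taken from this issue:

```yaml
# Illustrative drain pattern: after SIGTERM the readiness endpoint starts failing
# so the Pod is removed from Services/Endpoints, while the liveness endpoint keeps
# passing so the container is NOT restarted in the middle of draining.
spec:
  terminationGracePeriodSeconds: 150   # assumed: must exceed the drain window below
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "120"]  # assumed 120 s drain window
      readinessProbe:
        httpGet:
          path: /ready                 # hypothetical endpoint; fails once draining starts
          port: http
        periodSeconds: 5
      livenessProbe:
        httpGet:
          path: /healthz               # hypothetical endpoint; keeps passing during drain
          port: http
        periodSeconds: 5
```

Note that terminationGracePeriodSeconds has to be longer than the preStop sleep, otherwise the container is killed before draining finishes.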

kube-score tries to make it as easy as possible for its users to spot this type of misconfiguration. Configuring an application to have two different probes, for the two different purposes, is never hard to do, and it makes the user spend some time reflecting on their probes and on what the consequences can be if they are misconfigured. And if they can't implement a new probe (for example when running some third-party software), this check is easy to disable on a per-application basis with an annotation (but it hopefully still made you think!).
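For reference, that per-application opt-out uses the kube-score/ignore annotation with the check ID; a minimal sketch (pod-probes is the ID covering all of the probe checks):

```yaml
metadata:
  annotations:
    kube-score/ignore: pod-probes   # disables kube-score's probe checks for this object
```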

I might be wrong, but I'd like to see some examples of where having different time configurations has been the right thing to do, before taking the time to review your PR.

Thanks,
Gustav

@greenmaid

Hello,

In fact, there is a link in README_PROBES.md to "Liveness probes are dangerous" (srcco.de). It reads:

if you use Liveness Probe, don’t set the same specification for Liveness and Readiness Probe
you can use a Liveness Probe with the same health check, but a higher failureThreshold (e.g. mark as not-ready after 3 attempts and fail Liveness Probe after 10 attempts)
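Applied to the kind of manifest from the issue description, that recommendation would look roughly like the following sketch (the /healthz path and the periods are illustrative):

```yaml
# Same low-cost check for both probes, but a higher failureThreshold on liveness:
readinessProbe:
  httpGet:
    path: /healthz        # illustrative endpoint
    port: http
  periodSeconds: 5
  failureThreshold: 3     # marked not-ready after 3 failed attempts
livenessProbe:
  httpGet:
    path: /healthz        # same check...
    port: http
  periodSeconds: 5
  failureThreshold: 10    # ...but the container is only restarted after 10 failed attempts
```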

@AndiDog
Author

AndiDog commented Aug 28, 2021

Yes @greenmaid, that's exactly the main use case. So @zegl, any concerns about that?

@vrusinov

vrusinov commented Feb 21, 2022

And if they can't implement a new probe (for example when running some third-party software), this check is easy to disable on a per-application basis with an annotation

The problem with this is that there's only a single pod-probes check ID, which disables all probe checks at once. I'd like to be able to ignore just the "has the same readiness and liveness probe" part.

@rparree

rparree commented Aug 12, 2022

Any developments on this? I agree with @AndiDog and would like to be able to "accept" the same probe with more lenient timing, just as the documentation explains it.

@bgoareguer

The Kubernetes documentation on probes indicates that "A common pattern for liveness probes is to use the same low-cost HTTP endpoint as for readiness probes, but with a higher failureThreshold".

So I would also agree with updating kube-score accordingly.

@mbyio

mbyio commented Mar 6, 2024

In my use case, we have a service that sometimes gets stuck and stops responding to HTTP. I want to use the liveness probe to restart the container automatically when this happens. I can't modify the code for legal reasons, so I can't change the HTTP path, etc. In my case, it makes sense to use the same probe for readiness and liveness, and to set the liveness probe to a higher failure threshold.

Sure, it would be clearer if I could have different paths/probes. It would also be better if the app didn't get stuck. But I don't think that is related to the actual Kubernetes manifest, which is what kube-score is checking. That's the main reason I feel this check should be updated.

I agree with the commenters: if the liveness probe has a higher failure threshold than the readiness probe, then it should be pretty safe, and it should be allowed by default. I could see some people wanting to opt in to an even stricter check, though; that's fine as long as it is optional.

@derek-gfs

This should really be changed, since as it stands today it contradicts the Kubernetes documentation.
