Liveness probe for netbox #100

Open
florianschendel opened this issue May 30, 2022 · 8 comments
Labels
enhancement (New feature or request), upstream (The problem is in an upstream project)

Comments

@florianschendel

Hi Chris,

A liveness probe to restart the pod would be very helpful for installations.

Best regards
Florian

@bootc
Member

bootc commented Jun 12, 2022

There's currently no URI in NetBox that's really suitable for use with a liveness probe. It looks like netbox-community/netbox#8831 might help with something like that, however.

Learnk8s.io's Kubernetes production best practices (which I tend to broadly agree with) call for passive liveness checks, so the liveness probe should really call into the Python code but basically do nothing except return success. The readiness check could/should do a bit more work to be particularly useful, and the liveness and readiness checks should not be the same.

The existing readiness probe just loads the login page. That isn't ideal but it's the closest we've got to something useful in NetBox at the moment.

@bootc added the enhancement (New feature or request) and upstream (The problem is in an upstream project) labels on Jul 17, 2022
@RangerRick
Contributor

Hey, just revisiting old issues. @bootc would the /api/status endpoint be good for this? It doesn't do nothing but it does nearly nothing.

curl --silent http://localhost:8080/api/status/ | jq
{
  "django-version": "4.2.6",
  "installed-apps": {
    "debug_toolbar": "4.2.0",
    "django_filters": "23.3",
    "django_prometheus": "2.3.1",
    "django_rq": "2.8.1",
    "django_tables2": "2.6.0",
    "drf_spectacular": "0.26.5",
    "drf_spectacular_sidecar": "2023.10.1",
    "graphene_django": "3.0.0",
    "graphiql_debug_toolbar": "0.2.0",
    "mptt": "0.14.0",
    "rest_framework": "3.14.0",
    "social_django": "5.4.0",
    "taggit": "4.0.0",
    "timezone_field": "6.0.1"
  },
  "netbox-version": "3.6.4",
  "plugins": {},
  "python-version": "3.11.4",
  "rq-workers-running": 1
}

@bootc
Member

bootc commented May 4, 2024

As long as it doesn’t try to talk to the database or Redis it looks like it would do the trick. Ideally it would also be available without authentication for simplicity.

This blog post describes how liveness and readiness probes work: https://blog.colinbreck.com/kubernetes-liveness-and-readiness-probes-how-to-avoid-shooting-yourself-in-the-foot/

@LeoColomb
Contributor

As per the docs, this endpoint is "A lightweight read-only endpoint for conveying NetBox's current operational status".
It sounds like it matches the need pretty well 😊

@RangerRick
Contributor

RangerRick commented May 10, 2024

Looking closer at the code, it does call into the worker API among other things so maybe the /api/status bit should be a readiness probe, and we just call /api (which literally just returns a list of resources) for liveness.
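
As a rough sketch of that split (not tested against the chart), the container's probes might look something like the fragment below. The port 8080 is taken from the curl example earlier in this thread; the timing values are placeholders I made up, and the trailing slash on /api/ is there on the assumption that Django would otherwise answer with a redirect.

```yaml
# Sketch only: probe settings for the NetBox container.
# Port 8080 and all timing values are assumptions, not chart defaults.
readinessProbe:
  httpGet:
    path: /api/status/   # does a bit more work (calls into the worker API)
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /api/          # just returns the list of API resources
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```

The point of the split is that a slow or failing worker check would only take the pod out of the Service endpoints, rather than getting it restarted.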

@bootc
Member

bootc commented May 10, 2024

The other thing I thought of is: does this still work if authentication is required?

@LeoColomb
Contributor

Sane thought, and neither /api nor /api/status requires auth.
They can be tried out directly on the demo NetBox online, without being logged in:

@bootc
Member

bootc commented May 11, 2024

Well, https://netbox.boo.tc/api/ works without authentication, but https://netbox.boo.tc/api/status/ gives a 403 Forbidden. So I think /api/ (don't forget the trailing /) would probably work well for a readiness check.

Probably the best place for a liveness check is an extra endpoint within NGINX Unit in the container; and actually it looks like there's something like that on port 8081? In fact in my NetBox I can query http://${POD_IP}:8081/status/ and get some JSON back from NGINX Unit; that seems like a plausible candidate for a liveness check to me.
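
A minimal sketch of what that could look like in the pod spec, assuming the NGINX Unit status listener on 8081 is reachable from the kubelet and that NetBox itself listens on 8080 (as in the curl example earlier in the thread). The timing values are placeholders, not something I have tuned.

```yaml
# Sketch only: ports and timings are assumptions and would need checking
# against the actual container image and chart values.
livenessProbe:
  httpGet:
    path: /status/   # answered by NGINX Unit, not by the NetBox Python code
    port: 8081
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /api/      # trailing slash included; works without authentication
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```

One trade-off with this layout: the liveness probe only shows that NGINX Unit is answering, so a hung NetBox application would show up as a readiness failure rather than trigger a restart.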

4 participants