Traefik v3.0 breaks existing cert-manager integrations #10702

Open
2 tasks done
sebadob opened this issue May 7, 2024 · 4 comments
Labels
area/acme area/rules kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed.

Comments

@sebadob

sebadob commented May 7, 2024

Welcome!

  • Yes, I've searched similar issues on GitHub and didn't find any.
  • Yes, I've searched similar issues on the Traefik community forum and didn't find any.

What did you do?

I was using Traefik v2 for a long time together with cert-manager inside Kubernetes.

For my usual routes, I use the IngressRoute CRD, which works perfectly fine again now that I have followed the migration guide.
However, all my existing cert-manager integrations and automatic certificate renewals across the whole cluster silently stopped working. The ACME solvers created by cert-manager are standard Kubernetes Ingress resources, and they have not changed at all. I only noticed this today, after a certificate had failed to renew for a very long time and got stuck.

The Ingress they create is nothing special and, as mentioned, it has not changed. Apart from Traefik v3.0, I have not made any other updates to the cluster since this problem came up.

What did you see instead?

It seems that the Ingress is simply ignored. I tested this by removing all existing IngressRoutes in the same namespace, leaving only the auto-created Ingress for a test certificate. I manually debugged the whole flow to see where it gets stuck, and it was definitely Traefik v3.0.
I am using the Helm Chart to deploy Traefik, and providers.kubernetesIngress.enabled is set to true, which is the chart's default value.

The auto-created spec looks like this:

spec:
  rules:
  - host: host.example.com
    http:
      paths:
      - backend:
          service:
            name: cm-acme-http-solver-qrg6m
            port:
              number: 8089
        path: /.well-known/acme-challenge/als2om6EN1LVdx548XZESnj9DkuHh3idOaexEcaFkmI
        pathType: ImplementationSpecific

I usually have HTTPS redirects for each host name, and as soon as I added that IngressRoute back, I got an HTTP 301 from it, even though I should have received the ACME challenge response. But, as mentioned, even without this redirect route I always received a 404 page not found.

I am now using a very nasty workaround everywhere so I can at least get my certificates renewed. It simply ignores the auto-created Ingress resources; I manually added the following IngressRoute definitions to get things working again:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: https-redirect
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`host.example.com`) && !PathPrefix(`/.well-known/acme-challenge/`)
      kind: Rule
      middlewares:
        - name: https-only
          namespace: traefik
      services:
        - name: some-service
          port: 3000
    - match: Host(`host.example.com`) && PathPrefix(`/.well-known/acme-challenge/`)
      kind: Rule
      services:
        - name: acme-svc-workaround
          port: 8089
---
apiVersion: v1
kind: Service
metadata:
  name: acme-svc-workaround
spec:
  selector:
    acme.cert-manager.io/http01-solver: "true"
  ports:
    - name: http
      port: 8089
      targetPort: 8089

This solution is very brittle and tedious, though.
I was very lucky that the certificate that failed overnight was in a staging environment instead of production.

What version of Traefik are you using?

Helm Chart version:
traefik-28.0.0

App-Version:
3.0.0

What is your environment & configuration?

Custom values.yaml:

globalArguments:
  - --global.checknewversion
  - --accesslog.fields.names.StartUTC=drop

deployment:
  kind: DaemonSet
  minReadySeconds: 3

podDisruptionBudget:
  enabled: true
  maxUnavailable: 2

ingressRoute:
  dashboard:
    enabled: true

providers:
  kubernetesCRD:
    enabled: true
    allowCrossNamespace: true
    allowExternalNameServices: true
    allowEmptyServices: true

env:
  - name: TZ
    value: Europe/Berlin

logs:
  general:
    level: INFO
  access:
    enabled: true
    bufferingSize: 100
    filters:
      statuscodes: "300-599"
      minduration: 1ms
    fields:
      headers:
        defaultmode: drop
        names:
          User-Agent: keep

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 512Mi

service:
  enabled: true
  type: LoadBalancer
  spec:
    externalTrafficPolicy: Local

ports:
  websecure:
    asDefault: true
    http3:
      enabled: true

podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
          - key: app
            operator: In
            values:
              - traefik
      topologyKey: failure-domain.beta.kubernetes.io/zone

If applicable, please paste the log output in DEBUG level

No response

@kevinpollet kevinpollet self-assigned this May 13, 2024
@rtribotte rtribotte assigned rtribotte and unassigned kevinpollet May 13, 2024
@rtribotte rtribotte added area/acme kind/bug/possible a possible bug that needs analysis before it is confirmed or fixed. area/rules and removed status/0-needs-triage labels May 13, 2024
@kevinpollet kevinpollet self-assigned this May 13, 2024
@stellirin

I have the same setup with cert-manager, and I also see this behaviour in the newly released 3.0.1; I have seen it since at least 3.0.0-rc3. I see 404s in the Traefik logs for requests to /.well-known/acme-challenge/.* paths. My workaround is similar: I copy the generated rule into my IngressRoute. Fortunately, I only have a handful of certificates to manage.

@sebadob
Author

sebadob commented May 26, 2024

I don't think it's related to cert-manager at all, just to the default Kubernetes ingress class.
I only have this problem with cert-manager because it creates a standard Kubernetes Ingress instead of an IngressRoute CRD.

@jeansebastienh

jeansebastienh commented May 27, 2024

Yes, that's an issue with IngressRoute.

Traefik v3 changes the apiVersion from traefik.containo.us/v1alpha1 to traefik.io/v1alpha1, so previous IngressRoutes are ignored.

I wrote a little migration script that recreates each IngressRoute under the new API group in its own namespace and then deletes the old one:

# Create a new IR with the proper API group (traefik.io), keeping each IR in its own namespace
kubectl get ingressroutes.traefik.containo.us --all-namespaces -o json | jq -r '.items[] | "\(.metadata.namespace) \(.metadata.name)"' | while read -r ns irname; do kubectl -n "${ns}" get ingressroutes.traefik.containo.us "${irname}" -oyaml | grep -v -e "uid:" -e "resourceVersion:" -e "generation:" -e "creationTimestamp" | sed 's/traefik\.containo\.us/traefik.io/g' | kubectl -n "${ns}" apply -f -; done
# Delete the previous one
kubectl get ingressroutes.traefik.containo.us --all-namespaces -o json | jq -r '.items[] | "\(.metadata.namespace) \(.metadata.name)"' | while read -r ns irname; do kubectl -n "${ns}" delete ingressroutes.traefik.containo.us "${irname}"; done

EDIT: This impacts all other resources (Middleware, IngressRouteTCP, ...), not only IngressRoute.
The following command can help list the resources still using the previous API group:

for res in $(kubectl api-resources --api-group=traefik.containo.us -o name); do echo "== ${res} =="; kubectl get "${res}" --all-namespaces; done
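The listing command only shows which kinds are affected. A minimal sketch of extending the migration to every resource kind in the legacy group might look like the following. The function names `rewrite_manifest` and `migrate_all` are hypothetical, and the sketch assumes `kubectl` and `jq` are installed and that you have a backup of the old objects before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Traefik or kubectl): migrate every Traefik
# resource still stored under the legacy traefik.containo.us API group to
# traefik.io, across all namespaces. Assumes kubectl and jq are installed.

# Read a manifest from stdin, switch the API group, and drop the
# server-populated fields directly under metadata (two-space indent) so the
# object can be re-created. Deeper-indented fields such as
# ownerReferences[].uid are left untouched.
rewrite_manifest() {
  sed -e 's#traefik\.containo\.us/#traefik.io/#g' \
      -e '/^  uid:/d' \
      -e '/^  resourceVersion:/d' \
      -e '/^  generation:/d' \
      -e '/^  creationTimestamp:/d'
}

# For each legacy resource type (e.g. middlewares.traefik.containo.us),
# re-create every object under traefik.io in its own namespace and delete
# the old object only if the apply succeeded.
migrate_all() {
  local res ns name
  for res in $(kubectl api-resources --api-group=traefik.containo.us -o name); do
    kubectl get "$res" --all-namespaces -o json \
      | jq -r '.items[] | "\(.metadata.namespace) \(.metadata.name)"' \
      | while read -r ns name; do
          if kubectl -n "$ns" get "$res" "$name" -o yaml \
               | rewrite_manifest \
               | kubectl -n "$ns" apply -f -; then
            kubectl -n "$ns" delete "$res" "$name"
          fi
        done
  done
}
```

Sourcing the file only defines the functions; running `migrate_all` afterwards performs the migration. Deleting only after a successful apply avoids losing an object whose new copy was rejected.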

@sebadob
Author

sebadob commented May 28, 2024

> Yes that's an issue with IngressRoute.

No, it's not: all the IngressRoutes work fine after the migration. It's the standard Kubernetes Ingress that is not taken into account for routing.


6 participants