-
Notifications
You must be signed in to change notification settings - Fork 7.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pod injection fails with Istio 1.22.0 but succeeds with Istio 1.21.2 #51071
Comments
Someone reported the failureThreshold issue on 1.20 the other day: https://istio.slack.com/archives/C37A4KAAD/p1715623761368409
Can you maybe show your full values.yaml or the I'll try to repro with the hub set as well |
Thanks for the response! kind create cluster
helm install istio-base istio/base --version 1.21.2 --namespace istio-system --create-namespace --skip-crds --set base.enableCRDTemplates=true
helm install istiod istio/istiod --version 1.21.2 --namespace istio-system --create-namespace
helm upgrade --install istio-base istio/base --version 1.22.0 -n istio-system --skip-crds --set base.enableCRDTemplates=true
helm upgrade --install istiod istio/istiod --version 1.22.0 -n istio-system
kubectl run nginx --image=nginx --restart=Never So it must be some overridden value that makes the upgrade fail. I'll do a bit of binary searching and report back which value is responsible. |
It's definitely caused by some Helm value override in this (which we had been using successfully so far with Istio 1.21.2): sidecarInjectorWebhook:
enableNamespacesByDefault: true
meshConfig:
accessLogFormat: |
{
"authority": "%REQ(:AUTHORITY)%",
"bytes_received": "%BYTES_RECEIVED%",
"bytes_sent": "%BYTES_SENT%",
"downstream_local_address": "%DOWNSTREAM_LOCAL_ADDRESS%",
"downstream_peer_cert_v_end": "%DOWNSTREAM_PEER_CERT_V_END%",
"downstream_peer_cert_v_start": "%DOWNSTREAM_PEER_CERT_V_START%",
"downstream_remote_address": "%DOWNSTREAM_REMOTE_ADDRESS%",
"downstream_tls_cipher": "%DOWNSTREAM_TLS_CIPHER%",
"downstream_tls_version": "%DOWNSTREAM_TLS_VERSION%",
"x_envoy_external_address": "%REQ(X-ENVOY-EXTERNAL-ADDRESS)%",
"duration": "%DURATION%",
"hostname": "%HOSTNAME%",
"istio_policy_status": "%DYNAMIC_METADATA(istio.mixer:status)%",
"method": "%REQ(:METHOD)%",
"path": "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%",
"protocol": "%PROTOCOL%",
"request_duration": "%REQUEST_DURATION%",
"request_id": "%REQ(X-REQUEST-ID)%",
"requested_server_name": "%REQUESTED_SERVER_NAME%",
"status": "%RESPONSE_CODE%",
"status_details": "%RESPONSE_CODE_DETAILS%",
"response_duration": "%RESPONSE_DURATION%",
"response_tx_duration": "%RESPONSE_TX_DURATION%",
"response_flags": "%RESPONSE_FLAGS%",
"route_name": "%ROUTE_NAME%",
"start_time": "%START_TIME%",
"upstream_cluster": "%UPSTREAM_CLUSTER%",
"upstream_host": "%UPSTREAM_HOST%",
"upstream_local_address": "%UPSTREAM_LOCAL_ADDRESS%",
"upstream_service_time": "%RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)%",
"upstream_transport_failure_reason": "%UPSTREAM_TRANSPORT_FAILURE_REASON%",
"upstream_request_attempt_count": "%UPSTREAM_REQUEST_ATTEMPT_COUNT%",
"user_agent": "%REQ(USER-AGENT)%",
"x_forwarded_for": "%REQ(X-FORWARDED-FOR)%",
"x_forwarded_host": "%REQ(X-FORWARDED-HOST)%",
"x_forwarded_proto": "%REQ(X-FORWARDED-PROTO)%",
"x_forwarded_port": "%REQ(X-FORWARDED-PORT)%",
"x_b3_traceid": "%REQ(X-B3-TRACEID)%",
"x_b3_spanid": "%REQ(X-B3-SPANID)%",
"x_b3_sampled": "%REQ(X-B3-SAMPLED)%",
"x_b3_parentspanid": "%REQ(X-B3-PARENTSPANID)%",
"traceparent": "%REQ(TRACEPARENT)%",
"tracestate": "%REQ(TRACESTATE)%",
"request_cache_control": "%REQ(CACHE-CONTROL)%",
"response_cache_control": "%RESP(CACHE-CONTROL)%",
"etag": "%RESP(ETAG)%",
"expires": "%RESP(EXPIRES)%",
"age": "%RESP(AGE)%",
"if_match": "%REQ(IF-MATCH)%",
"if_none_match": "%REQ(IF-NONE-MATCH)%",
"if_modified_since": "%REQ(IF-MODIFIED-SINCE)%",
"if_unmodified_since": "%REQ(IF-UNMODIFIED-SINCE)%",
"if_range": "%REQ(IF-RANGE)%",
"vary": "%RESP(VARY)%",
"date": "%RESP(DATE)%",
"last_modified": "%RESP(LAST-MODIFIED)%",
"x_varnish": "%RESP(X-VARNISH)%",
"x_cache": "%RESP(X-CACHE)%",
"cache_status": "%RESP(CACHE-STATUS)%",
"is_lighthouse": "%REQ(IS-LIGHTHOUSE)%",
"referer": "%REQ(REFERER)%",
"accept_encoding": "%REQ(ACCEPT-ENCODING)%",
"content_encoding": "%RESP(CONTENT-ENCODING)%",
"content_length": "%RESP(CONTENT-LENGTH)%",
"content_type": "%RESP(CONTENT-TYPE)%",
"x_robots_tag": "%RESP(X-ROBOTS-TAG)%",
"range": "%REQ(RANGE)%",
"accept_ranges": "%RESP(ACCEPT-RANGES)%",
"content_range": "%RESP(CONTENT-RANGE)%",
"request_connection": "%REQ(CONNECTION)%",
"response_connection": "%RESP(CONNECTION)%",
"origin": "%REQ(ORIGIN)%",
"access_control_request_headers": "%REQ(ACCESS-CONTROL-REQUEST-HEADERS)%",
"access_control_request_method": "%REQ(ACCESS-CONTROL-REQUEST-METHOD)%",
"access_control_allow_origin": "%RESP(ACCESS-CONTROL-ALLOW-ORIGIN)%",
"access_control_allow_methods": "%RESP(ACCESS-CONTROL-ALLOW-METHODS)%",
"access_control_allow_headers": "%RESP(ACCESS-CONTROL-ALLOW-HEADERS)%",
"access_control_max_age": "%RESP(ACCESS-CONTROL-MAX-AGE)%"
}
enablePrometheusMerge: false
accessLogEncoding: JSON
accessLogFile: "/dev/stdout"
disableEnvoyListenerLog: false
enableTracing: false
dnsRefreshRate: 300s
defaultConfig:
proxyMetadata:
# Search for EXIT_ON_ZERO_ACTIVE_CONNECTIONS on:
# https://istio.io/latest/docs/reference/commands/pilot-agent/#envvars
# Also, see: https://github.com/istio/istio/issues/34855
EXIT_ON_ZERO_ACTIVE_CONNECTIONS: 'true'
# The duration that Envoy will wait for draining existing connections
# and completing existing requests after receiving a SIGTERM or SIGINT
# before shutting down.
# See: https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#ProxyConfig
terminationDrainDuration: 15s
global:
variant: distroless # <- use smaller and safer distroless images!
logAsJson: true
defaultPodDisruptionBudget:
enabled: true
proxy:
# Delay application startup until the Istio proxy sidecar is ready.
# This avoids unnecessary errors and container restarts when outbound
# network connectivity has not been established by the Istio proxy sidecar.
# See: https://github.com/istio/istio/pull/24737
holdApplicationUntilProxyStarts: true
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: "2"
memory: 512Mi
pilot:
autoscaleEnabled: true
autoscaleMin: 4 # <- for higher availability in case of a node shutdown
autoscaleMax: 6
topologySpreadConstraints:
# Spread across nodes
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: istiod
istio: pilot
# Spread across zones
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: istiod
istio: pilot
resources:
requests:
cpu: 200m
memory: 256Mi With this values.yaml applied to the istiod install (and upgrade), the following kubectl pod creation fails with a (new) interesting error:
I tested this with: kind create cluster
helm install istio-base istio/base --version 1.21.2 --namespace istio-system --create-namespace --skip-crds --set base.enableCRDTemplates=true
helm install istiod istio/istiod --version 1.21.2 --namespace istio-system --create-namespace -f values-istiod.yaml
helm upgrade --install istio-base istio/base --version 1.22.0 -n istio-system --skip-crds --set base.enableCRDTemplates=true
helm upgrade --install istiod istio/istiod --version 1.22.0 -n istio-system -f values-istiod.yaml
kubectl run nginx --image=nginx --restart=Never |
Missing InboundTrafficPolicyMode sounds like a mismatch between the istiod image version and the istio-sidecar-injector contents |
Okay, I am sorry, now I cannot reproduce my last case anymore, myself. |
Okay, I got it. It is this Helm values override: istio_cni:
enabled: false
pilot:
cni: false # <-- new in 1.22.0 (according to release notes) The With this Helm override (which was also in place) we get this error now consistently for new pods:
|
Oh... and it was at this moment, that I realized, that it probably should have been: pilot:
cni:
enabled: false and not just: pilot:
cni: false EDIT: I just made a huge mistake when following the upgrade notes: https://istio.io/latest/news/releases/1.22.x/announcing-1.22/change-notes/#deprecation-notices (those are clear, but I followed them wrong!) |
Yepp... That was it... oh my god... :) |
This seems like a terrible error message for that mistake. I am going to re-open this to see if we can improve. Appreciate all the details here! |
Oof.
then this failure. Similar problem as #48991 |
Fixes istio#51071 Not ideal fix, longer term is in istio#48991
Fixes istio#51071 Not ideal fix, longer term is in istio#48991
Is this the right place to submit this?
Bug Description
After upgrading to Istio 1.22.0 (with the istiod Helm chart) from version 1.21.2, we observe that pod injection for gateways does not work anymore and is aborted with:
We are overriding the
.Values.global.hub
to point to a GCP Artifact Registry Remote Repository (to offload Docker image downloading to a private GCP Artifact Registry instance). The concrete value is:europe-west3-docker.pkg.dev/our-gcp-project/docker-hub-remote/istio
.We are also specifying
.Values.global.variant
todistroless
.EDIT: The overriding of
.Values.global.hub
or.Values.global.variant
does not seem to cause this. The same error(s) happen when not overriding these.Also pod injection for sidecars fails with:
even though some containers do not have any readinessProbes defined while others have proper integer readinessProbes.failureThreshold defined.
This setup works with Istio 1.21.2, but fails with above mentioned error message at a new pod creation under Istio 1.22.0.
Istiod itself does start properly (also with the right image registry, repository and tag), but no workload pod can start due to failed injection.
I also reproduced this locally with a kind cluster (no special configuration overrides), just kind create cluster.
Then:
After I've done this, I could not create any pods anymore, resulting in the same errors mentioned above.
Also following the "in-place" Istio with Helm upgrade procedure yields the same errors: https://istio.io/latest/docs/setup/upgrade/helm/#in-place-upgrade
Final Analysis
The actual cause of the error messages was the following Helm value override:
which should have been:
(see comments below for full story)
Version
Additional Information
No response
The text was updated successfully, but these errors were encountered: