Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

containerized protobuf codegen does not handle .go-version / GOTOOLCHAIN properly #124932

Open
liggitt opened this issue May 17, 2024 · 19 comments
Open
Assignees
Labels
area/code-organization Issues or PRs related to kubernetes code organization kind/bug Categorizes issue or PR as related to a bug. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/testing Categorizes an issue or PR as relevant to SIG Testing. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@liggitt
Copy link
Member

liggitt commented May 17, 2024

Seen in #124922

+++ command: bash "hack/make-rules/../../hack/verify-codegen.sh"
go version go1.22.3 linux/amd64
+++ [0517 06:22:42] Generating protobufs for 67 targets
+++ [0517 06:22:42] protoc 23.4 not found (can install with hack/install-protoc.sh); generating containerized...
+++ [0517 06:22:42] Verifying Prerequisites....
+++ [0517 06:22:43] Building Docker image kube-build:build-71fc6bd7d6-5-v1.28.0-go1.21.10-bullseye.0
+++ [0517 06:24:57] Creating data container kube-build-data-71fc6bd7d6-5-v1.28.0-go1.21.10-bullseye.0
+++ [0517 06:24:58] Syncing sources to container
+++ [0517 06:25:03] Output from this container will be rsynced out upon completion. Set KUBE_RUN_COPY_OUTPUT=n to disable.
+++ [0517 06:25:03] Running build command...
go: downloading go1.22.3 (linux/amd64)
go: download go1.22.3 for linux/amd64: toolchain not available
!!! [0517 06:25:04] Call tree:
!!! [0517 06:25:04]  1: build/../build/common.sh:489 kube::build::run_build_command_ex(...)
!!! [0517 06:25:04]  2: build/run.sh:39 kube::build::run_build_command(...)
!!! [0517 06:25:04] Call tree:
!!! [0517 06:25:04]  1: hack/update-codegen.sh:919 codegen::protobuf(...)
+++ exit code: 1
+++ error: 1
�[0;31mFAILED�[0m   verify-codegen.sh	148s

Something is not working properly between .go-version → GOTOOLCHAIN when the go version inside this specific container doesn't match

/assign
/cc @BenTheElder @MadhavJivrajani

@liggitt
Copy link
Member Author

liggitt commented May 17, 2024

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 17, 2024
@MadhavJivrajani
Copy link
Contributor

/triage accepted
/sig architecture testing
/area code-organization

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/testing Categorizes an issue or PR as relevant to SIG Testing. area/code-organization Issues or PRs related to kubernetes code organization and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 17, 2024
@MadhavJivrajani
Copy link
Contributor

MadhavJivrajani commented May 17, 2024

go: download go1.22.3 for linux/amd64: toolchain not available

This is an error I've seen when we use go 1.x and not go 1.x.y as the go directive, some instances: golang/go#62278, dependabot/dependabot-core#7895

Which is why what you said here makes sense: #124922 (comment). Once the actual bump happens, GOTOOLCHAIN env variable that will propagate into the container will match with the toolchain that exists already and go will not try and download it again leading to the error (similar to the issues linked).

This seems to also be the reason the verify job failed here: #122410
When I had started the testing, master was still not on go1.22 leading to the kube-build go toolchain mismatch.

@MadhavJivrajani
Copy link
Contributor

MadhavJivrajani commented May 17, 2024

I wonder if it would be a good idea to substitute the value of .go-version directly in the kube-build script

But I guess the long term solution would be to incorporate the toolchain directive into our go mods and the tooling.

@BenTheElder
Copy link
Member

But I guess the long term solution would be to incorporate the toolchain directive into our go mods and the tooling.

This also has the benefit of enabling IDEs to respect it without our makefiles (though your builds still won't match us if you just go build), but we gain additional complexity making these align and we need to continue to respect .go-version / envs properly for now because downstream users depend on this "API"

@BenTheElder
Copy link
Member

Even if we switch to toolchain in go.mod we should probably make the env / .go-version work properly for at least one minor while putting a release note that downstreams must switch. (I'm sure an argument could be made for longer).

It would probably be preferable to keep them working permanently because nobody wants to carry go.mod patches but it is common to need to force a different go version for security patching etc.

@BenTheElder
Copy link
Member

Do we know yet if this applies to all branches?

@liggitt
Copy link
Member Author

liggitt commented May 20, 2024

Do we know yet if this applies to all branches?

yes, it applies to all branches (we saw this when trying to just bump .go-version to 1.22.x on release branches)

@BenTheElder
Copy link
Member

/assign

@BenTheElder
Copy link
Member

So, interestingly if I do GOTOOLCHAIN=go1.21.9 make verify WHAT=codegen, we get the same failure.

go: downloading go1.21.9 (linux/amd64)
go: download go1.21.9 for linux/amd64: toolchain not available

Seems like something is broken with GOTOOLCHAIN support in this go install or go, rather than our scripts?

@BenTheElder
Copy link
Member

Maybe something with the kube-build image.

@BenTheElder
Copy link
Member

Normal go image:

$ docker run --rm -e GOTOOLCHAIN=go1.22.3 golang:1.21.10-bullseye go version
Unable to find image 'golang:1.21.10-bullseye' locally
1.21.10-bullseye: Pulling from library/golang
3d53ef4019fc: Pull complete 
08f0bf643eb6: Pull complete 
6b037c2b46ab: Pull complete 
c9c00c8cd734: Pull complete 
a4de642a3616: Pull complete 
785d32889ff9: Pull complete 
4f4fb700ef54: Pull complete 
Digest: sha256:6815d296f20c3cb3afa67bf5dc862a831b6080f5dd8b5b423bb5f862539299de
Status: Downloaded newer image for golang:1.21.10-bullseye
go: downloading go1.22.3 (linux/amd64)
go version go1.22.3 linux/amd64

kube-cross:

docker run --rm -e GOTOOLCHAIN=go1.22.3 registry.k8s.io/build-image/kube-cross:v1.28.0-go1.21.10-bullseye.0 go version
Unable to find image 'registry.k8s.io/build-image/kube-cross:v1.28.0-go1.21.10-bullseye.0' locally
v1.28.0-go1.21.10-bullseye.0: Pulling from build-image/kube-cross
646e886fa3cf: Pull complete 
c5a360c5f105: Pull complete 
5cbeb8ef1d90: Pull complete 
f3054eaa1068: Pull complete 
fd19ff5befab: Pull complete 
029d7cfe87db: Pull complete 
4f4fb700ef54: Pull complete 
ab038281f51a: Pull complete 
9bc87ef1bf76: Pull complete 
eb4d85f62c54: Pull complete 
00a4d6140bd2: Pull complete 
30d8f173cf20: Pull complete 
1903caad7a82: Pull complete 
e7f153876917: Pull complete 
Digest: sha256:1996e852695eb1d3b48bda278a4584317d5d42eaef226899a331ea37394b6952
Status: Downloaded newer image for registry.k8s.io/build-image/kube-cross:v1.28.0-go1.21.10-bullseye.0
go: downloading go1.22.3 (linux/amd64)
go version go1.22.3 linux/amd64

kube-build:

$ docker run --rm -e GOTOOLCHAIN=go1.22.3 kube-build:build-9a4d0b6c27-5-v1.28.0-go1.21.10-bullseye.0 go version
go: downloading go1.22.3 (linux/amd64)
go version go1.22.3 linux/amd64

but under verify-codegen with GOTOOLCHAIN=go1.22.3 make verify WHAT=codegen:

...
+++ [0520 21:58:12] Building Docker image kube-build:build-17572b7f79-5-v1.28.0-go1.21.10-bullseye.0
+++ [0520 21:58:22] Creating data container kube-build-data-17572b7f79-5-v1.28.0-go1.21.10-bullseye.0
+++ [0520 21:58:23] Syncing sources to container
+++ [0520 21:58:31] Output from this container will be rsynced out upon completion. Set KUBE_RUN_COPY_OUTPUT=n to disable.
+++ [0520 21:58:31] Running build command...
go: downloading go1.22.3 (linux/amd64)
go: download go1.22.3 for linux/amd64: toolchain not available
...

.... hmm

@BenTheElder
Copy link
Member

narrowed it down to hack/lib/init.sh:

$ GOTOOLCHAIN=go1.22.3 build/run.sh bash -c 'source hack/lib/init.sh && go version && echo $GOTOOLCHAIN'
+++ [0520 22:09:21] Verifying Prerequisites....
+++ [0520 22:09:22] Building Docker image kube-build:build-b9c550bd56-5-v1.28.0-go1.21.10-bullseye.0
+++ [0520 22:09:31] Syncing sources to container
+++ [0520 22:09:33] Output from this container will be rsynced out upon completion. Set KUBE_RUN_COPY_OUTPUT=n to disable.
+++ [0520 22:09:33] Running build command...
go version go1.21.10 linux/amd64
go1.22.3
+++ [0520 22:09:34] Syncing out of container
$ GOTOOLCHAIN=go1.22.3 build/run.sh bash -c 'go version && echo $GOTOOLCHAIN'
+++ [0520 22:09:54] Verifying Prerequisites....
+++ [0520 22:09:54] Building Docker image kube-build:build-b9c550bd56-5-v1.28.0-go1.21.10-bullseye.0
+++ [0520 22:10:04] Syncing sources to container
+++ [0520 22:10:06] Output from this container will be rsynced out upon completion. Set KUBE_RUN_COPY_OUTPUT=n to disable.
+++ [0520 22:10:06] Running build command...
go: downloading go1.22.3 (linux/amd64)
go version go1.22.3 linux/amd64
go1.22.3
+++ [0520 22:10:13] Syncing out of container

@BenTheElder
Copy link
Member

Talked to @liggitt about this, who found GO111MODULE=off suspect, setting this will seemingly cause go version to ignore GOTOOLCHAIN.

even @ 765e7ef (just merged to master an hour ago) we can see:

$ GOTOOLCHAIN=go1.22.1 build/run.sh bash -c 'GO111MODULE=off go version && echo "GOTOOLCHAIN=$GOTOOLCHAIN"'
+++ [0520 23:39:30] Verifying Prerequisites....
+++ [0520 23:39:31] Building Docker image kube-build:build-3ad27617b0-5-v1.31.0-go1.22.3-bullseye.0
+++ [0520 23:41:10] Deleting image kube-build:build-3ad27617b0-5-v1.30.0-go1.22.2-bullseye.0
+++ [0520 23:41:12] Creating data container kube-build-data-3ad27617b0-5-v1.31.0-go1.22.3-bullseye.0
+++ [0520 23:41:23] Syncing sources to container
+++ [0520 23:41:36] Output from this container will be rsynced out upon completion. Set KUBE_RUN_COPY_OUTPUT=n to disable.
+++ [0520 23:41:36] Running build command...
go version go1.22.3 linux/amd64
GOTOOLCHAIN=go1.22.1
+++ [0520 23:41:37] Syncing out of container
$ GOTOOLCHAIN=go1.22.1 build/run.sh bash -c 'GO111MODULE=on go version && echo "GOTOOLCHAIN=$GOTOOLCHAIN"'
+++ [0520 23:41:54] Verifying Prerequisites....
+++ [0520 23:41:55] Building Docker image kube-build:build-3ad27617b0-5-v1.31.0-go1.22.3-bullseye.0
+++ [0520 23:42:04] Syncing sources to container
+++ [0520 23:42:06] Output from this container will be rsynced out upon completion. Set KUBE_RUN_COPY_OUTPUT=n to disable.
+++ [0520 23:42:06] Running build command...
go: downloading go1.22.1 (linux/amd64)
go version go1.22.1 linux/amd64
GOTOOLCHAIN=go1.22.1
+++ [0520 23:42:14] Syncing out of container

So this is probably related.

On master GOTOOLCHAIN=go1.22.1 make verify WHAT=codege is not broken, presumably because we don't set GO111MODULE=off anymore after @thockin's WORKSPACES work.

I think we can say this is already fixed @ HEAD and is just poorly behaved on the older branches.

We should probably hack in a gimme fallback for this case on older branches while doing our best to respect GOTOOLCHAIN after introducing it (IIRC we put a release note about using GOTOOLCHAIN to control go versions). We already use gimme if go is not available or too old.

@BenTheElder
Copy link
Member

Might not be worth prioritizing a patch given this is only relevant if you're:

  • trying to override the go version
  • not overriding the build container image to match
  • running on an old release branch

This should be fine in 1.30+ *

*well, there's a smaller bug in #125010

@MadhavJivrajani
Copy link
Contributor

Thanks for digging here @BenTheElder!

@MadhavJivrajani
Copy link
Contributor

Hmm, this is slightly confusing, on darwin:

❯ go version
go version go1.22.2 darwin/amd64

and with GO111MODULE=off

❯ GOTOOLCHAIN=go1.21.9 GO111MODULE=off go version
go version go1.21.0 darwin/amd64

I'm not sure why its go1.21.0, I don't have that toolchain present locally either...

with GO111MODULE=on:

❯ GOTOOLCHAIN=go1.21.9 GO111MODULE=on go version
go version go1.21.9 darwin/amd64

@MadhavJivrajani
Copy link
Contributor

MadhavJivrajani commented May 21, 2024

Ah,
I see where the effect of GO111MODULE on GOTOOLCHAIN comes from: https://github.com/golang/go/blob/dddf0ae40fa0c1223aba191d73a44425a08e1035/src/cmd/go/internal/toolchain/select.go#L93

@BenTheElder
Copy link
Member

Right, so for the newer branches where we have modules on and workspaces adopted, that just leaves #125010

We could try to mitigate this in the older branches ... but it's a fairly niche issue and the workspaces change is too large to backport so we'll have to write something custom for the old branches ... and you can already work around it by just overriding the build image to have the right go versoin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/code-organization Issues or PRs related to kubernetes code organization kind/bug Categorizes issue or PR as related to a bug. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/testing Categorizes an issue or PR as relevant to SIG Testing. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

4 participants