
cluster-controller restarts when trying cluster turndown. #2752

Open · mmclane opened this issue May 16, 2024 · 10 comments

Labels: kubecost (Relevant to Kubecost's downstream project), needs-follow-up, needs-triage

Comments

mmclane commented May 16, 2024

Describe the bug
Yesterday I installed kubecost so that I could try the new cluster turndown feature. I have it installed and set up, including the cluster-controller. I then created a TDS, and it was scheduled successfully. When the start time hit, I watched it successfully create the cluster-turndown node group and saw the new node get added, but that is as far as it got. Looking at the logs on the cluster-controller, I see errors and restarts.

2024-05-15T18:25:54Z INF Determined to be running in a cluster. Using in-cluster K8s config.
2024-05-15T18:30:01Z ERR Kubescaler setup failed error="creating a Kubescaler: recommendation service unavailable: failed to execute request: Get \"http://kubecost-cost-analyzer.kubecost:9090/model/savings/requestSizingV2\": context deadline exceeded"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x23f31fe]

goroutine 1 [running]:
main.main()
	/app/cmd/clustercontroller/main.go:237 +0x53e
2024-05-15T18:30:43Z INF Determined to be running in a cluster. Using in-cluster K8s config.

I am running this on an EKS cluster.
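The log shows the setup error is a client-side timeout ("context deadline exceeded"), and the panic immediately after suggests the controller continues with a nil Kubescaler once setup fails. The timeout half of that failure mode can be reproduced in isolation; a sketch using a deliberately slow local stand-in server (hypothetical port 19090, not the Kubecost service):

```shell
# Sketch: reproduce a client-side deadline in isolation. The stand-in
# server answers after 5s; a 2s client deadline then fails the same
# way the controller's request to requestSizingV2 did.
python3 -c '
import http.server, time
class Slow(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(5)
        self.send_response(200)
        self.end_headers()
    def log_message(self, *args):
        pass
http.server.HTTPServer(("127.0.0.1", 19090), Slow).serve_forever()
' &
SRV=$!
sleep 1
RESULT=$(curl -s --max-time 2 http://127.0.0.1:19090/ || echo "deadline exceeded")
echo "$RESULT"
kill $SRV 2>/dev/null
```

If the real endpoint (`/model/savings/requestSizingV2`) takes longer than the controller's request deadline to answer, the controller would hit exactly this failure before the panic.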

To Reproduce
Steps to reproduce the behavior:

  1. Install kubecost
  2. Configure the cluster-controller-service-key secret as described in the documentation
  3. Enable cluster-controller
  4. Schedule a TurndownSchedule using a yaml file
  5. Wait for the start time to happen.
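For reference, steps 1 and 3 amount to enabling the controller in the chart values; a minimal fragment (a sketch — verify the key name against your chart version):

```yaml
# values.yaml fragment (sketch; confirm against the Kubecost chart docs)
clusterController:
  enabled: true
```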

Expected behavior
The cluster should be turned down

Screenshots
See error above.

Which version of OpenCost are you using?
Helm Chart v2.2.4


AjayTripathy (Contributor) commented:

Thank you for reporting this @mmclane. @cliffcolvin can you please take a look and get this triaged?

mmclane commented May 17, 2024

I also tried this with version 2.0.2 and had the same experience, but with no error from the cluster-controller.

@mattray mattray added the kubecost Relevant to Kubecost's downstream project label May 23, 2024

mattray commented May 23, 2024

@cliffcolvin do you want to move this to the proper Kubecost Issues repository?

jessegoodier (Contributor) commented:

@mmclane thanks for this detail; we will try to reproduce.
If you have a minute, could you run this and let us know what you get (valid JSON or an error)?

kubectl exec -i -t -n kubecost deployments/kubecost-cost-analyzer -c cost-analyzer-frontend -- curl "http://kubecost-cost-analyzer.kubecost:9090/model/savings/requestSizingV2?window=2d"

mmclane commented May 23, 2024

> @mmclane thanks for this detail; we will try to reproduce. If you have a minute, could you run this and let us know what you get (valid JSON or an error)?
>
> kubectl exec -i -t -n kubecost deployments/kubecost-cost-analyzer -c cost-analyzer-frontend -- curl "http://kubecost-cost-analyzer.kubecost:9090/model/savings/requestSizingV2?window=2d"

I got valid JSON, a big blob of it.

mmclane commented May 23, 2024

I don't know if this is related, but when I try to create a turndown schedule via the UI it fails. I just tried to create one now: it's currently 10:13, and I selected a start time of 10:15 with an end time of 11:00. I get no error when I click Apply, but the schedule never shows up. If I run kubectl get tds, I see the state is ScheduleFailed. If I describe it, however, it says it successfully scheduled the turndown; it doesn't say why it failed.

If I delete it and create a new one via the following manifest file it is scheduled successfully.

apiVersion: kubecost.com/v1alpha1
kind: TurndownSchedule
metadata:
  name: test-turndown
  namespace: kubecost
  finalizers:
  - "finalizer.kubecost.com"
spec:
  start: 2024-05-23T14:25:00Z
  end: 2024-05-23T15:15:00Z
  repeat: none
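The UTC timestamps in a spec like the one above can be generated rather than hand-edited, which sidesteps local-time vs. UTC confusion; a sketch (assuming GNU date, and the same name/namespace as the manifest above):

```shell
# Sketch: emit a TurndownSchedule starting in 5 minutes and ending in
# 55, with spec times in the RFC 3339 UTC form the TDS spec expects.
# Assumes GNU date (-d relative offsets).
START=$(date -u -d '+5 minutes' +%Y-%m-%dT%H:%M:00Z)
END=$(date -u -d '+55 minutes' +%Y-%m-%dT%H:%M:00Z)
cat > test-turndown.yaml <<EOF
apiVersion: kubecost.com/v1alpha1
kind: TurndownSchedule
metadata:
  name: test-turndown
  namespace: kubecost
  finalizers:
  - "finalizer.kubecost.com"
spec:
  start: ${START}
  end: ${END}
  repeat: none
EOF
# kubectl apply -f test-turndown.yaml
# kubectl get tds test-turndown
```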

The schedule shows up in the UI, but with the wrong configuration: the UI says Repeat: Daily, while the manifest sets repeat: none. If I describe the schedule, it shows the following, with Repeat: none.

Name:         test-turndown
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  kubecost.com/v1alpha1
Kind:         TurndownSchedule
Metadata:
  Creation Timestamp:  2024-05-23T14:23:25Z
  Finalizers:
    finalizer.kubecost.com
  Generation:        1
  Resource Version:  88587297
  UID:               aefdf0fc-e27c-41b5-93c1-2d14b41ce440
Spec:
  End:     2024-05-23T15:15:00Z
  Repeat:  none
  Start:   2024-05-23T14:25:00Z
Status:
  Current:               scaledown
  Last Updated:          2024-05-23T14:23:25Z
  Next Scale Down Time:  2024-05-23T14:25:00Z
  Next Scale Up Time:    2024-05-23T15:15:00Z
  Scale Down Id:         a59fe5ec-53c9-4f63-aee4-abad3903a126
  Scale Down Metadata:
    Repeat:     none
    Type:       scaledown
  Scale Up ID:  cfbd3922-5f2a-43cd-a46c-dd9497fd02ce
  Scale Up Metadata:
    Repeat:  none
    Type:    scaleup
  State:     ScheduleSuccess
Events:
  Type    Reason                   Age   From                          Message
  ----    ------                   ----  ----                          -------
  Normal  ScheduleTurndownSuccess  116s  turndown-schedule-controller  Successfully scheduled turndown

mmclane commented May 23, 2024

Note the schedule no longer shows in the UI once it starts.

dwbrown2 (Collaborator) commented:

@AjayTripathy @jessegoodier @kwombach12 any updates on this issue or kubecost/cluster-turndown#77?

kwombach12 commented:

We are actively investigating this issue. We are trying to understand why the request seems to be timing out.

douernesto commented:

I'm having a similar issue with "kubecost-cluster-controller"

Kubecost version 2.2.2 (multicluster)

❯ k logs kubecost-cluster-controller-564dc8c48-qxxkx -n kubecost-dev
2024-06-05T21:16:15Z INF Determined to be running in a cluster. Using in-cluster K8s config.
2024-06-05T21:20:28Z ERR Kubescaler setup failed error="creating a Kubescaler: recommendation service unavailable: failed to execute request: Get \"http://kubecost-cost-analyzer.kubecost-dev:9090/model/savings/requestSizingV2\": context deadline exceeded"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x23f31fe]
 
goroutine 1 [running]:
main.main()
                /app/cmd/clustercontroller/main.go:237 +0x53e
