Reintroduce GPU Usage & Efficiency #2731
testing. Signed-off-by: thomasvn <thomasnguyen96@gmail.com>
response. Signed-off-by: thomasvn <thomasnguyen96@gmail.com>
marshalled/unmarshalled to/from bytes. Signed-off-by: thomasvn <thomasnguyen96@gmail.com>
Signed-off-by: thomasvn <thomasnguyen96@gmail.com>
accordance with documentation here https://github.com/opencost/opencost/blob/develop/core/pkg/opencost/bingen.go. Rerun `go generate`. Signed-off-by: thomasvn <thomasnguyen96@gmail.com>
GPURequestAverage float64 `json:"gpuRequestAverage"` //@bingen:field[version=22]
GPUUsageAverage float64 `json:"gpuUsageAverage"` //@bingen:field[version=22]
I added these new fields to the bottom of the struct in accordance with https://github.com/opencost/opencost/blob/develop/core/pkg/opencost/bingen.go.
I've noticed that the codecs perform "field version checks", which leads me to believe that we don't have to add new fields to the end of the struct. If possible, I'd like to group these new fields alongside the other GPU fields.
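To illustrate what a "field version check" can look like, here is a minimal sketch. It is not the code that bingen generates: the wire format, the `allocation` type, the `GPUHours` field, and the `encode`/`decode` helpers are all hypothetical. Only the two new field names and the version number 22 come from the diff above. In this sketch the read and write order is set explicitly by the codec, so where the fields sit in the struct declaration does not affect decoding; whether the same holds for bingen's generated codecs would need to be confirmed against its documentation.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// allocation is a hypothetical, trimmed-down stand-in for the real struct.
type allocation struct {
	GPUHours          float64 // hypothetical field assumed to predate version 22
	GPURequestAverage float64 // added at version 22 in the diff above
	GPUUsageAverage   float64 // added at version 22 in the diff above
}

// gpuAveragesVersion is the codec version that introduced the new fields.
const gpuAveragesVersion = 22

// encode writes a one-byte version followed by the fields that exist at that version.
func encode(a allocation, version uint8) ([]byte, error) {
	buf := new(bytes.Buffer)
	buf.WriteByte(version) // bytes.Buffer.WriteByte always returns a nil error
	fields := []float64{a.GPUHours}
	if version >= gpuAveragesVersion {
		fields = append(fields, a.GPURequestAverage, a.GPUUsageAverage)
	}
	for _, f := range fields {
		if err := binary.Write(buf, binary.LittleEndian, f); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// decode reads the version first and only reads the new fields when the
// payload is recent enough to contain them; older payloads leave them at zero.
func decode(data []byte) (allocation, error) {
	var a allocation
	r := bytes.NewReader(data)
	version, err := r.ReadByte()
	if err != nil {
		return a, err
	}
	targets := []*float64{&a.GPUHours}
	if version >= gpuAveragesVersion {
		targets = append(targets, &a.GPURequestAverage, &a.GPUUsageAverage)
	}
	for _, t := range targets {
		if err := binary.Read(r, binary.LittleEndian, t); err != nil {
			return a, err
		}
	}
	return a, nil
}

func main() {
	// Encode at an older version, then decode: the new fields are skipped.
	old, err := encode(allocation{GPUHours: 2, GPURequestAverage: 1, GPUUsageAverage: 0.5}, 21)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	a, err := decode(old)
	fmt.Println(a, err)
}
```

The version gate is what lets an older payload decode cleanly: the reader never attempts to consume bytes that an older writer did not produce.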
@AjayTripathy @ameijer @mbolt35 I've finished my testing and this is ready for review.
Any chance there are docs on deploying the dcgm exporter?
@AjayTripathy These are the dcgm-exporter docs I referenced https://github.com/NVIDIA/dcgm-exporter. I only needed to run a single command to get up and running:
|
Seems reasonable to me. Just a couple of questions.
What does this PR change?
- Adds `GPURequestAverage` and `GPUUsageAverage` to bingen.
- Uses the `DCGM_FI_DEV_GPU_UTIL` metric.
- TODO. This PR does not yet address TotalEfficiency, which is still calculated solely from CPU and RAM.
- TODO. I think some future work is required to further validate the usage/querying of the `DCGM_FI_DEV_GPU_UTIL` metric.
Does this PR relate to any other PRs?
How will this PR impact users?
- `gpuRequestAverage`, `gpuUsageAverage`, and `gpuEfficiency`.
Does this PR address any GitHub or Zendesk issues?
- `gpuRequestAverage` and `gpuUsageAverage` to the Allocation API Schema kubecost/cost-analyzer-helm-chart#1787
How was this PR tested?
Setup
- http://localhost:9003/metrics. Check to see that metrics are updated.
- http://localhost:9003/allocation?window=1d. Validate that my dcgmproftester deployment has the new `gpuRequestAverage` and `gpuUsageAverage` fields. Example result here: allocation.json.
- http://localhost:9003/allocation/summary?window=1d. Validate GPU fields. Example result here: allocationsummary.json.
Does this PR require changes to documentation?
Have you labeled this PR and its corresponding Issue as "next release" if it should be part of the next OpenCost release? If not, why not?
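The manual validation described under "How was this PR tested?" could be scripted along the following lines. This is a rough sketch under stated assumptions: the sample payload is hypothetical and only mimics a single allocation object, not the full API response envelope. The `gpuFields` type and the `checkGPUFields` and `gpuEfficiency` helpers are invented for illustration. The efficiency formula (usage divided by request, zero when nothing is requested) is an assumption, not necessarily how OpenCost computes `gpuEfficiency`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// gpuFields holds only the GPU fields named in this PR. Pointer fields let us
// tell a missing (or null) key apart from a legitimate zero value.
type gpuFields struct {
	GPURequestAverage *float64 `json:"gpuRequestAverage"`
	GPUUsageAverage   *float64 `json:"gpuUsageAverage"`
}

// checkGPUFields decodes one allocation object and reports an error if either
// of the new fields is absent.
func checkGPUFields(raw []byte) (gpuFields, error) {
	var f gpuFields
	if err := json.Unmarshal(raw, &f); err != nil {
		return f, err
	}
	if f.GPURequestAverage == nil || f.GPUUsageAverage == nil {
		return f, errors.New("allocation is missing gpuRequestAverage or gpuUsageAverage")
	}
	return f, nil
}

// gpuEfficiency uses an ASSUMED definition: average usage divided by average
// request, or 0 when nothing was requested. Call it only after checkGPUFields
// has succeeded, since it dereferences both pointers.
func gpuEfficiency(f gpuFields) float64 {
	if *f.GPURequestAverage == 0 {
		return 0
	}
	return *f.GPUUsageAverage / *f.GPURequestAverage
}

func main() {
	// Hypothetical sample; a real check would read a saved allocation.json instead.
	sample := []byte(`{"name":"dcgmproftester","gpuRequestAverage":1,"gpuUsageAverage":0.5}`)
	f, err := checkGPUFields(sample)
	if err != nil {
		fmt.Println("validation failed:", err)
		return
	}
	fmt.Printf("gpu efficiency: %.2f\n", gpuEfficiency(f))
}
```

Decoding into pointers rather than plain `float64` values matters here, because an allocation with no GPU activity would otherwise be indistinguishable from one where the fields were never emitted.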