Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting OpenMetrics #189

Open
1 of 19 tasks
dmagliola opened this issue May 22, 2020 · 12 comments
Open
1 of 19 tasks

Supporting OpenMetrics #189

dmagliola opened this issue May 22, 2020 · 12 comments
Labels
breaking-change Issues/PRs that break API and require a major version release

Comments

@dmagliola
Copy link
Collaborator

dmagliola commented May 22, 2020

We should support OpenMetrics. I've started this work in a temporary branch, will start opening PRs.
But this seems to be the master list of things to do.

This is a breaking change that we should release as 3.0.
We should have previous minor versions with deprecation warnings about the things we'll be changing.

Todo

  • Basic Formatting hygiene

    • Format negotiation with OpenMetrics Accept Header
    • Add tests to the OpenMetrics formatter I wrote
    • #EOF
    • #UNIT including validation
    • forcing _total on counters. This is a breaking change, released in 3 stages
      • Warn on counters that are not suffixed with _total
      • (next major version) Forcefully add _total to counters. Those that come with _total suffix, remove it and warn, in preparation for next major.
      • (next major version) Error on counters suffixed _total
  • Timestamps Output

  • Exemplars

  • New Metric types

    • StateSet
    • Info
  • Document breaking changes (UPGRADING.md)

    • If you called Metric#values or Stores#all_values, you'll get objects with timestamps instead of simply float
    • Naming convention things: You must pass the unit and not include it in the name. You must not add _total to your counters. We'll add both automatically

Questions:

  1. Is this the most up-to-date documentation we have?

  2. The _created child series:
    a. Is it only supported for counter, summary and histogram?
    b. Is this the time the metric got declared / added to the registry? Or is this the time the metric was first observed?
    c. Do we report one _created value per metric or per time series? In other words, if I have:

    http_server_requests_total{code="200"} 10.0 1590078339.214435
    http_server_requests_total{code="400"} 1.0 1590078339.214435
    

    Should I have:

    http_server_requests_created{code="200"} 1590078336.554018
    http_server_requests_created{code="400"} 1590078336.554018
    

    Or just: http_server_requests_created 1590078336.554018
    And if the former, do we have a separate one per Histogram Bucket?

    Note: in the documentation, all examples happen to not have labels. Moreover, trying to figure this out by experimentation, I think there may be a problem with the Python parser. For Counters, it only accepts the second example (no labels for _created). For Histograms, it only accepts "all the labels minus le". Both kind of make sense independently, but they seem like they should be mutually exclusive.

  3. Exemplars
    a. Are they only available for counter and histogram?
    b. We are reporting an exemplar, of the many the app could have reported. Does it have to be the latest? Can it be an "unstable" pick between the many? (the reason I ask is that if different scrapes are ok to report different exemplars, even if the metric hasn't changed, then we don't need to share those exemplars between processes, and we can just keep them in the Metric object itself, instead of persisting to the files)

@brian-brazil
Copy link

is a breaking change.

Adding UNIT should not be breaking. The only big breaking one is _total for counters.

updated_at timestamp for each data point

I don't see what this has to do with OpenMetrics, and last I checked client_ruby shouldn't need to worry about timestamps at all.

@dmagliola
Copy link
Collaborator Author

Ok, wow, that's an impressively quick answer :)

Adding UNIT should not be breaking. The only big breaking one is _total for counters.

Correct. I'd introduce both together though, since they both imply changing how you name your metrics

I don't see what this has to do with OpenMetrics, and last I checked client_ruby shouldn't need to worry about timestamps at all.

I interpreted this:

# TYPE foo counter
foo_total 17.0 1520879607.789

To be a counter with a value of 17, and a timestamp on when it got updated to that value. Did I misread that?

@dmagliola
Copy link
Collaborator Author

@brian-brazil I added a bunch of questions that came up yesterday, trying to implement OpenMetrics in Ruby.
These may stem from me not using very up-to-date documentation, that document I linked to was all I could find.
Could you let me know what's the answer to those? I'd like to work on preparing for this change late in the next couple of week if possible.

Thank you!

@brian-brazil
Copy link

Correct. I'd introduce both together though, since they both imply changing how you name your metrics

Unit can be handled gracefully, see how the Python client does it.

Did I misread that?

You did, it works the exact same as the Prometheus format. Direct instrumentation should never expose timestamps.

@dmagliola
Copy link
Collaborator Author

You did, it works the exact same as the Prometheus format.

I didn't know about these timestamps on the Prometheus format...
What are they for? Do you have any keywords I can google, or links to documentation about them?

Given this:

  • Should we report the _created series in the Ruby Client, when using OpenMetrics?
  • Should exemplars have a timestamp? Or is this also not something the Ruby Client should expose?

@brian-brazil
Copy link

What are they for?

Federation roughly speaking, I don't believe client_ruby has the features where they'd come up.

Should we report the _created series in the Ruby Client, when using OpenMetrics?

Yes, and in Prometheus format too. The same samples should end up in Prometheus no matter what format is negotiated.

Should exemplars have a timestamp? Or is this also not something the Ruby Client should expose?

If you plan on supporting exemplars, yes.

@dmagliola
Copy link
Collaborator Author

Amazing, thank you!

@Sinjo
Copy link
Member

Sinjo commented Jun 8, 2021

@dmagliola I'm gonna try and finish #220 in the next few days, which will leave us with two breaking changes on master for us to release in 3.0. Are we fine with this being in 4.0? If not I think we need to figure out a separate branch which we can cut non-breaking releases from on 2.x.

@dmagliola
Copy link
Collaborator Author

I'm absolutely fine with this going in later releases. This feature shouldn't block any other progress.

@jeremiahishere
Copy link

Has there been any movement on this in the last year or two? I could use this feature and I am happy to take over.

@dmagliola
Copy link
Collaborator Author

Hello @jeremiahishere!
Glad to hear you're interested in this feature and would have time to contribute to it!

There hasn't been any movement on this, and there are things we are interested in working on first (mainly, exporting performance), so I don't think this will happen soon.

If you'd like to contribute, we'd love to see that! Depending on how involved this would be, and how much it'd affect the rest of the codebase, it might be worth catching up a bit on a super basic draft PR that'd show the general approach, to make sure we're all in the same page, before putting in a lot of work.

@jeremiahishere
Copy link

I posted a sketch pr to the discussion tab: #290
Please take a look when you have some time.

@Sinjo Sinjo added the breaking-change Issues/PRs that break API and require a major version release label Dec 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change Issues/PRs that break API and require a major version release
Projects
None yet
Development

No branches or pull requests

4 participants