
statistics: 'all' for measurements #247

Open
Rvh91 opened this issue Feb 7, 2023 · 10 comments

@Rvh91 commented Feb 7, 2023

Describe the solution you'd like
Inspired by the example given by r-jean-pierre.
It would be great if it were possible to add an additional statistics option for measurements, where we can select 'all' instead of 'min', 'max', or 'mean'. This would then display, for each entity, a single line for the mean values and a shaded area that covers the range from min to max. This type of plot is essentially what the sensor plots in Home Assistant already use, where the shaded area represents the range over which the measurements were made.

Some considerations:

  • The 'mean' line is treated as the 'base', meaning that if, for example, 'show_value' is enabled, values are only shown for the mean line by default. Also, only the 'mean' line is shown in the legend, and if it is toggled, the shaded area associated with the min/max is toggled along with it.
  • The color of the shaded area would follow the color of the 'mean' line, ideally with some opacity or a slightly lighter shade. The lines for min and max should be thin and slightly darker than the shaded fill area.
  • If, for example, 'spline' is specified for the line associated with this entity, that specification would also be inherited by the min and max lines.

How would it be defined in yaml?

entities:
  - entity: sensor.temperature
    statistic: all # `min`, `mean`, `max`, or `all`

Scribble
For example, as in the example given by r-jean-pierre:
[screenshot of r-jean-pierre's example plot]


Rvh91 added the enhancement (New feature or request) label on Feb 7, 2023
@dbuezas (Owner) commented Feb 7, 2023

Interesting yaml proposal, this is quite different from anything we've got because a single entity would generate 3 traces. I'll give it a shot when I have some time. It may get messy when configuring it, particularly for filters, but maybe it works ok. Configs would be applied equally to all traces, except maybe for show_value and the fill color.

The only issue I have with this is that it goes against a general principle I'm trying to maintain, which is to expose plotly more or less as directly as possible.

If any other user reads this, please do drop some opinion and/or ideas.

@dbuezas (Owner) commented Feb 7, 2023

Oh, one extra consideration:
Some statistics don't have min/max/mean, but have sum and state instead. This may generate some confusion.

@Rvh91 (Author) commented Feb 7, 2023

Interesting yaml proposal, this is quite different from anything we've got because a single entity would generate 3 traces. I'll give it a shot when I have some time. It may get messy when configuring it, particularly for filters, but maybe it works ok. Configs would be applied equally to all traces, except maybe for show_value and the fill color.

The only issue I have with this is that it goes against a general principle I'm trying to maintain, which is to expose plotly more or less as directly as possible.

If any other user reads this, please do drop some opinion and/or ideas.

I can imagine that if it gets too messy it might indeed become too convoluted. I have not played around with filters yet, so it's hard for me to grasp where exactly it would fall apart. If we apply filters like offsets or moving averages, for example, those would be applied to the values contained in the long-term statistics, I guess? As a first step I would indeed apply those to the three traces equally. I don't directly see an application where you would want to filter the statistics, but that is probably my lack of imagination.

Oh, one extra consideration: Some statistics don't have min/max/mean, but have sum and state instead. This may generate some confusion.

As far as I understand, the sum and state would apply to statistics in the case of state_class = total, whereas my proposal would only apply to those with state_class = measurement. So for entities with state_class = total, this 'all' option would not apply. Out of curiosity, what happens if you set statistic: mean for an entity with state_class = total? Perhaps we can handle the new case in a similar way?

So essentially, in the docs we would get:

for entities with state_class=measurement (normal sensors, like temperature)

type: custom:plotly-graph
entities:
  - entity: sensor.temperature
    statistic: max # `min`, `mean`, `max`, or `all`
    period: 5minute # `5minute`, `hour`, `day`, `week`, `month`, or `auto` (`auto` varies the period depending on the zoom level)

and this would remain the same:
for entities with state_class=total (such as utility meters)

type: custom:plotly-graph
entities:
  - entity: sensor.temperature
    statistic: state # `state` or `sum`

@r-jean-pierre

Hi guys!

I vote +1 for any improvement reducing the number of lines I had to write :-)

If you want something challenging and more general, I have something to "propose": in my own perfect world of numpy-pandas-seaborn, I spend my entire day grouping data by "something" and then displaying a trend inside an "error" band. For example, building on what you helped me with in #229, my ultimate dream is to group by day of week, plot the median as the main trend, and display, for example, the 10th–90th percentiles.

A simple

import numpy as np
import seaborn as sns

sns.lineplot(data=data, x="day", y="y",  # data: a DataFrame with "day" and "y" columns
             estimator=np.median,
             errorbar=lambda x: (np.percentile(x, 10), np.percentile(x, 90)))

Already gives
[figure: median trend per day of week with a shaded 10th–90th percentile band]

which is, at least for me, beautiful enough, while keeping tiny code to maintain and staying open to custom functions to really focus on different statistics rather than tweaking pixels.
So if you have plans like this, I will add a +1 also!

Note that I grouped on a "fake" categorical variable, which is the day of week, but in general I do the same on a rolling window over the timestamp, which I suppose is more interesting for the HA community.

@r-jean-pierre

Interesting yaml proposal, this is quite different from anything we've got because a single entity would generate 3 traces. I'll give it a shot when I have some time. It may get messy when configuring it, particularly for filters, but maybe it works ok. Configs would be applied equally to all traces, except maybe for show_value and the fill color.

The only issue I have with this is that it goes against a general principle I'm trying to maintain, which is to expose plotly more or less as directly as possible.

If any other user reads this, please do drop some opinion and/or ideas.

I think it won't be a piece of cake. As far as I know, there is no one-line method to solve it in plotly; the closest is this: https://plotly.com/python/continuous-error-bars/ which forces the developer to use plotly.graph_objs rather than plotly.express, and in general the developer also has to precompute the statistics beforehand, with pandas for example for Python folks.
Unless I missed the perfect solution, it's also quite a headache to handle missing values, maintain the legend, smooth the lines, etc.

@dbuezas (Owner) commented Feb 8, 2023

there is no one-line method to solve it in plotly

Yeah, I'm not aware of any plotly feature to do this either. What I have in mind is to generate 3 independent traces and then style them via fill: tonexty (like in the link you posted or your original plot actually, the one with the popular screenshot :) )
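
For reference, a band like this can already be written by hand today by stacking three statistic traces of the same entity and letting the upper one fill down to the previous one. A rough sketch, assuming the card forwards plotly trace attributes such as fill, fillcolor, line, and showlegend unchanged (the fill color and period are placeholders):

type: custom:plotly-graph
entities:
  - entity: sensor.temperature
    statistic: min
    period: hour
    line:
      width: 0 # hide the lower boundary line
    showlegend: false
  - entity: sensor.temperature
    statistic: max
    period: hour
    line:
      width: 0
    fill: tonexty # shade down to the previously defined (min) trace
    fillcolor: rgba(100, 150, 255, 0.2)
    showlegend: false
  - entity: sensor.temperature
    statistic: mean
    period: hour
    # the mean trace is the only one left in the legend

The trace order matters here: with fill: tonexty, the max trace fills down to whichever trace was defined just before it, which is why min comes first.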

@dbuezas (Owner) commented Feb 8, 2023

my ultimate dream is to group by day of week, plot the median as the main trend, and display, for example, the 10th–90th percentiles

This sounds kind of similar to the heatmap example here: #215

I could potentially add 2 filters to compute the lower and upper limits of a parametrisable percentile. Something like:

- entity: sensor.temperature
  filters:
    - error_band: # <-- how would you name this?
        window_size: 10
        percentile: 10 # <-- and this?
- entity: sensor.temperature
  filters:
    - error_band:
        window_size: 10
        percentile: 90
- entity: sensor.temperature
  filters:
    - error_band:
        window_size: 10
        percentile: 50 # <-- which would equal the median

This is kind of different from what this feature request aims at, but would it help with your dream, @r-jean-pierre?

@r-jean-pierre commented Feb 8, 2023

Difficult to reply as there is no perfect solution... To my mind, HA/plotly-card is not supposed to be a framework for advanced analytics. I would not even allow choosing different sensors.

- entity: sensor.temperature
  filters:
    - band:
        window_size: 10
        estimator: mean # by default, or one of a set of functions you predefined like "mean", "median", etc.
        errorbar: sd # either a string with functions you already defined, like "sd" -> (-std, +std) or "3sd" -> (-3*std, +3*std);
        # if not a string, the 2 boundaries of the band, in which case min and max must both be defined by the user:
        #   min: a string from a set of predefined functions (percentile_xx, min, etc.)
        #   max: a string from a set of predefined functions (percentile_xx, max, etc.)

so your example becomes:

- entity: sensor.temperature
  filters:
    - band:
        window_size: 10
        estimator: "median"
        errorbar:
          min: "percentile_10"
          max: "percentile_90"

If you go too far, as you say, it will be as complicated (for the end user) as the code I wrote, and you will in any case have 5x more code behind the scenes to execute it.

Eventually, what could be nice is to use some alpha to control the errorbar fill color. As the estimator is rendered as a plotly line, we could have something like:

- entity: sensor.temperature
  filters:
    - band:
        window_size: 10
        estimator: "median"
        color: rgb(255, 0, 0) # color of the estimator line
        errorbar:
          min: "percentile_10"
          max: "percentile_90"
          alpha: 0.75
          # behind the scenes, something takes the rgb(255, 0, 0) of the estimator and builds a fillcolor of rgba(255, 0, 0, 0.75)

The minimalistic example with color tuning would be:

- entity: sensor.temperature
  filters:
    - band:
        window_size: 10
        estimator:
          color: rgb(255, 0, 0)
        errorbar:
          alpha: 0.75

Which translates to "please average sensor.temperature over 10 rolling points in solid red, and display it within its rolling standard deviation as a translucent red band".

Honestly, the difficulty was understanding how to play with fill: tonexty.
What is still annoying is maintaining almost the same code 3 times in YAML to display the 3 traces of min-mean-max myself.

I'm too influenced by https://seaborn.pydata.org/generated/seaborn.lineplot.html to think differently; I hope some other people will propose something more clever.

@dbuezas
Copy link
Owner

dbuezas commented Feb 8, 2023

That would be cool; the issue is that filters can only modify the arrays that are passed to a single trace. In your example, using the filter would result in 2 other traces being created later as a side effect.

It can be done, but it does break the cleaner and easier code architecture I introduced in v3. I'll have to add a special internal flag to let a later part of the code know it should do something unusual. If you code you'll know why I'm hesitant 🤔.

Let's keep the discussion open for a couple of days; maybe we come up with something that is elegant for the user (like your solution, and the original suggestion in this FR), but also for the code (which in practical terms usually results in fewer future issues).

@rrozema commented May 9, 2024

Oh, one extra consideration: Some statistics don't have min/max/mean, but have sum and state instead. This may generate some confusion.

The statistics_meta table can be used to make this distinction: statistics with valid min, max, and mean values have has_mean = 1 in the statistics_meta table, while those that carry state and sum values have has_sum = 1.

It does make sense to distinguish between these two types of statistics anyway because, unlike with "has_mean" statistics, the values in the [sum] column should not be shown directly: [sum] is in fact a running total of all values entered. That is, instead of showing the value read from the [sum] column, it would make much more sense to read the values from 2 rows, one at the end of each period and another at the beginning of that period, then subtract their [sum] values to show the change over the period.

The value in the [state] column represents something like the "meter read-out" at a specific point in time, while the value in the [sum] column is the running total of all actual consumption/production since the measurements began, up to that point in time.
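
A rough way to get that per-period change from the card already today is to plot the sum statistic through the card's delta filter. This is only a sketch, assuming the delta filter simply takes the difference between consecutive points and using a hypothetical sensor.energy_meter entity:

type: custom:plotly-graph
entities:
  - entity: sensor.energy_meter # hypothetical utility-meter entity
    statistic: sum
    period: hour
    filters:
      - delta # difference between consecutive sum values = change per period
    type: bar # bars tend to read better for per-period consumption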
