[DOCUMENTATION] Boxplot example wrongly computes whiskers #13779

its-DomeE · 2024-03-25T07:00:59Z

Software versions

Python version : 3.8.17 (default, Aug 10 2023, 12:50:17)
IPython version : 8.12.3
Tornado version : 6.4
Bokeh version : 3.1.1

Browser name and version

No response

Jupyter notebook / Jupyter Lab version

No response

Expected behavior

The boxplot example in the documentation (examples/topics/stats/boxplot.py) should compute the whiskers by:

finding the maximum data value between the 75% quantile and the 75% quantile + 1.5 * IQR, and
finding the minimum data value between the 25% quantile - 1.5 * IQR and the 25% quantile
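
To make the rule above concrete, here is a minimal sketch using NumPy on a made-up list of values. The function name `tukey_whiskers` and the sample data are illustrative assumptions, not part of Bokeh or of the example file.

```python
import numpy as np

def tukey_whiskers(values, k=1.5):
    """Return (lower, upper) whisker ends for a 1-D sample (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo_fence = q1 - k * iqr
    hi_fence = q3 + k * iqr

    # Upper whisker: largest observation between Q3 and the upper fence.
    above = values[(values >= q3) & (values <= hi_fence)]
    # Lower whisker: smallest observation between the lower fence and Q1.
    below = values[(values >= lo_fence) & (values <= q1)]

    # Fall back to the quartile itself if no observation lies in the range.
    upper = above.max() if above.size else q3
    lower = below.min() if below.size else q1
    return lower, upper

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up sample with one outlier
lower, upper = tukey_whiskers(data)
```

For this sample the fences are -3.5 and 14.5, but the whiskers end at the observed values 1 and 9; the value 30 lies beyond the upper fence and would be drawn as an outlier.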

Observed behavior

The example computes the whiskers as just:

75% quantile + 1.5 * IQR and
25% quantile - 1.5 * IQR

As a result, the whiskers always sit exactly at the outlier bounds regardless of the data, so they provide no additional information about the actual range of the data.

Example code

import pandas as pd

from bokeh.models import ColumnDataSource, Whisker
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg2 import autompg2
from bokeh.transform import factor_cmap

df = autompg2[["class", "hwy"]].rename(columns={"class": "kind"})

kinds = df.kind.unique()

# compute quantiles
qs = df.groupby("kind").hwy.quantile([0.25, 0.5, 0.75])
qs = qs.unstack().reset_index()
qs.columns = ["kind", "q1", "q2", "q3"]
df = pd.merge(df, qs, on="kind", how="left")

# compute IQR outlier bounds
iqr = df.q3 - df.q1
df["upper"] = df.q3 + 1.5*iqr
df["lower"] = df.q1 - 1.5*iqr

source = ColumnDataSource(df)

p = figure(x_range=kinds, tools="", toolbar_location=None,
           title="Highway MPG distribution by vehicle class",
           background_fill_color="#eaefef", y_axis_label="MPG")

# outlier range
whisker = Whisker(base="kind", upper="upper", lower="lower", source=source)
whisker.upper_head.size = whisker.lower_head.size = 20
p.add_layout(whisker)

# quantile boxes
cmap = factor_cmap("kind", "TolRainbow7", kinds)
p.vbar("kind", 0.7, "q2", "q3", source=source, color=cmap, line_color="black")
p.vbar("kind", 0.7, "q1", "q2", source=source, color=cmap, line_color="black")

# outliers
outliers = df[~df.hwy.between(df.lower, df.upper)]
p.scatter("kind", "hwy", source=outliers, size=6, color="black", alpha=0.3)

p.xgrid.grid_line_color = None
p.axis.major_label_text_font_size="14px"
p.axis.axis_label_text_font_size="12px"

show(p)
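
One possible way to adapt the data-preparation step of the example, sketched with pandas only. The small DataFrame is a made-up stand-in for autompg2, and the column names `upper_fence` and `lower_fence` are assumptions introduced here; this is not necessarily the fix the maintainers would choose.

```python
import pandas as pd

# Made-up stand-in for autompg2[["class", "hwy"]]; values are illustrative only.
df = pd.DataFrame({
    "kind": ["a"] * 6 + ["b"] * 5,
    "hwy": [20, 22, 23, 24, 26, 44, 15, 16, 17, 18, 19],
})

# compute quantiles (same as in the example)
qs = df.groupby("kind").hwy.quantile([0.25, 0.5, 0.75])
qs = qs.unstack().reset_index()
qs.columns = ["kind", "q1", "q2", "q3"]
df = pd.merge(df, qs, on="kind", how="left")

# compute IQR outlier bounds, kept separate from the whisker ends
iqr = df.q3 - df.q1
df["upper_fence"] = df.q3 + 1.5 * iqr
df["lower_fence"] = df.q1 - 1.5 * iqr

# whisker ends: most extreme observations that still lie inside the bounds
inside = df[df.hwy.between(df.lower_fence, df.upper_fence)]
ends = inside.groupby("kind").hwy.agg(lower="min", upper="max").reset_index()
df = pd.merge(df, ends, on="kind", how="left")

# outliers are still the points outside the bounds
outliers = df[~df.hwy.between(df.lower_fence, df.upper_fence)]
```

With the `upper` and `lower` columns defined this way, the existing `Whisker(base="kind", upper="upper", lower="lower", source=source)` call could stay unchanged, while the outlier filter uses the fence columns instead.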

Stack traceback or browser console output

No response

Screenshots

No response

its-DomeE · 2024-03-25T07:02:32Z

I could provide a PR in the next days if desired.

dinya · 2024-04-03T07:32:39Z

@its-DomeE do you mean a difference in the whiskers like this:

[Two images compared side by side: "Current boxplot example" and "[McGill1978] approach"]

I'm using backported (adapted) code from matplotlib.cbook.boxplot_stats() in my lib. The function itself uses the [McGill1978] approach.

(seaborn.boxplot() uses this code too, since seaborn is a "high-level frontend" for matplotlib.)

[McGill1978] McGill, R., Tukey, J.W., and Larsen, W.A. (1978) "Variations of Boxplots", The American Statistician, 32:12-16.
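
For comparison, a short sketch of calling matplotlib.cbook.boxplot_stats() directly, assuming matplotlib is installed. The sample data is made up; the dictionary keys `whislo`, `whishi` and `fliers` are the ones the function returns.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # made-up sample with one outlier

# boxplot_stats returns one dict of statistics per dataset
stats = boxplot_stats(data, whis=1.5)[0]

whislo = stats["whislo"]  # lowest observation within 1.5 * IQR below Q1
whishi = stats["whishi"]  # highest observation within 1.5 * IQR above Q3
fliers = stats["fliers"]  # observations beyond the whiskers
```

Here the whisker ends are actual data values (1 and 9), and 30 is reported as a flier rather than being covered by the whisker.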

its-DomeE added the TRIAGE label Mar 25, 2024

bryevdv added the "type: task" and "tag: component: examples" labels and removed the TRIAGE label Apr 1, 2024

bryevdv added this to the 3.x milestone Apr 1, 2024
