Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] - DataTable - Conditional Formatting #13682

Closed
thomktz opened this issue Feb 6, 2024 · 12 comments · Fixed by #13685
Closed

[FEATURE] - DataTable - Conditional Formatting #13682

thomktz opened this issue Feb 6, 2024 · 12 comments · Fixed by #13685

Comments

@thomktz
Copy link
Member

thomktz commented Feb 6, 2024

Problem description

I think there should be better ways to do client-side and server-side conditional formatting in DataTables.
Currently, we have to use the raw HTML formatter, either specifying JS rules like this, or by specifying a column from the CDS as color like this, which works.

template = f"""
    <div style="background: <%= {bg_color_column}%>;
            color: {text_color_column}">
        <%= {valueformatting} %>
    </div>
"""
formatter = HTMLTemplateFormatter(template=template)

Feature description

The most straightforward approach for me is to add a new Formatter where you can specify which column to use as colors from the CDS. I did this on a branch here.

Here's how it works:

### Imports and Data Generation
import numpy as np
import pandas as pd
from bokeh.models import DataTable, DynamicFormatter
from bokeh.io import show

np.random.seed(42)

# Custom colors in Hex
EUROPE = "#bdfca2"
AMERICA = "#a2c4fc"
GREY = "#d8dfe6"

# Custom colors in RGB
red_color = (245, 112, 110)
green_color = (182, 232, 144)

def interpolate_color(color1, color2, t):
    """Interpolates between two RGB colors."""
    return tuple(int(a + (b - a) * t) for a, b in zip(color1, color2))

def cmap(value):
    """Gets a color from a value between 0 and 1."""
    # Determine which colors to interpolate based on value
    if value <= 0.5:
        interpolated_color = interpolate_color(red_color, (255, 255, 0), value * 2)
    else:
        interpolated_color = interpolate_color((255, 255, 0), green_color, (value - 0.5) * 2)
    # Convert RGB to Hex
    return f"#{interpolated_color[0]:02x}{interpolated_color[1]:02x}{interpolated_color[2]:02x}"


# Sample size
sample_size = 30

# Data generation
dataframe = pd.DataFrame({
    "Name": np.random.choice(["John", "Anna", "Peter", "Linda", "Thomas", "George", "Marie", "Linda"], size=sample_size),
    "Age": np.random.randint(13, 61, size=sample_size),
    "Location": np.random.choice(["New York", "Paris", "Berlin", "London", "Stockholm", "Madrid", "Washington"], size=sample_size),
    "Height": np.round(np.random.uniform(1.5, 2.0, size=sample_size), 2),
})


### DataTable with DynamicFormatters

# Solid grey color for names column
name_format = DynamicFormatter(background_color=GREY)

# Indicate if the person is over 18 years old, and set the text red if they are not
over_18 = np.where(dataframe["Age"] < 18, "red", None)
age_format = DynamicFormatter(text_color="age_color")

# Set the background color of the "Location" column based on the "Continent" column
continent = np.where(dataframe["Location"].isin(["New York", "Washington"]), AMERICA, EUROPE)
location_format = DynamicFormatter(background_color="location_color")

# Set the color of the "Height" column on a continuous scale
norm_height = (dataframe["Height"] - dataframe["Height"].min()) / (dataframe["Height"].max() - dataframe["Height"].min())
height_color = norm_height.apply(cmap)
height_format = DynamicFormatter(background_color="height_color")

# Create the DataTable
data_table = DataTable.from_data(dataframe, include_pandas_index=False, height=800)

# Add the new source columns for the formatters
data_table.source.data["age_color"] = over_18
data_table.source.data["location_color"] = continent
data_table.source.data["height_color"] = height_color

# Apply the formatters
data_table.columns[0].formatter = name_format
data_table.columns[1].formatter = age_format
data_table.columns[2].formatter = location_format
data_table.columns[3].formatter = height_format

# Show the DataTable
show(data_table)

Output:

The code shows 3 different uses of DynamicFormatter.
1 - Define a unique solid background color for the name
2 - Define a boolean rule to format the text of "Age" red, based on the value of the "Age" column itself
3 - Define a more advanced filter based on another series, to format the locations.
4 - Format the height using a continuous color-scale

image

Sorted by height:

image

Potential alternatives

We could also do more of that server-side. Create

  • BooleanConditionalFormatter(value_col, color_true, color_false)
  • Colormap(value_col, cmap, scaling_mode="minmax")

etc.

Additional information

Note: My branch also introduces some changes to the recently merged DataTable.from_data method.

  • include_pandas_indexAbility to not include the DF index as a column of the CDS
  • Removed the index_position=None, I don't think we need to enforce this. We can always use the kwargs if we want
@thomktz thomktz added the TRIAGE label Feb 6, 2024
@thomktz thomktz changed the title [DataTable - Conditional Formatting] [FEATURE] - DataTable - Conditional Formatting Feb 6, 2024
@bryevdv
Copy link
Member

bryevdv commented Feb 6, 2024

Hi @thomktz Thanks for the issue and also the PR—it's always great to see someone dive right in to BokehJS dev. I do think we need to take a beat to consider design goals before diving into a particular solution. In particular, Bokeh already has a facility ("DataSpecs") for mediating between single values vs a CDS column of values, and also for applying transforms and expressions. These DataSpecs are already used extensively for glyphs and annotations and other things. I think it would be much better if we can re-use this familiar machinery to leverage known concepts, instead of adding an entirely new class.

I agree this is a very valuable feature (as you demonstrate in the images above). But I think it could look like this:

StringFormatter(text_color="red")             # scalar value, written explicitly with value(...)
NuimberFormatter(text_color="a_color_column") # column values, written explicitly with field(...)
DateFormatter(text_color=linear_cmap(...))    # colormapped values, also factor_cmap, etc. 

and similarly for a background_color and possibly other properties (e.g. other font configs).

I have not looked into it in any detail, but I believe that this should be achievable. If it is achievable, I definitely think it is is the ideal API for users that also keeps new maintenance and docs burdens to a minimum. Given that, I think it needs to be at least investigated for feasibility before anything else.

cc @bokeh/dev

@bryevdv bryevdv added this to the 3.x milestone Feb 6, 2024
@thomktz
Copy link
Member Author

thomktz commented Feb 6, 2024

Thanks for the feedback @bryevdv, I agree! This first iteration was more about me getting familiar with TypeScript and the bokeh codebase. I'll look into DataSpecs. At the very bottom, which component do you think should handle the colouring ? CellFormatter ?

@bryevdv
Copy link
Member

bryevdv commented Feb 6, 2024

@thomktz CellFormatter includes HTMLFormatter which probably shouldn't have these properties (since it exists mostly to bypass all of them). I think StringFormatter probably a good choice, and it's a base class for most (all?) of the other formatters.

Edit: I'll add, I'm a few years out from regular hands-on coding with BokehJS, hopefully @mattpap or @ianthomas23 can add any relevant pointers or thoughts

@mattpap
Copy link
Contributor

mattpap commented Feb 7, 2024

Cell formatters (and editors for that matter) were originally implemented very crudely and never refined, so I don't have any expectations about the design there. We are also in the process of migrating to SlickGrid 5 in PR #13522, which may require some redesign anyway.

@thomktz
Copy link
Member Author

thomktz commented Feb 9, 2024

I had a go with ColorSpecs in #13685, and did have to add a few new classes in Typescript, but it can now work like this:

import numpy as np
import pandas as pd

from bokeh.io import save
from bokeh.models import DataTable, StringFormatter

# Sample size
sample_size = 30

# Data generation
dataframe = pd.DataFrame({
    "text_color_col": np.random.choice(["red", "green", "blue", "black", "grey"], size=sample_size),
    "other_text_color_col": ["some text"] * sample_size,
    "background_color_col": np.random.choice(["red", "green", "blue", "yellow"], size=sample_size),
    "other_background_color_col": ["some text"] * sample_size,
    "font_style_col": np.random.choice(["normal", "italic", "bold", "bold italic"], size=sample_size),
    "other_font_style_col": ["some text"] * sample_size,
    "text_align_col": np.random.choice(["left", "center", "right"], size=sample_size),
    "other_text_align_col": ["some text"] * sample_size,
})

formatters = [
    StringFormatter(text_color="text_color_col"),
    StringFormatter(text_color="blue"),
    StringFormatter(background_color="background_color_col"),
    StringFormatter(background_color="yellow"),
    StringFormatter(font_style="font_style_col"),
    StringFormatter(font_style="bold"),
    StringFormatter(text_align="text_align_col"),
    StringFormatter(text_align="center"),
]

datatable = DataTable.from_data(dataframe)
for i in range(len(formatters)):
    datatable.columns[i+1].formatter = formatters[i]

save(datatable)
image

@bryevdv
Copy link
Member

bryevdv commented Feb 9, 2024

@thomktz that code looks great, exactly like what I'd expect users to be able to do

@thomktz
Copy link
Member Author

thomktz commented Feb 11, 2024

Thanks @bryevdv! I changed the code in the PR (#13685) to use the existing DataSpecs. I also got the colormaps to work:

from datetime import datetime

import numpy as np
import pandas as pd

from bokeh.io import save
from bokeh.models import (ColumnDataSource, DataTable, DateFormatter,
                          NumberFormatter, TableColumn)
from bokeh.palettes import RdYlGn9
from bokeh.transform import factor_cmap, linear_cmap

np.random.seed(1)

sample_size = 10

mean, std, K_std = 1e6, 3e5, 2

data = dict(
    dates=pd.date_range(start=datetime.now().date(), periods=sample_size).tolist(),
    downloads=np.random.normal(mean, std, sample_size),
)

data["is_weekend"] = [str(date.weekday() >= 5) for date in data["dates"]]
data["weekend_bold"] = ["bold" if date.weekday() >= 5 else "normal" for date in data["dates"]]

table = DataTable(
    source=ColumnDataSource(data),
    columns=[
        TableColumn(
            field="dates",
            title="Date",
            formatter=DateFormatter(
                format="%A, %b %-d, %Y",
                font_style="weekend_bold",
                background_color=factor_cmap(
                    field_name="is_weekend",
                    palette=["lightgrey", "white"],
                    factors=["True", "False"],
                ),
            ),
        ),
        TableColumn(
            field="downloads",
            title="Downloads",
            formatter=NumberFormatter(
                format="0.0a",
                font_style="weekend_bold",
                background_color=linear_cmap(
                    field_name="downloads",
                    palette=RdYlGn9[::-1],
                    low=mean - K_std*std,
                    high=mean + K_std*std,
                ),
            ),
        ),
    ],
)

save(table)
image

@bryevdv bryevdv modified the milestones: 3.x, 3.4 Feb 11, 2024
@bryevdv
Copy link
Member

bryevdv commented Feb 11, 2024

@bokeh/dev it occurs to me that there may also be older issues around this or similar features, is anyone aware fo any offhand that we can link or close in favor of this one?

@conchoecia
Copy link

@thomktz - is any of the final code for how to color cells documented anywhere aside from the API? Like a general tutorial or something. I'm poking around with StringFormatter now!

@bryevdv
Copy link
Member

bryevdv commented Apr 24, 2024

@conchoecia The only documentation at present is the change of property types to ColorSpec which indicates that the property can be attached to a column in a CDS, used with transforms and mappers, etc. This change shows up in the reference guide.

The code above would be really nice to have added as an example, with a thumbnail in the gallery, and making that addtion would be a great task for a new contributor. With an example added, it would also be easier to make some updates or additions that reference it, to the User's Guide. I'm happy to help advise anyone who would like to work on this.

@thomktz
Copy link
Member Author

thomktz commented Apr 24, 2024

Hi @conchoecia, @bryevdv, happy to make a complete example in the gallery - in the meantime, you can see this in action in

  • examples/interaction/widgets/data_table_formatting.py
  • examples/integration/widgets/data_table_customization.py
  • examples/interaction/tools/point_draw.py

@conchoecia
Copy link

conchoecia commented Apr 25, 2024

Thanks for your responses, @bryevdv and @thomktz. I will defer to @thomktz for making contributions to the gallery since I'm just starting out with this package and it was his PR!

I see examples/interaction/widgets/data_table_formatting.py in the main branch, but it looks like examples/integration/widgets/data_table_customization.py is only in the PR, not the main branch.

I figured out that the feature that I wanted was this, where I could specify the cell color by the color specified in another column of a ColumnDataSource:

    TableColumn(field='name', title='Name',.
                formatter=StringFormatter(background_color="CPK", text_color="text_color")),
Screen Shot 2024-04-25 at 2 08 00 PM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants