Clustermap: Automatic conversion of row/column annotation data to colors #897

jrderuiter · 2016-03-30T21:05:41Z

In this pull-request I would like to propose an initial approach to automatically convert any given annotation to row/col_colors that can be used in plotting. This would allow us to avoid manually converting annotations to colors before plotting a clustermap (which many people are probably frequently doing).

The current implementation is just a loose example of the code that I have currently been using to convert annotations in a dataframe format to row/col_colors for clustermap. Ideally this type of conversion would be used internally by clustermap to be transparent for the user.

Example with different types of annotation:

import pandas as pd
import numpy as np
import seaborn as sns

from seaborn.matrix import _color_annotation

sns.set_style('white')

np.random.seed(0)

# Generate data.
data = pd.DataFrame(np.random.randn(10, 10),
                    index=['F{}'.format(i) for i in range(10)],
                    columns=['S{}'.format(i) for i in range(10)])

# Generate annotation.
col_ann = pd.DataFrame({'Col 1': (['a'] * 5) + (['b'] * 5),
                        'Col 2': np.random.rand(10),
                        'Col 3': [True, False, True, True, False] * 2},
                       index=data.columns)
# col_ann.loc[col_ann.index[1], 'Col 1'] = np.nan  # (.ix was removed in pandas 1.0)

# Color annotation and draw clustermap.
colors = [sns.color_palette(),
          sns.color_palette('Set1')[0],
          sns.color_palette('Set1')[1]]
col_colors, color_maps = _color_annotation(col_ann, colors=colors)

sns.clustermap(data, col_colors=col_colors)

Example of how legends might be added:

from matplotlib import patches as mpatches

def draw_legend(color_map, ax, name, **kwargs):
    """Helper for drawing custom legends."""

    patches = [mpatches.Patch(color=color, label=label)
               for label, color in color_map.items()]
    legend = ax.legend(handles=patches, frameon=True,
                       title=name, **kwargs)

    return legend

# Generate annotation.
col_ann = pd.DataFrame({'Col 1': (['a'] * 5) + (['b'] * 5)}, index=data.columns)

# Color annotation and plot clustermap.
col_colors, color_maps = _color_annotation(col_ann, colors=[sns.color_palette()])
g = sns.clustermap(data, col_colors=col_colors)

for i, (name, color_map) in enumerate(color_maps.items()):
    leg = draw_legend(color_map, ax=g.ax_heatmap, name=name,
                      loc=1, bbox_to_anchor=(1.2, 1 - (0.13 * i)))

I would be interested to hear if there is any interest in integrating this kind of functionality in Seaborn. Critique on the current implementation is also welcome.

As it stands, I think we would at least have to overcome the following issues:

Decide how to best pass colors if this is to be used in clustermap. Currently I use a (nested) list of color palettes, with nesting mainly being used to support categorical/string columns which require multiple colors.
Determine if row/col_colors are already plottable colors or are data that need to be converted. Any idea on how to reliably determine this?
Decide on how to handle numeric values. Currently I am linearly interpolating colors from min/max between a foreground color and a background color. Maybe it would be desirable to allow for more flexibility in the conversion. Alternatively, we could leave more complex cases up to the user to convert themselves.
Determine how/where to plot legends. A constraint-solving library such as cassowary might be a suitable approach to the "where" question. It might also let us place the colorbar in a better position, which would address issue #891 (Clustermap: collapse dendrograms, side_colors and colorbar if not given, make ratios configurable). I have zero experience with cassowary, however.
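
Two of the points above (telling colors apart from data, and converting numeric values) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this pull request: `looks_like_colors` and `numeric_to_colors` are hypothetical names, the minimum is assumed to map to the background color and the maximum to the foreground color, and the color check simply defers to `matplotlib.colors.is_color_like`.

```python
import numpy as np
import pandas as pd
from matplotlib.colors import is_color_like, to_rgb


def _is_missing(value):
    """True for None and float NaN, which are skipped by the color check."""
    return value is None or (isinstance(value, float) and np.isnan(value))


def looks_like_colors(values):
    """Heuristic (hypothetical helper): are all non-missing entries valid colors?

    Returns False for empty input, so "no information" is treated as data.
    """
    present = [v for v in values if not _is_missing(v)]
    return len(present) > 0 and all(is_color_like(v) for v in present)


def numeric_to_colors(values, foreground, background="white"):
    """Linearly interpolate between two colors (hypothetical helper).

    Assumption: min(values) maps to `background`, max(values) to `foreground`.
    Missing values are not handled here and would produce NaN colors.
    """
    values = pd.Series(values, dtype=float)
    fg = np.asarray(to_rgb(foreground))
    bg = np.asarray(to_rgb(background))

    low, high = values.min(), values.max()
    if high > low:
        # Position of each value between the minimum (0) and the maximum (1).
        frac = (values - low) / (high - low)
    else:
        # Constant column: fall back to the foreground color everywhere.
        frac = pd.Series(1.0, index=values.index)

    rgb = bg + frac.to_numpy()[:, None] * (fg - bg)
    return pd.Series([tuple(row) for row in rgb], index=values.index)
```

Note that the heuristic is not fully reliable, which is the difficulty raised above: a categorical column whose labels happen to be `'r'`, `'g'` and `'b'` would pass as colors, since matplotlib accepts those single-letter names. An explicit opt-in argument might therefore be safer than auto-detection.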

olgabot · 2016-05-20T18:58:13Z

FYI I would be SUPER interested in this. I do these kinds of conversions all the time:

This was a particularly hairy one:

mxposed · 2023-07-06T18:52:34Z

I also think this would be very helpful; I am doing these color conversions all the time.

My most recent code to do that boiled down to this, maybe it can be helpful:

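import pandas as pd
import seaborn as sns


def get_color_annotations(df, mapping):
    """Map each column in `mapping` (column name -> palette name) to hex colors."""
    result = []
    for column, palette in mapping.items():
        values = df[column].unique()
        # For categoricals, use the declared categories so the color order is stable.
        # (pd.api.types.is_categorical_dtype is deprecated in recent pandas.)
        if isinstance(df[column].dtype, pd.CategoricalDtype):
            values = df[column].cat.categories
        lut = dict(zip(values, sns.color_palette(palette, n_colors=values.size).as_hex()))
        result.append(df[column].map(lut))
    return pd.concat(result, axis=1)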

with the usage being:

data = sns.load_dataset('diamonds')
data = data.sample(2000, random_state=1066)

sns.clustermap(
    data.select_dtypes('number'),
    z_score=1,
    row_colors=get_color_annotations(data, {
        'cut': 'Reds_r',
        'clarity': 'Blues_r',
        'color': 'Greens_r'
    }),
    method='ward',
    cmap='coolwarm',
    center=0,
    vmax=5,
    vmin=-5
)
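
If legends are wanted as well, one possible variation is to build the value-to-color lookup tables first and keep them around, so they can be handed to a legend helper such as `draw_legend` from the first post. The sketch below is an assumption-laden illustration: `build_color_luts` and `annotations_from_luts` are hypothetical names, not part of seaborn or of this pull request.

```python
import pandas as pd
import seaborn as sns


def build_color_luts(df, mapping):
    """Return {column: {value: hex color}} for each column in `mapping`.

    `mapping` is column name -> palette name, as in get_color_annotations.
    """
    luts = {}
    for column, palette in mapping.items():
        if isinstance(df[column].dtype, pd.CategoricalDtype):
            # Keep the declared category order so colors follow it.
            values = list(df[column].cat.categories)
        else:
            values = list(df[column].dropna().unique())
        colors = sns.color_palette(palette, n_colors=len(values)).as_hex()
        luts[column] = dict(zip(values, colors))
    return luts


def annotations_from_luts(df, luts):
    """Convert the annotated columns to a DataFrame of hex colors."""
    return pd.DataFrame(
        {column: df[column].astype(object).map(lut) for column, lut in luts.items()},
        index=df.index,
    )
```

The result of `annotations_from_luts(data, luts)` can be passed as `row_colors`, while each `luts[column]` has the same label-to-color shape that `draw_legend` expects for its `color_map` argument.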

Initial example of color conversion code.

0e137ff

mwaskom added enhancement mod:matrix labels May 18, 2020
