Clustermap: Automatic conversion of row/column annotation data to colors #897

Open: wants to merge 1 commit into master

jrderuiter (Contributor)

In this pull request I would like to propose an initial approach for automatically converting arbitrary annotation data into row/col_colors that can be used for plotting. This would save us from converting annotations to colors manually before plotting a clustermap, which many people probably do frequently.

The current implementation is just a rough example of the code I have been using to convert annotations stored in a DataFrame into row/col_colors for clustermap. Ideally, clustermap would perform this conversion internally so that it is transparent to the user.

Example with different types of annotation:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# _color_annotation is the helper introduced in this pull request.
from seaborn.matrix import _color_annotation

sns.set_style('white')
np.random.seed(0)

# Generate data.
data = pd.DataFrame(np.random.randn(10, 10),
                    index=['F{}'.format(i) for i in range(10)],
                    columns=['S{}'.format(i) for i in range(10)])

# Generate annotation: a categorical, a numeric and a boolean column.
col_ann = pd.DataFrame({'Col 1': (['a'] * 5) + (['b'] * 5),
                        'Col 2': np.random.rand(10),
                        'Col 3': [True, False, True, True, False] * 2},
                       index=data.columns)
# Uncomment to test a missing value (.ix was removed in pandas 1.0; use .loc).
# col_ann.loc['S1', 'Col 1'] = np.nan

# One entry per annotation column: a palette for the categorical column,
# a single color for the numeric and the boolean column.
colors = [sns.color_palette(),
          sns.color_palette('Set1')[0],
          sns.color_palette('Set1')[1]]
col_colors, color_maps = _color_annotation(col_ann, colors=colors)

sns.clustermap(data, col_colors=col_colors)
```

[image]
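The body of `_color_annotation` is not shown here. The following is a hypothetical sketch (the name `color_annotation_sketch` and its rules are assumptions, not the code in this pull request) of how such a conversion could work with the same `colors` convention: a palette for categorical columns, and a single color for boolean columns (True gets the color, False the background) and numeric columns (linear interpolation from the background color at the minimum to the given color at the maximum, which is an assumed direction). Missing values are not handled.

```python
import numpy as np
import pandas as pd
from matplotlib.colors import to_hex, to_rgb


def color_annotation_sketch(ann, colors, bg_color="white"):
    """Hypothetical sketch; not the implementation from this pull request.

    ``colors`` holds one entry per column of ``ann``: a list of colors for a
    categorical column, a single color for a numeric or boolean column.
    Returns (DataFrame of hex colors, {column: {value: hex color}}).
    """
    bg = np.array(to_rgb(bg_color))
    color_frame, color_maps = {}, {}
    for column, color in zip(ann.columns, colors):
        values = ann[column]
        if pd.api.types.is_bool_dtype(values):
            # Checked before numeric, because bool also counts as numeric.
            lut = {True: to_hex(color), False: to_hex(bg_color)}
            color_frame[column] = values.map(lut)
        elif pd.api.types.is_numeric_dtype(values):
            # Linear interpolation: minimum -> background, maximum -> color.
            fg = np.array(to_rgb(color))
            vmin, vmax = values.min(), values.max()
            span = (vmax - vmin) or 1.0  # constant column: everything maps to bg
            frac = ((values - vmin) / span).to_numpy()[:, None]
            rgb = np.clip(bg + frac * (fg - bg), 0, 1)
            color_frame[column] = pd.Series([to_hex(c) for c in rgb], index=ann.index)
            lut = {vmin: to_hex(bg_color), vmax: to_hex(color)}  # endpoints only
        else:
            # Categorical: sorted unique values take consecutive palette colors.
            levels = sorted(values.unique())
            lut = {level: to_hex(c) for level, c in zip(levels, color)}
            color_frame[column] = values.map(lut)
        color_maps[column] = lut
    return pd.DataFrame(color_frame, index=ann.index), color_maps
```

With the example annotation above, `color_annotation_sketch(col_ann, colors)` accepts the same `colors` list, and its first return value can be passed as `col_colors` to `sns.clustermap`.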

Example of how legends might be added:

```python
from matplotlib import patches as mpatches

def draw_legend(color_map, ax, name, **kwargs):
    """Draw a legend for one annotation column on ``ax``."""
    patches = [mpatches.Patch(color=color, label=label)
               for label, color in color_map.items()]
    legend = ax.legend(handles=patches, frameon=True,
                       title=name, **kwargs)
    # ax.legend() replaces any previous legend on the axes; adding the
    # legend as an artist keeps it visible when several are drawn.
    ax.add_artist(legend)
    return legend

# Generate annotation.
col_ann = pd.DataFrame({'Col 1': (['a'] * 5) + (['b'] * 5)}, index=data.columns)

# Color annotation and plot clustermap.
col_colors, color_maps = _color_annotation(col_ann, colors=[sns.color_palette()])
g = sns.clustermap(data, col_colors=col_colors)

# Stack one legend per annotation column to the right of the heatmap.
for i, (name, color_map) in enumerate(color_maps.items()):
    draw_legend(color_map, ax=g.ax_heatmap, name=name,
                loc='upper right', bbox_to_anchor=(1.2, 1 - (0.13 * i)))
```

[image]

I would like to hear whether there is interest in integrating this kind of functionality into seaborn. Critique of the current implementation is also welcome.

As it stands, I think we would at least have to overcome the following issues:

  • Decide how best to pass colors if this is to be used in clustermap. Currently I use a (nested) list of color palettes; the nesting mainly supports categorical/string columns, which require multiple colors.
  • Determine whether row/col_colors are already plottable colors or data that still need to be converted. Any ideas on how to determine this reliably?
  • Decide how to handle numeric values. Currently I linearly interpolate between a background color and a foreground color over the range from the minimum to the maximum value. It may be desirable to allow more flexibility in the conversion. Alternatively, we could leave more complex cases up to the user to convert themselves.
  • Determine how and where to plot legends. Constraint-solving libraries such as cassowary might be a suitable approach to the placement ("where") problem. This might also let us place the colorbar in a better position and thereby address the issue raised in #891 (Clustermap: collapse dendrograms, side_colors and colorbar if not given, make ratios configurable). However, I have no experience with cassowary.
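For the second point, one possible heuristic is to treat the input as colors only if every non-missing entry passes `matplotlib.colors.is_color_like`. The function name `looks_like_colors` is hypothetical, and this is a sketch rather than a reliable test: category labels such as "b", "g" or "0.5" are themselves valid matplotlib color specifications, so an explicit parameter may ultimately be more robust. It assumes either a flat sequence/Series of color specs or a DataFrame whose columns hold color specs.

```python
import math

import pandas as pd
from matplotlib.colors import is_color_like


def _is_missing(value):
    # None and float NaN count as missing; tuples and strings never do.
    return value is None or (isinstance(value, float) and math.isnan(value))


def looks_like_colors(annotation):
    """Heuristic: True if every non-missing entry is a valid matplotlib color."""
    if isinstance(annotation, pd.DataFrame):
        columns = [annotation[name] for name in annotation.columns]
    else:
        columns = [annotation]
    for column in columns:
        values = [v for v in column if not _is_missing(v)]
        # An empty column, or any non-color entry, means "needs conversion".
        if not values or not all(is_color_like(v) for v in values):
            return False
    return True
```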

olgabot (Contributor) commented May 20, 2016

FYI I would be SUPER interested in this. I do these kinds of conversions all the time:

[image]

This was a particularly hairy one:

[image]

mxposed commented Jul 6, 2023

I also think this would be very helpful; I do these color conversions all the time.

My most recent code for this boils down to the following; maybe it can be helpful:

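```python
import pandas as pd
import seaborn as sns

def get_color_annotations(df, mapping):
    """Map each column named in ``mapping`` to colors from its palette."""
    result = []
    for column, palette in mapping.items():
        values = df[column].unique()
        # Use the declared category order for categorical columns
        # (pd.api.types.is_categorical_dtype is deprecated since pandas 2.1).
        if isinstance(df[column].dtype, pd.CategoricalDtype):
            values = df[column].cat.categories
        lut = dict(zip(values, sns.color_palette(palette, n_colors=len(values)).as_hex()))
        result.append(df[column].map(lut))
    return pd.concat(result, axis=1)
```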

with the usage being:

```python
data = sns.load_dataset('diamonds')
data = data.sample(2000, random_state=1066)

sns.clustermap(
    data.select_dtypes('number'),
    z_score=1,
    row_colors=get_color_annotations(data, {
        'cut': 'Reds_r',
        'clarity': 'Blues_r',
        'color': 'Greens_r'
    }),
    method='ward',
    cmap='coolwarm',
    center=0,
    vmax=5,
    vmin=-5
)
```
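`get_color_annotations` discards its lookup tables, so legends like those in the first comment cannot be built from its output. A hypothetical variant (the names `color_annotations_with_luts` and `add_annotation_legends` are made up for this sketch) returns the tables as well and stacks one legend per column with `Figure.legend`, which adds a new legend on every call instead of replacing the previous one as repeated `Axes.legend` calls do. The legend positions are arbitrary assumptions.

```python
import pandas as pd
import seaborn as sns
from matplotlib.patches import Patch


def color_annotations_with_luts(df, mapping):
    """Variant of get_color_annotations that also returns the lookup tables."""
    colors, luts = [], {}
    for column, palette in mapping.items():
        series = df[column]
        if isinstance(series.dtype, pd.CategoricalDtype):
            values = list(series.cat.categories)  # keep declared category order
        else:
            values = list(series.dropna().unique())
        lut = dict(zip(values, sns.color_palette(palette, n_colors=len(values)).as_hex()))
        luts[column] = lut
        colors.append(series.map(lut))
    return pd.concat(colors, axis=1), luts


def add_annotation_legends(fig, luts, x=1.0, y=0.95, step=0.25):
    """Stack one legend per annotation column along the figure's right edge."""
    legends = []
    for i, (column, lut) in enumerate(luts.items()):
        handles = [Patch(color=color, label=str(value)) for value, color in lut.items()]
        # Figure.legend appends to fig.legends, so earlier legends are kept.
        legends.append(fig.legend(handles=handles, title=column, loc="upper left",
                                  bbox_to_anchor=(x, y - i * step), frameon=True))
    return legends
```

For the diamonds example, `row_colors, luts = color_annotations_with_luts(data, {...})` would replace the `get_color_annotations` call, followed by `add_annotation_legends(g.figure, luts)` on the returned grid (`g.fig` in older seaborn versions). Because the legends sit outside the figure's right edge, saving with `bbox_inches='tight'` keeps them in the image.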


4 participants