Palette does not support the use of defaultdict with missing values #3632

Open
ehermes opened this issue Feb 7, 2024 · 4 comments

ehermes commented Feb 7, 2024

Currently, Seaborn does not permit the use of defaultdict with missing values as a palette. A minimal example that reproduces this issue is:

import seaborn as sns
import pandas as pd
from collections import defaultdict

data = pd.DataFrame({
    "values": [1, 2, 3],
    "hues": ["foo", "bar", "baz"],
})

palette = defaultdict(lambda: "#000000", {
    "foo": "#ff0000",
    "bar": "#00ff00",
})

sns.histplot(
    x="values",
    data=data,
    hue="hues",
    palette=palette,
)

My expectation is that this should use the default value of #000000 for baz, which is missing from the palette. Instead, this raises an exception:

Traceback (most recent call last):
  File "/home/ehermes/test/seaborn_defaultdict.py", line 15, in <module>
    sns.histplot(
  File "/home/ehermes/venvs/seaborn/lib/python3.10/site-packages/seaborn/distributions.py", line 1384, in histplot
    p.map_hue(palette=palette, order=hue_order, norm=hue_norm)
  File "/home/ehermes/venvs/seaborn/lib/python3.10/site-packages/seaborn/_base.py", line 838, in map_hue
    mapping = HueMapping(self, palette, order, norm, saturation)
  File "/home/ehermes/venvs/seaborn/lib/python3.10/site-packages/seaborn/_base.py", line 150, in __init__
    levels, lookup_table = self.categorical_mapping(
  File "/home/ehermes/venvs/seaborn/lib/python3.10/site-packages/seaborn/_base.py", line 234, in categorical_mapping
    raise ValueError(err.format(missing))
ValueError: The palette dictionary is missing keys: {'baz'}

For this test, I have used seaborn-0.13.2 and matplotlib-3.8.2.
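For context, the error message suggests that the check compares the palette's keys against the hue levels without ever doing an item lookup, so the `default_factory` never gets a chance to run. The sketch below is an assumption about the shape of that check, not a copy of seaborn's code; `find_missing_keys` is a hypothetical name.

```python
from collections import defaultdict


def find_missing_keys(levels, palette):
    """Hypothetical membership-based check: compares keys, performs no lookup."""
    return set(levels) - set(palette)


palette = defaultdict(lambda: "#000000", {"foo": "#ff0000", "bar": "#00ff00"})
levels = ["foo", "bar", "baz"]

# The key comparison reports "baz" as missing...
missing = find_missing_keys(levels, palette)
print(missing)

# ...even though an ordinary item lookup would have produced the default colour.
print(palette["baz"])
```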

I have a fix for this problem in a personal branch (https://github.com/ehermes/seaborn/tree/palette_defaultdict), but per your contribution guidelines, I have opened a bug report first. With permission, I can also create a PR for my fix.

Owner

mwaskom commented Feb 10, 2024

defaultdict is a nice pythonic solution here, but the type signature for palette is already quite complicated and I'm fairly averse to expanding it further. I'm also not convinced that setting up the defaultdict is that much more convenient than defining a full dict palette based on the data, e.g. something like

palette = {
    **{x: "k" for x in data["hues"].unique()},
    "foo": "#ff0000",
    "bar": "#00ff00",
}

That is the same LoC and avoids an import.

Author

ehermes commented Feb 10, 2024

This is a good solution if you have the data that you will be plotting when you are first creating the palette. In our application, the palette is "statically" defined in a library, and the data we plot is generated at runtime. Sometimes the data contains entries that we did not expect to be present at the time we wrote the library, so we need to have a backup value present. My current workaround to this issue is to essentially do what you're suggesting, but I have to do it in every single function that creates a seaborn plot, which is a lot of redundant code. We could possibly simplify things through a code re-org, but my preference would be for seaborn to use the defaultdict that we have chosen for this exact reason in the expected manner.
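One way the per-function workaround described above could be centralised is a small helper that expands the static palette into a plain dict just before plotting. This is only a sketch: `STATIC_PALETTE` and `resolve_palette` are hypothetical names, and it assumes the fallback colour should come from the defaultdict's own `default_factory`.

```python
from collections import defaultdict

# Hypothetical library-level palette with a fallback colour.
STATIC_PALETTE = defaultdict(
    lambda: "#000000",
    {"foo": "#ff0000", "bar": "#00ff00"},
)


def resolve_palette(palette, levels):
    """Return a plain dict with exactly one entry per level.

    Item lookup is done on a shallow copy so that defaults filled in by a
    defaultdict do not leak back into the shared palette.
    """
    lookup = palette.copy()
    return {level: lookup[level] for level in levels}


resolved = resolve_palette(STATIC_PALETTE, ["foo", "bar", "baz"])
print(resolved)
```

The resulting plain dict could then be passed as `palette=` in each plotting call, for example with `data["hues"].unique()` as the levels.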

Owner

mwaskom commented Feb 10, 2024

Why say “in this expected manner”? Defaultdict is not a subtype of dict and seaborn’s docs don’t suggest that it will be accepted.

Author

ehermes commented Feb 10, 2024

Strictly speaking, defaultdict is a subtype of dict:

In [1]: from collections import defaultdict

In [2]: palette = defaultdict(lambda: "#000000", {
   ...:     "foo": "#ff0000",
   ...:     "bar": "#00ff00",
   ...: })

In [3]: isinstance(palette, dict)
Out[3]: True

When I say "in the expected manner", I mean from the "duck typing" perspective: a defaultdict behaves like a dict, and thus should be suitable for any application in which a dict is accepted. The only reason we cannot use a defaultdict as the palette for seaborn is because of an extra check that every level has a corresponding key in it, which may not be true for non-primitive dict-likes. Actually, this brings to mind an alternative possible solution, which doesn't specifically require reference to defaultdict:

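if isinstance(palette, dict):
    missing = set()
    for level in levels:
        try:
            # Item lookup goes through __missing__, so a defaultdict supplies its default here.
            palette[level]
        except KeyError:
            missing.add(level)
    # Test the set itself: any(missing) would overlook falsy levels such as 0 or "".
    if missing:
        err = "The palette dictionary is missing keys: {}"
        raise ValueError(err.format(missing))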

Edit: Removed non-functional alternate suggestions (apparently defaultdict.get doesn't behave the way I thought it did)
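For anyone else surprised by this: `defaultdict.get` does not call the `default_factory`; only item lookup with `[]` does, and that lookup also stores the new key. A short illustration, with variable names made up for the example:

```python
from collections import defaultdict

palette = defaultdict(lambda: "#000000", {"foo": "#ff0000"})

# .get() bypasses default_factory and falls back to its own default (None).
via_get = palette.get("baz")
absent_after_get = "baz" not in palette

# Indexing triggers __missing__, which calls default_factory and stores the result.
via_index = palette["baz"]
present_after_index = "baz" in palette

print(via_get, absent_after_get, via_index, present_after_index)
```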

In any case, my point is that the current check is preventing us from using something as the palette which we would otherwise be able to, and which we currently do use for our other non-matplotlib plots (namely plotly). The changes I have suggested here would add more flexibility to the code without impacting the functionality of the missing key check, when users are passing a standard dict.
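To illustrate that claim, here is a standalone sketch of a lookup-based check applied to both a defaultdict and a plain dict. `check_palette` is a hypothetical function written for this comment, not part of seaborn.

```python
from collections import defaultdict


def check_palette(palette, levels):
    """Raise ValueError if item lookup fails for any level (hypothetical helper)."""
    missing = set()
    for level in levels:
        try:
            palette[level]  # note: on a defaultdict this also inserts the default
        except KeyError:
            missing.add(level)
    if missing:
        raise ValueError(f"The palette dictionary is missing keys: {missing}")


levels = ["foo", "bar", "baz"]
colors = {"foo": "#ff0000", "bar": "#00ff00"}

# A defaultdict passes, because the lookup supplies the fallback colour.
check_palette(defaultdict(lambda: "#000000", colors), levels)

# A plain dict with the same entries still fails, as it does today.
try:
    check_palette(dict(colors), levels)
    plain_dict_error = None
except ValueError as exc:
    plain_dict_error = str(exc)

print(plain_dict_error)
```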
