Decoder for MultiIndexes fails if there are other variables, using a dimension which is part of the multiindex #461

Open
okz opened this issue Jul 30, 2023 · 0 comments


okz commented Jul 30, 2023

First, thank you so much. Compression by gathering is an incredibly useful addition, which will hopefully end up in xarray one day to support ragged (or sparse) arrays in netCDF files.

#321 added support for encoding and decoding pandas MultiIndexes using "compression by gathering". However, if the dataset contains other variables that use a dimension which is part of the MultiIndex, decoding fails.
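
For context, compression by gathering stores only the populated points of a grid, each as a flat index into the full lat x lon product. Below is a minimal pure-Python sketch of that arithmetic, assuming zero-based, row-major ordering. The helper names `flat_index` and `unflatten` are illustrative and are not part of cf_xarray.

```python
# Illustrative sketch only: not cf_xarray's implementation.
lat = ["a", "b"]
lon = [1, 2]


def flat_index(i, j, n_lon):
    """Row-major position of grid cell (i, j) in the flattened lat x lon grid."""
    return i * n_lon + j


def unflatten(k, n_lon):
    """Inverse of flat_index: recover (i, j) from the flat position k."""
    return divmod(k, n_lon)


# Suppose only three of the four grid cells hold data.
points = [("a", 1), ("a", 2), ("b", 2)]

# Encode: one integer per populated point.
compressed = [
    flat_index(lat.index(la), lon.index(lo), len(lon)) for la, lo in points
]

# Decode: map each integer back to its (lat, lon) pair.
decoded = [
    (lat[i], lon[j]) for i, j in (unflatten(k, len(lon)) for k in compressed)
]
```

Decoding therefore needs the original `lat` and `lon` coordinate values alongside the compressed index, which is why those coordinates stay in the encoded dataset.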

A minimal example, derived from the Encoding and decoding tutorial with a single added line that defines var_with_lat:

import cf_xarray as cfxr
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"landsoilt": ("landpoint", np.random.randn(4), {"foo": "bar"})},
    {
        "landpoint": pd.MultiIndex.from_product(
            [["a", "b"], [1, 2]], names=("lat", "lon")
        )
    },
)

# Uncommenting the next line makes the decoding step fail.
# ds["var_with_lat"] = xr.DataArray([1, 2], dims="lat")

encoded = cfxr.encode_multi_index_as_compress(ds, "landpoint")
decoded = cfxr.decode_compress_to_multi_index(encoded, "landpoint")

Once var_with_lat is added, decoding fails:

---> 129 decoded = cfxr.decode_compress_to_multi_index(encoded, "landpoint")

File ~/devenv3/lib/python3.11/site-packages/cf_xarray/coding.py:116, in decode_compress_to_multi_index(encoded, idxnames)
    110     from xarray.indexes import PandasMultiIndex
    112     variables = {
    113         dim: encoded[dim].isel({dim: xr.Variable(data=index, dims=idxname)})
    114         for dim, index in zip(names, indices)
    115     }
--> 116     decoded = decoded.assign_coords(variables).set_xindex(
    117         names, PandasMultiIndex
    118     )
    119 except ImportError:
    120     arrays = [encoded[dim].data[index] for dim, index in zip(names, indices)]

File ~/devenv3/lib/python3.11/site-packages/xarray/core/dataset.py:4330, in Dataset.set_xindex(self, coord_names, index_cls, **options)
   4327 indexed_coords = set(coord_names) & set(self._indexes)
   4329 if indexed_coords:
-> 4330     raise ValueError(
   4331         f"those coordinates already have an index: {indexed_coords}"
   4332     )
   4334 coord_vars = {name: self._variables[name] for name in coord_names}
   4336 index = index_cls.from_variables(coord_vars, options=options)

ValueError: those coordinates already have an index: {'lat'}
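
The check that raises in the traceback can be reproduced without cf_xarray. The sketch below assumes an xarray version that provides `Dataset.set_xindex` and `Dataset.drop_indexes`. The toy dataset and the variable names are made up for illustration, and dropping the index first is shown only as one possible direction, not as the fix adopted by cf_xarray.

```python
import xarray as xr

# Toy dataset: "lat" is a dimension coordinate, so xarray indexes it by default.
ds = xr.Dataset(coords={"lat": ["a", "b"]})
already_indexed = "lat" in ds.xindexes

# Asking for another index on an already-indexed coordinate raises ValueError.
try:
    ds.set_xindex("lat")
    raised = False
except ValueError:
    raised = True

# Dropping the default index first lets set_xindex succeed.
reindexed = ds.drop_indexes("lat").set_xindex("lat")
```

In the failing example, var_with_lat appears to bring the indexed `lat` coordinate into the dataset being decoded, so the later `set_xindex` call hits this same check.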