implement the PintMetaIndex
#163
pint_xarray/index.py
class PintMetaIndex(Index):
    # TODO: inherit from MetaIndex once that exists
Hmm, actually I'm not sure what a MetaIndex class would look like. So far we have used the generic term "meta-index" to refer to indexes that wrap one or several indexes, but I don't know whether there will be a need to provide a generic class for that.
I agree, it doesn't really look like we actually need a base class for that, but I noticed that a few methods don't make sense for meta-indexes, from_variables for example. It's probably fine to use the default for those, though.
Here are a few comments. Happy to answer questions if any.
There are some Index methods like
For some other methods like
The general approach used in the Xarray indexes refactor heavily relies on the type of the indexes (at least when we need to compare them together). That's not super flexible with the
I wonder whether
Regarding Index methods like
You should also be careful when converting the units of indexed coordinates, as they may get out of sync with their index. As there's no concept of a "duck" index, the easiest approach would probably be to drop the index (and maybe reconstruct it from scratch) when the coordinates are updated.
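To illustrate the last point, here is a minimal sketch of the "drop and rebuild" idea in plain Python. It does not use the xarray or pint APIs: `LookupIndex` and `convert_indexed_coordinate` are hypothetical stand-ins, and the conversion factor is passed in by hand rather than computed by a unit library.

```python
class LookupIndex:
    """Hypothetical stand-in for an index: maps coordinate labels to positions."""

    def __init__(self, values, unit):
        self.unit = unit
        self.positions = {value: position for position, value in enumerate(values)}


def convert_indexed_coordinate(values, factor, new_unit):
    """Convert the coordinate values, then rebuild the index from scratch."""
    new_values = [value * factor for value in values]
    # The old index is not updated in place: it is discarded and a fresh one
    # is built from the converted values, so coordinate and index always agree.
    return new_values, LookupIndex(new_values, new_unit)


x_km = [1.0, 2.0, 3.0]
index_km = LookupIndex(x_km, "km")

# Convert kilometres to metres; the returned index replaces index_km.
x_m, index_m = convert_indexed_coordinate(x_km, 1000.0, "m")
```

Keeping `index_km` around after the conversion would be the out-of-sync situation described above: its labels would still be in kilometres while the coordinate is in metres.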
@keewis I have been looking into this once again and now I think I better understand what you'd like to achieve with the

Wrap index coordinate variables as unit-aware variables

I'm not familiar with pint, but if a

class PintMetaIndex:
    def create_variables(self, variables=None):
        index_vars = self.index.create_variables(variables)
        index_vars_units = {}
        for name, var in index_vars.items():
            # attach the unit stored for this coordinate to the raw index data
            data = array_attach_unit(var.data, self.units[name])
            var_units = xr.Variable(var.dims, data, attrs=var.attrs, encoding=var.encoding)
            index_vars_units[name] = var_units
        return index_vars_units

We cannot use IndexVariable since (if I remember well) it coerces the data as a

Set new Pint index(es)

Since

@register_dataset_accessor("pint")
class PintDatasetAccessor:
    def quantify(self, units=_default, unit_registry=None, **unit_kwargs):
        ...
        ds_xindexes = self.ds.xindexes
        new_indexes, new_index_vars = ds_xindexes.copy_indexes()
        # wrap each existing index together with the units of its coordinates
        for idx, idx_vars in ds_xindexes.group_by_index():
            idx_units = {k: v for k, v in units.items() if k in idx_vars}
            new_idx = PintMetaIndex(idx, idx_units)
            new_indexes.update({k: new_idx for k in idx_vars})
            new_index_vars.update(new_idx.create_variables(idx_vars))
        new_coords = xr.Coordinates(new_index_vars, new_indexes)
        # needs https://github.com/pydata/xarray/pull/8094 to work properly
        ds_updated_temp = self.ds.assign_coords(new_coords)
        ...

It is still useful to implement

class PintMetaIndex:
    @classmethod
    def from_variables(cls, variables, *, options):
        # PandasIndex.from_variables takes options as a keyword-only argument
        index = xr.indexes.PandasIndex.from_variables(variables, options={})
        units_dict = {index.index.name: options.get("units")}
        return cls(index, units_dict)

ds = xr.Dataset(coords={"x": [1, 2]})
ds_units = ds.drop_indexes("x").set_xindex("x", PintMetaIndex, units="m")

Data selection

Beware that
nit: I would rename
Further comments: Implementing
For
@benbovy, with a few tweaks to your suggestions this:

In [1]: import xarray as xr
   ...: import pint_xarray
   ...:
   ...: ureg = pint_xarray.unit_registry
   ...: ds = xr.tutorial.open_dataset("air_temperature")
   ...: q = ds.pint.quantify({"lat": "degrees", "lon": "degrees"})
   ...: q.sel(lat=ureg.Quantity(75, "deg").to("rad"))
.../xarray/core/indexes.py:473: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  index = pd.Index(np.asarray(array), **kwargs)
Out[1]:
<xarray.Dataset>
Dimensions:  (time: 2920, lon: 53)
Coordinates:
    lat      float32 [deg] 75.0
  * lon      (lon) float32 [deg] 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lon) float32 [K] 241.2 242.5 243.5 ... 241.48999 241.79
Indexes:
    lon      PintMetaIndex
    time     PintMetaIndex
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

does work 🎉 The only hiccup is that somehow we seem to call
(the failing tests are expected, I will have to update some of the workaround code)
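For readers following along, the selection in the session above (a label given in radians against a coordinate stored in degrees) can be pictured roughly as "convert the label to the index's units, strip the unit, delegate to the wrapped index". The sketch below shows only that idea in plain Python. It is not the code from this PR: `PlainIndex`, `UnitAwareIndex`, `convert` and the small conversion table are hypothetical stand-ins for the wrapped index, the meta-index, and pint.

```python
import math

# Assumed conversion table (factor to radians); pint would normally do this.
TO_RADIANS = {"rad": 1.0, "deg": math.pi / 180.0}


def convert(magnitude, from_unit, to_unit):
    """Convert a bare magnitude between the two units in the table."""
    return magnitude * TO_RADIANS[from_unit] / TO_RADIANS[to_unit]


class PlainIndex:
    """Hypothetical stand-in for the wrapped, unit-unaware index."""

    def __init__(self, values):
        self.values = list(values)

    def sel(self, label):
        # Return the position of the first value matching the label.
        for position, value in enumerate(self.values):
            if math.isclose(value, label, abs_tol=1e-9):
                return position
        raise KeyError(label)


class UnitAwareIndex:
    """Hypothetical wrapper: stores the unit and delegates to the plain index."""

    def __init__(self, index, unit):
        self.index = index
        self.unit = unit

    def sel(self, magnitude, unit):
        # Express the label in the index's own unit, then pass the bare
        # number on to the wrapped index, which knows nothing about units.
        return self.index.sel(convert(magnitude, unit, self.unit))


lat = UnitAwareIndex(PlainIndex([75.0, 72.5, 70.0]), "deg")
position = lat.sel(math.radians(75.0), "rad")
```

The wrapped index never sees a unit, which is why the result of the session above is still reported in degrees.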
@keewis Excellent!
Hmm, the warning is weird. safe_cast_to_index is called when a new PandasIndex object is created, but there is no reason to create one in your example.
Small suggestion: you could implement PintMetaIndex._repr_inline_ so that it displays the type of the wrapped index (e.g., PintMetaIndex(PandasIndex)).
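A minimal sketch of that suggestion, assuming the wrapper stores the wrapped index as `self.index` (as in the snippets earlier in this thread) and that indexes expose `_repr_inline_(max_width)`. `DummyIndex` is a hypothetical stand-in so the example runs without xarray. In the real class the wrapped object would be an xarray index.

```python
class DummyIndex:
    """Hypothetical stand-in for the wrapped index (e.g. a PandasIndex)."""

    def _repr_inline_(self, max_width):
        return type(self).__name__


class PintMetaIndex:
    """Sketch only: just enough of the wrapper to show the inline repr."""

    def __init__(self, index, units):
        self.index = index
        self.units = units

    def _repr_inline_(self, max_width):
        # Show the wrapper's name with the wrapped index's inline repr inside.
        wrapped = self.index._repr_inline_(max_width)
        return f"{type(self).__name__}({wrapped})"


inline = PintMetaIndex(DummyIndex(), {"x": "m"})._repr_inline_(80)
```

With a real PandasIndex in place of `DummyIndex`, the Indexes section of the dataset repr would then show the wrapped type instead of the bare wrapper name.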
Regarding all the errors in CI
would that just delegate to the underlying index, or also wrap it (probably the former, but I wanted to make sure)? In any case, I wonder if we should just fix
In the case of
Yes, definitely. I think I fixed it in one of my open PRs in the Xarray repo.
No, because
@benbovy, something I just noticed: where do we need
Other than that, I'm not sure what to do about
I'll have a look at what needs to be changed to get the tests to work, although this would probably require changes to
we also need to figure out how to handle
I think that in general it would be nice if we can keep a simple and predictable behavior for
I wouldn't make a special case for PintIndex or any other "meta-index". Although it is likely that in most cases pint indexes won't be set directly via
Hmm, stack / unstack only act on the dimensions, so in theory they shouldn't invalidate units? Since the
In the case of a wrapped PandasMultiIndex, the created dimension coordinate (tuple values) may not have a single unit. Perhaps it is best to keep it unitless. Eventually that coordinate will be deprecated anyway. That said, I think it's fine to leave stack / unstack unimplemented for now and implement them in follow-up PRs.
Hmm, that sounds a bit too restrictive to me. How about issuing a user warning instead? In the long term we can probably get rid of
is there a way to access this default index type? Otherwise it's probably fine to keep hard-coding
Makes sense, though in that case to be extra friendly I wonder if we can customize the error message? Instead of recommending
This is what I'll see how far I can get without
That might be true, I might have been too tired yesterday when writing this. I'll still postpone the implementation to a later PR, though.
Not yet (I was thinking too much ahead). For now it is PandasIndex but maybe later we could add some option in Xarray to get / set the default index type (e.g., for cases where PandasIndex is too expensive and/or not really needed).
You might want to see where it is called in Xarray internals. For example, if you leave
This would require some API entrypoint in Xarray, but I'm not sure it is really worth it since here you might recommend setting pint indexes "automatically" via
now that I have more time to work on this, I think the only things left to fix right now are:
The remaining methods I think we can leave to separate PRs:
For this one it would be nice if
we're already using some part of the private API of
Edit: oh, wait, maybe that's what you've been saying? I'll check.
You'd need to pass
Edit: It appears
a couple more notes:
As mentioned in #162, it is possible to get the indexing functions to work, although there still is no public API.
I also still don't quite understand how other methods work since the refactor, so this only implements sel.
Usage, for anyone who wants to play around with it
This will fail at the moment because xarray treats dask arrays differently from duck-dask arrays, but passing single values works!
PintMetaIndex #162, closes Wrong units when using da.integrate() #205, closes Support for set_xindex? #218
pre-commit run --all-files
whats-new.rst
api.rst