Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API? #6

Open
12 of 16 tasks
TomNicholas opened this issue Apr 8, 2020 · 17 comments
Open
12 of 16 tasks

API? #6

TomNicholas opened this issue Apr 8, 2020 · 17 comments

Comments

@TomNicholas
Copy link
Collaborator

TomNicholas commented Apr 8, 2020

@jthielen proposed a rough accessor API in pint/#849, to which I've added a couple of things:

DataArray:

  • da.pint.to(...): return dataarray with converted units (clean-up and implement the conversion methods #11)
  • da.pint.to_base_units(): return dataarray with base units
  • da.pint.units: units of quantity (as a Unit)
  • da.pint.magnitude: magnitude of quantity
  • da.pint.quantify(unit_registry=None, unit=None): create DataArray wrapping a Quantity based on string unit attribute of DataArray or specified unit
  • da.pint.dequantify(): replace data with the Quantity's magnitude, and add back string unit attribute from Quantity's unit
  • da.pint.sel(): wrap da.sel to handle indexing with Quantities (by casting to magnitude in the coordinate's units similar to how MetPy does it, since true unit-aware indexing is not available yet in xarray)
  • da.pint.loc: wrap da.loc likewise
  • da.pint.to_system(system): convert all variables to be expressed in the (base?) units of a different system. Might require upstream additions to pint.

Dataset:

  • ds.pint.to(...): return Dataset with converted units (clean-up and implement the conversion methods #11)
  • ds.pint.to_base_units(): return dataset with base units
  • ds.pint.quantify(unit_registry=None): convert all data variables to quantities
  • ds.pint.dequantify(): convert all data variables from quantities to magnitudes with units as an attribute
  • ds.pint.sel(): wrap ds.sel to handle indexing with Quantities
  • ds.pint.loc: wrap ds.loc likewise
  • ds.pint.to_system(system): convert all variables to be expressed in the (base?) units of a different system.

(this may be modified as things change on xarray's and pint's end, especially involving Dask arrays (xref #883))

Anything else?

At some point when the integration is more solidified (but before official release) we should change the accessor from pint to units, to get a interface more like what's described here.
This would be:
a) More intuitive
b) Units-library agnostic
c) A good fit for potentially using an entrypoint to choose which units library you want to use. There's already an entrypoint for plotting backends in xarray, and plans to add one for storage backends too.

@jthielen
Copy link
Collaborator

jthielen commented Apr 8, 2020

One additional thing I've realized is needed since writing hgrecco/pint#849 (comment) is a helper for coordinate unit conversion. Not sure what the best name for this would be though. In MetPy, the tentative name is .convert_coordinate_units, but that really only makes sense since the MetPy accessor's version of .to is .convert_units. Maybe .coordinate_to?

See Unidata/MetPy#1325 / https://github.com/Unidata/MetPy/compare/v0.12.0...jthielen:0-12-patch-xarray-0-15-1?expand=1

@keewis
Copy link
Collaborator

keewis commented Apr 8, 2020

we could also provide work-arounds for operations like rolling or ffill that, due the functions and libraries they use (i.e. bottleneck, numbagg, scipy, numpy.lib, etc., but also numpy.vectorize), won't be able to support duck arrays in the near future (though we should probably wait until there is a final decision on that, right now these methods were simply postponed).

@TomNicholas
Copy link
Collaborator Author

a helper for coordinate unit conversion.

So a da.pint.coords_to, that gets called within da.pint.to?

Also what about UnitRegistry().wraps? Do we need a xarray equivalent for that? Would that help with providing workarounds for rolling etc?

@jthielen
Copy link
Collaborator

jthielen commented Apr 9, 2020

a helper for coordinate unit conversion.

So a da.pint.coords_to, that gets called within da.pint.to?

Not quite, it would be separate. I'm thinking of something to change the units on a coordinate of da without changing anything else on da, which needs to follow different logic given that Indexes are immutable, and, for now, not supporting units directly...so something like

def coord_to(self, coord, units):
    new_coord_var = self.da[coord].copy(
        data=Quantity(self.da[coord].values, self.da[coord].attrs.get('units')).m_as(units)
    )
    new_coord_var.attrs['units'] = str(units)
    return self.da.assign_coords(coords={coord: new_coord_var})

@keewis
Copy link
Collaborator

keewis commented Apr 9, 2020

how about also accepting a dict / kwargs dict (like assign / assign_coords) to DataArray.pint.to and then using different branches for data and coordinates? That way, converting the data and multiple coordinates to (different) units would be possible with one function call.

Obviously, we need something really similar for Dataset.pint.to, so we could implement it by using _to_temp_dataset and to_dataarray.

@jthielen
Copy link
Collaborator

jthielen commented Apr 9, 2020

how about also accepting a dict / kwargs dict (like assign / assign_coords) to DataArray.pint.to and then using different branches for data and coordinates? That way, converting the data and multiple coordinates to (different) units would be possible with one function call.

So long as we don't lose the easy ability to just convert data units (e.g., da.pint.to('degC')), that sounds great.

@TomNicholas
Copy link
Collaborator Author

TomNicholas commented Apr 9, 2020

So as a to-do list, that would mean one positional arg for the data and multiple keyword args for the coords in da.pint.to(), then all keyword args for ds.pint.to():

  • da.pint.to(unit) - convert data
  • da.pint.to(new_unit1, coord1=new_unit2, coord2=new_unit3, ...) - convert data and multiple coords simultaneously
  • ds.pint.to(var1=new_unit1, var2=new_unit2, coord1=new_unit3, ...) - convert multiple data vars and coords simultaneously

@keewis
Copy link
Collaborator

keewis commented Apr 9, 2020

Agreed for points 1 (DataArray only?) and 3 (Dataset only). For 2 there might be a few more options (nothing wrong with this one), but I think we should discuss them in a separate issue / when we implement it to keep this issue as a high level API overview.

@keewis
Copy link
Collaborator

keewis commented Apr 15, 2020

something else that is not a part of the API but in scope for this package (I think?): maybe we could monkeypatch the repr of DataArray, Variable and Dataset (or register some sort of repr function for pint arrays (via entrypoints? on import?)) to make it more readable (i.e. make sure the unit is always visible). See also pydata/xarray#2773

@TomNicholas
Copy link
Collaborator Author

I definitely think that's in scope for this package, at least until some more general solution for reprs of all wrapped arrays is implemented upstream.

Monkey-patching actually seems like a reasonable temporary solution here I think? The worst that can happen is that some other method which uses the repr fails to display the units right

@jthielen
Copy link
Collaborator

I definitely agree that some kind of repr fix is in order. The current default is...less than ideal...

@keewis
Copy link
Collaborator

keewis commented Jul 8, 2020

from #11:

  • pint_xarray.testing: for testing purposes
  • conversion / extraction functions (from pint_xarray.conversions)

@keewis
Copy link
Collaborator

keewis commented Jul 17, 2020

if we can get the reprs to work (pydata/xarray#2773, hgrecco/pint#1133), improve the documentation and maybe also expose the pint_xarray.testing module, we should be close enough to release a initial development version (0.1?)

bors bot added a commit to hgrecco/pint that referenced this issue Aug 17, 2020
1133: Change HTML repr away from LaTeX to a Sparse/Dask-like repr r=hgrecco a=jthielen

While I've never been a big fan of Pint's LaTeX repr in JupyterLab/Jupyter Notebook due to working with large arrays that would crash the renderer, I've recently been working a lot with various wrapping combinations of xarray, Pint, Dask, and Sparse, in which Pint's LaTeX repr performs particularly poorly. Even if Pint's repr were elided (#765), it would still poorly nest with the other HTML reprs.

This motivated me to implement an alternative HTML repr for Pint using a simple HTML table similar to what both Dask and Sparse use. As shown in [this demonstration notebook](https://nbviewer.jupyter.org/urls/dl.dropbox.com/s/jlgqm6s92kzvagi/pint_html_repr_demo.ipynb), this allows much better nesting of HTML reprs. A quick example image is below:

![](https://www.meteor.iastate.edu/~jthielen/Screenshot%20from%202020-07-14%2017-07-51.png)

This doesn't quite take care of [xarray's unit repr discussions](pydata/xarray#2773) (since this only shows up when expanding the data in a dataset), but I still think it is a worthwhile improvement for the expanded view. @keewis do you have any thoughts on this particular interaction?

Marking as a WIP/draft for now to see if this is a welcome format for the HTML repr first before I dig too much into optimizing the implementation and writing tests for it.

- [x] Closes #654 (even though it was already closed in favor of #765, I think the are related but separate issues)
- [x] Executed ``black -t py36 . && isort -rc . && flake8`` with no errors
- [x] The change is fully covered by automated unit tests
- [x] Documented in docs/ as appropriate
- [x] Added an entry to the CHANGES file

Tagging @nbren12, @hgrecco, @jondoesntgit, @natezb, @ipcoder, @keewis, and @TomNicholas for input based on past issues (#654, #765, and xarray-contrib/pint-xarray#6).


Co-authored-by: Jon Thielen <github@jont.cc>
@keewis
Copy link
Collaborator

keewis commented Aug 22, 2020

the HTML repr works (pint got released today), and there are #20 (docs), #22 (inline repr) and #24 (testing), so once those are merged and the placeholder functions we have right now are temporarily removed, we should be able to release 0.1

@jthielen
Copy link
Collaborator

the HTML repr works (pint got released today), and there are #20 (docs), #22 (inline repr) and #24 (testing), so once those are merged and the placeholder functions we have right now are temporarily removed, we should be able to release 0.1

Sounds great! Right now though it looks like the only placeholders are sel and loc. If you're good with waiting a few days, I should be able to get around to adding those in time for 0.1...I should be able to implement them the same way I did in MetPy.

@keewis
Copy link
Collaborator

keewis commented Aug 23, 2020

I was thinking of plus_minus, to_base_units and to_system, which don't have docstrings, work only on the data or raise NotImplementedError.

Edit: let's continue this in #25

@keewis
Copy link
Collaborator

keewis commented Feb 20, 2021

from #61: try to dequantify automatically before plotting (see xarray.plot.plot.label_from_attrs)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants