CF coordinates get forgotten after operation modifying coordinates #396

Open
angus-g opened this issue Jan 25, 2023 · 4 comments

Comments

@angus-g

angus-g commented Jan 25, 2023

I have a DataArray that's similar to, but not quite the same as, the popds example dataset (the axes aren't labelled, and it has a time dimension):

<xarray.DataArray 'Tair_m' (time: 1, nj: 1080, ni: 1440)>
array([[[nan, nan, ..., nan, nan],
        [nan, nan, ..., nan, nan],
        ...,
        [nan, nan, ..., nan, nan],
        [nan, nan, ..., nan, nan]]], dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 2255-01-01
    TLON     (nj, ni) float32 nan nan nan nan nan nan ... nan nan nan nan nan
    TLAT     (nj, ni) float32 nan nan nan nan nan nan ... nan nan nan nan nan
    ULON     (nj, ni) float32 nan nan nan nan nan nan ... nan nan nan nan nan
    ULAT     (nj, ni) float32 nan nan nan nan nan nan ... nan nan nan nan nan
Dimensions without coordinates: nj, ni
Attributes:
    units:          C
    long_name:      air temperature
    cell_measures:  area: tarea
    cell_methods:   time: mean
    time_rep:       averaged

And of course, the coordinates get correctly decoded:

>>> d.cf.coordinates
{'longitude': ['TLON'], 'latitude': ['TLAT']}

But if I take the time mean of this DataArray, the correct coordinates get forgotten:

>>> d.mean("time").cf.coordinates
{'longitude': ['TLON', 'ULON'], 'latitude': ['TLAT', 'ULAT']}

@aulemahal
Contributor

Hi @angus-g! The correct CF coordinates are defined through attributes of the DataArray. I believe this would work:

d.mean('time', keep_attrs=True).cf.coordinates

Xarray drops the attributes after most operations, as the old attributes might not be applicable anymore. You can also override this behaviour globally with:

import xarray as xr
xr.set_options(keep_attrs=True)

@dcherian
Contributor

dcherian commented Jan 25, 2023

Looking at that notebook, I see you have keep_attrs=True. So the "coordinates" attribute must be in .encoding and was then dropped when applying the mean.

I'm beginning to think the choice to look in encoding by default was wrong. It gets lost in a lot more places than attrs and is invisible to the casual user, and the long-term plan is to get rid of encoding.

Maybe cf-xarray should provide a function that copies "interpretable" attributes from encoding to attrs (coordinates, bounds, cell_measures, grid_mapping, ancillary_variables).
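
To make the suggestion concrete, here is a minimal sketch of what such a copy step might look like. The name `promote_cf_encoding` is hypothetical and not part of cf-xarray. The sketch assumes only that the object exposes dict-like `.attrs` and `.encoding`, as xarray variables do, so a plain stand-in object is used here instead of a real DataArray.

```python
from types import SimpleNamespace

# The "interpretable" CF attributes listed in the comment above.
CF_KEYS = (
    "coordinates",
    "bounds",
    "cell_measures",
    "grid_mapping",
    "ancillary_variables",
)


def promote_cf_encoding(var, keys=CF_KEYS):
    """Hypothetical helper (not a cf-xarray function).

    Copies the listed CF keys from ``var.encoding`` into ``var.attrs`` in
    place. Existing attrs win, and ``var.encoding`` is left untouched.
    """
    for key in keys:
        if key in var.encoding and key not in var.attrs:
            var.attrs[key] = var.encoding[key]
    return var


# Stand-in for a decoded variable: "coordinates" lives only in encoding,
# which is the situation described in this issue.
var = SimpleNamespace(
    attrs={"units": "C", "cell_measures": "area: tarea"},
    encoding={"coordinates": "TLON TLAT", "dtype": "float32"},
)

promote_cf_encoding(var)
print(var.attrs)
```

After a step like this, the "coordinates" entry would sit in attrs, so it would be subject to keep_attrs rather than being dropped along with encoding. Whether existing attrs should take precedence over encoding is a design choice made here only for illustration.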

@kthyng
Contributor

kthyng commented Jan 25, 2023

@dcherian I have struggled with the invisibility of encoding many times! I didn't know it existed until after I started using cf-xarray, which was a decade after I started running ocean models. I don't have a good code-based opinion, but I have struggled with it quite a bit. I'd like cf-xarray to be really easy to use, but also obvious about what is being interpreted and why, to help figure out what is going on when working on some code.

@aidanheerdegen
Contributor

> Maybe cf-xarray should provide a function that copies "interpretable" attributes from encoding to attrs (coordinates, bounds, cell_measures, grid_mapping, ancillary_variables)

Might another option be to adopt something like pint.quantify and provide a cf.decode method that grabs all the important data and stores it somewhere cf_xarray-specific that can be guaranteed to be propagated across operations?
