Improve how we store metadata #2702
Comments
As an additional option for some of these fields, I wonder if we could store them as coordinate variables? I'm not a huge fan of the "metadata" DataTree, but I also don't really have a good alternative, except maybe coordinate variables. Just so we have the whole picture: besides units, can you think of other attributes that these metadata "things" need to keep track of? In some sense this feels like a failing of the NetCDF model: where we might want units or other information for attributes, we can't have it. I feel like most NetCDF creators have leaned towards external documentation that describes the units of attributes. Not that I necessarily want to do that, but could it simplify things if we "forced" all distances to be in meters, all times to be epoch nanoseconds, etc.?
Indeed, coordinate variables would work, nice idea!
Looking at the Metadata section in the documentation, there are also attributes for area definition and raw metadata. The area definition can be stored as a coordinate variable, too. And raw metadata is probably too special...
This would certainly help with units, but not with dimensions or encoding.
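A minimal sketch of the coordinate-variable idea, assuming plain xarray. The field name `satellite_altitude` and its value are hypothetical and only illustrate that a scalar coordinate can carry its own `units` attribute and stays attached when the data is subset:

```python
import numpy as np
import xarray as xr

# Hypothetical dataset; names and values are illustrative only.
data = xr.DataArray(
    np.zeros((2, 3)),
    dims=("y", "x"),
    name="brightness_temperature",
    attrs={"units": "K"},
    coords={
        # Scalar (dimensionless) coordinate variable with its own attributes.
        "satellite_altitude": ((), 35786023.0, {"units": "m"}),
    },
)

# The coordinate and its attributes are kept when selecting a subset.
subset = data.isel(y=0)
altitude_units = subset.coords["satellite_altitude"].attrs["units"]
print(altitude_units)
```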
I think it's a good idea to improve the metadata storage indeed. Coordinate variables are good for some things. For example, in rioxarray, projection information is stored in a [...]. But then comes the question of what really belongs in a coordinate variable. I see coordinate variables as variables that are related in one way or another to dimensions/coordinates, hence e.g. calibration coefficients would not really fit there. In the first example you provide, the gain and offset are aligned to the time dimension. Is this intentional, or should they be dimensionless?
Good point.
I wanted to indicate that coefficients can be time-dependent, but it isn't strictly needed.
Feature Request
Is your feature request related to a problem? Please describe.
We repeatedly face encoding/decoding problems in the CF writer, so I have been thinking a bit about how we store metadata and possible alternatives.
Pros
Currently we store metadata in dataset attributes. This is very convenient, because we can just attach any Python type (string, array, dict etc) to a dataset.
Cons
Over time, attributes have become more complex: from a simple sensor name to orbital parameters, time parameters, calibration coefficients etc. Now we're experiencing some problems with that approach.
Describe the solution you'd like
I think many attributes could be stored as DataArrays or Datasets. Then we could leverage the full potential of the xarray data model combined with CF conventions. We would still be using dataset attributes, but mainly strings (such as `units`, `calendar` etc.).
When we are moving towards data trees (#2605), we could organize them in a "metadata" branch (either global or dataset-specific). For example:
Describe any changes to existing user workflow
Users would have to adapt to the new data structure.
Additional context
For units, we could use `pint` already.