Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep attributes for "bounds" variables #8924

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

st-bender
Copy link
Contributor

Issue #2921 is about mismatching time units between a time variable and its "bounds" companion.
However, #2965 does more than fixing #2921, it removes all double attributes from "bounds" variables which has the undesired side effect that there is currently no way to save them to netcdf with xarray. Since the mentioned link is a recommendation and not a hard requirement for CF compliance, these attributes should be left to the caller to prepare the dataset variables appropriately if required. Reduces the amount of surprise that attributes are not written to disk and fixes #8368.

Issue pydata#2921 is about mismatching time units between a time variable and
its "bounds" companion.
However, pydata#2965 does more than fixing pydata#2921, it removes all double
attributes from "bounds" variables which has the undesired side effect
that there is currently no way to save them to netcdf with xarray.
Since the mentioned link is a recommendation and not a hard requirement
for CF compliance, these attributes should be left to the caller to
prepare the dataset variables appropriately if required.
Reduces the amount of surprise that attributes are not written to disk
and fixes pydata#8368.
@@ -794,24 +794,4 @@ def cf_encoder(variables: T_Variables, attributes: T_Attrs):

new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}

# Remove attrs from bounds variables (issue #2921)
Copy link
Contributor

@dcherian dcherian Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a bit more complex.

We treat the variable, and the associated bounds variable, independently in Line 795. So the approach was to set the units on both if not present, do the encoding, then delete the units etc. on bounds variables.

I think we should still delete these attributes if the user didn't set them. That means we'll have to detect these variable names in _update_bounds_encoding and then modify this loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a bit more complex.

We treat the variable, and the associated bounds variable, independently in Line 795. So the approach was to set the units on both if not present, do the encoding, then delete the units etc. on bounds variables.

Why do you need to set them if they are not set by the user?

I think we should still delete these attributes if the user didn't set them. That means we'll have to detect these variable names in _update_bounds_encoding and then modify this loop.

They shouldn't be there if the user did not set them.
Maybe I am a bit confused here, could you elaborate?

Copy link
Contributor

@dcherian dcherian Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to set them if they are not set by the user?

To ensure that the two variables are encoded similarly as the standard suggests.

Our encode/decode logic treats all variables independently, so we need to do this to treat the time and time_bounds variables identically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to set them if they are not set by the user?

To ensure that the two variables are encoded similarly as the standard suggests.

Our encode/decode logic treats all variables independently, so we need to do this to treat the time and time_bounds variables identically.

I am only talking about writing the dataset to disk, so the encoding side of it before writing. I might see the point for time variables, but this could all be handled internally in .encoding (which is used for time variables anyway).

I still don't quite understand the logic for all the others. The standard suggets that if the attributes exist, they should be identical. If they don't exist, well then just don't bother and leave it to the reader to infer the encoding from the parent variable. I don't see the necessitiy to pollute .attrs for that.

st-bender and others added 3 commits May 21, 2024 11:11
After encoding these and removing the removal of attributes from time
bounds variables, these should be in the attributes. Removes the checks
that they aren't.
Makes the `time_bounds` calendar really different from `time` and checks
that they are correctly encoded to be non-CF compliant.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_netcdf: Unexpected drop of "units" attribute of attached "bounds"
2 participants