Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial collapse of multi-dim string coords: take 2 #5955

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

rcomer
Copy link
Member

@rcomer rcomer commented May 15, 2024

🚀 Pull Request

Description

This is the same bugfix as #4294. In there, I also tried to add type hints following my interpretation of the dev guide. This turned out to be complicated and I suspect is the main reason the PR has not made progress.

I think it would be good to fix the bug regardless of typing questions, so here is a version without the type hints which I'm hoping will be an easier review.

Fixes #3653.

If you have a 2D string coordinate on a cube, and collapse one of the dimensions spanned by that coordinate, the coordinate's points (and bounds, if any) are now serialised along the collapsed dimension. This seems to have been missed when partial collapse was implemented for the numeric coordinates.


Consult Iris pull request check list


Add any of the below labels to trigger actions on this PR:

  • benchmark_this Request that this pull request be benchmarked to check if it introduces performance shifts

Copy link

codecov bot commented May 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.78%. Comparing base (95b7ffe) to head (0079245).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5955   +/-   ##
=======================================
  Coverage   89.78%   89.78%           
=======================================
  Files          93       93           
  Lines       23007    23011    +4     
  Branches     5017     5021    +4     
=======================================
+ Hits        20657    20661    +4     
  Misses       1620     1620           
  Partials      730      730           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

bounds = []
for index in np.ndindex(shape):
index_slice = (slice(None),) + tuple(index)
bounds.append(serialize(self.bounds[index_slice]))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I've understood the old slicing method, it and the new method are consistent for the basic case of 2D bounds (n_points, n_bounds). For higher dimensions, the old method doesn't look right to me: if we had a 2D (n, m) coordinate with bounds (n, m, n_bounds), it looks like we would get m * n_bounds elements in the list. But I think we would have wanted just n_bounds elements.

I didn't even know bounded string coords were a thing!

# Create the new collapsed coordinate.
coord = self.copy(points=np.array(points, dtype=dtype), bounds=bounds)
coord = self.copy(points=np.array(points), bounds=bounds)
Copy link
Member Author

@rcomer rcomer May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why it was necessary to set the dtype, but the existing tests pass fine without this now. The old code is 11 years old, so I guess numpy has moved on and is likely doing more for us automatically.

@rcomer
Copy link
Member Author

rcomer commented May 15, 2024

For reference, here is what we currently use for string coordinates in aggregated_by, which handles 2D coords but not bounded string coords.

if coord.points.dtype.kind in "SU":
if coord.bounds is None:
new_points_list = []
new_bounds = None
# np.apply_along_axis does not work with str.join, so we
# need to loop through the array directly. First move axis
# of interest to trailing dim and flatten the others.
work_arr = np.moveaxis(coord.points, dim, -1)
shape = work_arr.shape
work_shape = (-1, shape[-1])
new_shape: tuple[int, ...] = (len(self),)
if coord.ndim > 1:
new_shape += shape[:-1]
work_arr = work_arr.reshape(work_shape)
for indices in self._groupby_indices:
for arr in work_arr:
new_points_list.append("|".join(arr.take(indices)))
# Reinstate flattened dimensions. Aggregated dim now leads.
new_points = np.array(new_points_list).reshape(new_shape)
# Move aggregated dimension back to position it started in.
new_points = np.moveaxis(new_points, 0, dim)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

cube.collapsed fails with multi-dimensional string coordinates
1 participant