Improve performance with numpy_groupies #222

Open
dcherian opened this issue Feb 21, 2023 · 3 comments

dcherian commented Feb 21, 2023

IMO our main bottleneck now is how numpy_groupies converts nD problems to a 1D problem before using bincount, ufunc.at, etc. (ml31415/numpy-groupies#46), e.g. grouping an nD array by the 1D array time.month and reducing along the 1D time dimension.
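The nD-to-1D conversion described above can be sketched roughly as follows. This is a hypothetical illustration (the function name and details are mine, not numpy_groupies' actual implementation): each row's group labels are offset so a single flat np.bincount call reduces the whole array along its last axis.

```python
import numpy as np

def grouped_sum_last_axis(arr, group_idx, ngroups):
    """Hypothetical sketch of the nD -> 1D trick: offset each row's
    group labels so one flat np.bincount call reduces the whole array
    along its last axis."""
    arr = np.asarray(arr)
    nrows = arr.size // arr.shape[-1]
    flat = arr.reshape(nrows, arr.shape[-1])
    # Row r gets labels r * ngroups + group_idx, so groups from
    # different rows never collide in the flat bincount.
    flat_idx = (np.arange(nrows)[:, None] * ngroups + group_idx).ravel()
    sums = np.bincount(flat_idx, weights=flat.ravel(),
                       minlength=nrows * ngroups)
    return sums.reshape(arr.shape[:-1] + (ngroups,))

# e.g. a (2, 6) array grouped by a length-6 month-like label array
res = grouped_sum_last_axis(np.arange(12.).reshape(2, 6),
                            np.array([0, 0, 1, 1, 2, 2]), 3)
```

The reshape/offset step is exactly the raveling overhead discussed in this thread.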

I tried to fix this, but the change had to be reverted because it doesn't generalize to axis != -1.

  1. We could use it in numpy-groupies only when axis == -1 and fall back to the standard path otherwise. I think this would be good. (See "Use faster group_idx creation when axis == -1", ml31415/numpy-groupies#77.)
  2. flox still has the problem that for reductions like mean we compute two reductions for dask arrays, sum and count, so we incur the conversion cost twice. To avoid this, numpy-groupies would have to support multiple reductions in one call (which they don't want to do), or we'd make the transformation to a 1D problem ourselves. That is annoying but doable.
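The two-reductions cost in (2) can be sketched like this: a grouped mean decomposes into two independent passes, a sum and a count, each paying its own trip through the grouping machinery.

```python
import numpy as np

# Sketch of why "mean" costs two reductions: it decomposes into a sum
# pass and a count pass, each a separate flat bincount.
vals_m = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
gidx_m = np.array([0, 0, 1, 1, 1])
ngroups = 2

sums = np.bincount(gidx_m, weights=vals_m, minlength=ngroups)   # pass 1
counts = np.bincount(gidx_m, minlength=ngroups)                 # pass 2
means = sums / counts
```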

PS: We could avoid all this entirely by building out numbagg's groupby, which IIRC is stuck on implementing a proper fill_value that is not the identity element of the reduction.

cc @Illviljan @TomNicholas


dcherian commented Mar 27, 2023

Note that (2) is worse than it looks: with xarray we always accumulate count, because min_count=1 by default. Potentially this could be optimized (I don't remember whether I did).
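A hypothetical sketch of the min_count semantics referred to here (function name and details are mine): groups with fewer than min_count valid values yield NaN, which is why a count reduction always rides along with the sum.

```python
import numpy as np

def grouped_nansum(vals, gidx, ngroups, min_count=1):
    """Hypothetical sketch of min_count semantics: groups with fewer
    than min_count non-NaN values yield NaN, so the count reduction
    must always be computed alongside the sum."""
    mask = ~np.isnan(vals)
    sums = np.bincount(gidx[mask], weights=vals[mask], minlength=ngroups)
    counts = np.bincount(gidx[mask], minlength=ngroups)
    return np.where(counts >= min_count, sums, np.nan)

out = grouped_nansum(np.array([1.0, np.nan, 2.0, 3.0]),
                     np.array([0, 1, 2, 2]), 3)
```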


ml31415 commented Mar 27, 2023

About ml31415/numpy-groupies#3: I'm not categorically against adding multiple aggregations in one go. It's mainly that, so far, I considered the setup overhead of aggregate small enough not to be worth complicating the API. I'd argue this is still true for the 1D case, as it does no more than the most necessary type and size checks. I didn't do any benchmarks, but if the raveling/unraveling turns out to be a bottleneck, sure, we should try to find a better solution.

Since you mentioned bincount: there is still a 2x-4x speedup to be gained by using the numba version compared to the bincount-based numpy-only version (1D case).
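For illustration, this is the kind of explicit-loop kernel that numba's @njit would compile, written here in plain Python so it runs without numba installed; the compiled single fused loop is where the speedup over bincount's extra passes and temporaries comes from (a sketch, not numpy-groupies' actual kernel).

```python
import numpy as np

def loop_group_sum(group_idx, vals, ngroups):
    """Plain-Python stand-in for a numba @njit groupby-sum kernel:
    one pass over the data, accumulating directly into the output."""
    out = np.zeros(ngroups)
    for i in range(len(vals)):
        out[group_idx[i]] += vals[i]
    return out

vals_n = np.array([1.0, 2.0, 3.0])
gidx_n = np.array([0, 1, 0])
res_loop = loop_group_sum(gidx_n, vals_n, 2)
```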


dcherian commented Mar 27, 2023

> if the raveling/unraveling should turn out to be a bottleneck, sure, we should try to find a better solution.

In my benchmarks this was ~25-30% of the time for an nD array with a 1D group_idx, though ml31415/numpy-groupies#77 should reduce that.
