Improve performance with numpy_groupies #222

Open
dcherian opened this issue Feb 21, 2023 · 3 comments

dcherian commented Feb 21, 2023

IMO our main bottleneck now is how numpy_groupies converts nD problems to a 1D problem before using bincount, ufunc.at, etc. (ml31415/numpy-groupies#46), e.g. grouping an nD array by the 1D array time.month and reducing along the 1D time dimension.
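The nD-to-1D conversion described above can be sketched roughly as follows. This is a hypothetical illustration (the function name and details are mine, not numpy_groupies' actual implementation): each row's group labels are offset so a single flat np.bincount call reduces the whole array along its last axis.

```python
import numpy as np

def grouped_sum_last_axis(arr, group_idx, ngroups):
    """Hypothetical sketch of the nD -> 1D trick: offset each row's
    group labels so one flat np.bincount call reduces the whole array
    along its last axis."""
    arr = np.asarray(arr)
    nrows = arr.size // arr.shape[-1]
    flat = arr.reshape(nrows, arr.shape[-1])
    # Row r gets labels r * ngroups + group_idx, so groups from
    # different rows never collide in the flat bincount.
    flat_idx = (np.arange(nrows)[:, None] * ngroups + group_idx).ravel()
    sums = np.bincount(flat_idx, weights=flat.ravel(),
                       minlength=nrows * ngroups)
    return sums.reshape(arr.shape[:-1] + (ngroups,))

# e.g. a (2, 6) array grouped by a length-6 month-like label array
res = grouped_sum_last_axis(np.arange(12.).reshape(2, 6),
                            np.array([0, 0, 1, 1, 2, 2]), 3)
```

The reshape/offset step is exactly the raveling overhead discussed in this thread.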

I tried to fix this, but the change had to be reverted because it doesn't generalize to axis != -1.

  1. We could use it in numpy-groupies only when axis == -1 and fall back to the standard path otherwise. I think this would be good. (See "Use faster group_idx creation when axis == -1", ml31415/numpy-groupies#77.)
  2. flox still has the problem that for reductions like mean we compute two reductions for dask arrays, sum and count, so we incur the conversion cost twice. To avoid this, numpy-groupies would have to support multiple reductions in one call (which they don't want to do), or we'd make the transformation to a 1D problem ourselves. That is annoying but doable.
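The two-reductions cost in (2) can be sketched like this: a grouped mean decomposes into two independent passes, a sum and a count, each paying its own trip through the grouping machinery.

```python
import numpy as np

# Sketch of why "mean" costs two reductions: it decomposes into a sum
# pass and a count pass, each a separate flat bincount.
vals_m = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
gidx_m = np.array([0, 0, 1, 1, 1])
ngroups = 2

sums = np.bincount(gidx_m, weights=vals_m, minlength=ngroups)   # pass 1
counts = np.bincount(gidx_m, minlength=ngroups)                 # pass 2
means = sums / counts
```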

PS: We could avoid all this entirely by building out numbagg's groupby, which IIRC is stuck on implementing a proper fill_value that is not the identity element of the reduction.

cc @Illviljan @TomNicholas


dcherian commented Mar 27, 2023

Note that (2) is worse than it looks: with xarray we always accumulate count, because min_count=1 by default. Potentially this could be optimized (I don't remember whether I did).
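A hypothetical sketch of the min_count semantics referred to here (function name and details are mine): groups with fewer than min_count valid values yield NaN, which is why a count reduction always rides along with the sum.

```python
import numpy as np

def grouped_nansum(vals, gidx, ngroups, min_count=1):
    """Hypothetical sketch of min_count semantics: groups with fewer
    than min_count non-NaN values yield NaN, so the count reduction
    must always be computed alongside the sum."""
    mask = ~np.isnan(vals)
    sums = np.bincount(gidx[mask], weights=vals[mask], minlength=ngroups)
    counts = np.bincount(gidx[mask], minlength=ngroups)
    return np.where(counts >= min_count, sums, np.nan)

out = grouped_nansum(np.array([1.0, np.nan, 2.0, 3.0]),
                     np.array([0, 1, 2, 2]), 3)
```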


ml31415 commented Mar 27, 2023

About ml31415/numpy-groupies#3: I'm not categorically against adding multiple aggregations in one go. It's mainly that, so far, I considered the setup overhead of aggregate small enough not to be worth complicating the API. I'd argue this is still true for the 1D case, as it does no more than the most necessary type and size checks. I didn't do any benchmarks, but if the raveling/unraveling turns out to be a bottleneck, sure, we should try to find a better solution.

Since you mentioned bincount: there is still a 2x-4x speedup to be gained by using the numba version compared to the bincount-based numpy-only version (1D case).
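For illustration, this is the kind of explicit-loop kernel that numba's @njit would compile, written here in plain Python so it runs without numba installed; the compiled single fused loop is where the speedup over bincount's extra passes and temporaries comes from (a sketch, not numpy-groupies' actual kernel).

```python
import numpy as np

def loop_group_sum(group_idx, vals, ngroups):
    """Plain-Python stand-in for a numba @njit groupby-sum kernel:
    one pass over the data, accumulating directly into the output."""
    out = np.zeros(ngroups)
    for i in range(len(vals)):
        out[group_idx[i]] += vals[i]
    return out

vals_n = np.array([1.0, 2.0, 3.0])
gidx_n = np.array([0, 1, 0])
res_loop = loop_group_sum(gidx_n, vals_n, 2)
```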


dcherian commented Mar 27, 2023

> if the raveling/unraveling should turn out to be a bottleneck, sure, we should try to find a better solution.

In my benchmarks this was ~25-30% of the time for an nD array with a 1D group_idx, though ml31415/numpy-groupies#77 should reduce that.
