
why is using flox slower than not using flox (on a laptop) // one example from flox docs #363

Open · Thomas-Moore-Creative opened this issue May 2, 2024 · 8 comments

Thomas-Moore-Creative commented May 2, 2024

Thanks to @dcherian and others who have been working obsessively to make very common tasks, like calculating climatological aggregations on climate data, faster and easier.

I have some use cases on HPC where we are running climatological aggregations (and climatological aggregations composited over ENSO phases) on very large 4D dask arrays (11 TB), and I have been digging into how best to employ flox.

BUT today our HPC centre is down for regular maintenance, so I tried to run some dummy examples on my laptop (Apple M2 silicon, 32 GB RAM) using a 4-worker LocalCluster and this example from the documentation: "How about other climatologies?"

The one change I made was to replace ones with random, as this seemed a more realistic test. I have no evidence, but wonder if an array of ones would be something "easier" for xarray?

The dummy array ended up being:

```
oisst object is 120.5342208 GB
3.7666944 times bigger than total memory.
```
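
For reference, the construction boils down to something like this (a sketch: the shape and chunking follow the docs example, while the time coordinate here is my approximation and doesn't affect the benchmark):

```python
import dask.array as da
import pandas as pd
import xarray as xr

# The docs example, with ones() swapped for random();
# 14532 daily steps x 720 x 1440 of float64 is ~120.5 GB.
time = pd.date_range("1981-09-01", periods=14532, freq="D")
oisst = xr.DataArray(
    da.random.random((14532, 720, 1440), chunks=(20, -1, -1)),
    dims=("time", "lat", "lon"),
    coords={"time": time},
    name="sst",
)
```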

To my surprise, not using flox was always much faster. I forced flox to try both map-reduce and cohorts.
RESULTS (repeatable, and run after clearing memory and restarting the cluster):

| run                   | CPU times                              | wall time |
|-----------------------|----------------------------------------|-----------|
| with flox, map-reduce | user 7.07 s, sys 1.44 s, total 8.51 s  | 2min 9s   |
| with flox, cohorts    | user 5.82 s, sys 1.16 s, total 6.98 s  | 1min 20s  |
| without flox          | user 3.37 s, sys 1.39 s, total 4.77 s  | 29.5 s    |

code is here via nbviewer
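
For anyone skimming, the three timed configurations boil down to roughly this (a sketch, not the full notebook):

```python
import xarray as xr

# with flox, forcing each graph strategy (method= is forwarded to flox)
with xr.set_options(use_flox=True):
    oisst.groupby("time.month").mean(method="map-reduce").compute()
    oisst.groupby("time.month").mean(method="cohorts").compute()

# without flox: xarray's default groupby path
with xr.set_options(use_flox=False):
    oisst.groupby("time.month").mean().compute()
```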

My goal was to generate an easy-to-run notebook where I could demonstrate the power of flox to my colleagues. Instead, I'm a bit less confident that I understand how this works.

Questions:

  1. Is this expected?
  2. Am I doing something silly, or just misunderstanding something fundamental?
  3. Or is this all down to differences in system architecture between a modern laptop and HPC or cloud?

Thanks!

dcherian (Collaborator) commented May 2, 2024

Nice example.

I believe this is #222 .

I ran the reduction and took a mental note of the timings for the "blocks" tasks while specifying the "engine" kwarg; this approximately reflects the cost of running the first groupby-reduction on every block of data (usually the most computationally intensive piece):

| engine  | approx time |
|---------|-------------|
| numpy   | 1s          |
| flox    | 500ms       |
| numbagg | 300ms       |

So installing "numbagg" or specifying engine="flox" might bring it back to parity.
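
i.e. something like this (sketch):

```python
# per-block kernel choices; the engine kwarg is forwarded by xarray to flox
oisst.groupby("time.month").mean(engine="flox")     # flox's vectorized kernel
oisst.groupby("time.month").mean(engine="numbagg")  # if numbagg is installed
```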

I'll note that flox's real innovation is making more things possible (e.g. this post) that would just straight up fail otherwise.

The default Xarray strategy does work well for a few chunking schemes (indeed this observation inspired "cohorts"), but it's hard to predict unless you've thought deeply about groupby.

EDIT: I love that "cohorts" (the automatic choice) is 2x faster than "map-reduce".

Thomas-Moore-Creative (Author) commented May 2, 2024

@dcherian - thanks for these comments (and for all the helpful tools!)

> I'll note that flox's real innovation is making more things possible (e.g. this post) that would just straight up fail otherwise.

I do really appreciate this important point, even if I currently lack the understanding to write a simple example that shows this for climatological aggregations over one dimension. I did try to push the size of the array further, to reach a point where "not-flox" failed and "flox" completed, but in this simple case I couldn't seem to get there with the array-size changes I was making. Given that my real-world problem is applying climatological aggregations to 11 TB arrays, "making more things possible" is the gold star for a cluster of a given size, and why flox is so welcome.

re: numbagg - my very ignorant understanding was that it only helps with NaN-excluding calculations? ... ahh, but this highlights that the default for xr.mean() is skipna=True. Even though we don't have any NaNs here, I suppose we are actually running nanmean(), not mean()?

I'll try to apply some of your comments here . . .

Thomas-Moore-Creative (Author) commented

> ..... specifying engine="flox" might bring it back to parity.

Something else I'm clearly not understanding: I thought that current xarray will automatically use flox if:

  • it's installed
  • import flox works
  • xr.options shows Option: use_flox, Value: True

In this case, how does adding engine="flox" change things?
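
(For concreteness, the check I'm doing is roughly this, using xr.get_options() to read the option:)

```python
import flox  # noqa: F401  # importable, so xarray can pick it up
import xarray as xr

# use_flox defaults to True whenever flox is installed
print(xr.get_options()["use_flox"])  # True
```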

Thomas-Moore-Creative (Author) commented

OK - yes . . .

  • engine="flox" significantly speeds up .mean()
  • I didn't try numbagg, but skipna=False speeds up all flavours of the calculation (regardless of the lack of NaNs in the array = my ignorance)
  • engine="flox" does bring it closer to parity: 20.9 s (no-flox & skipna=False) vs 36.4 s (flox, cohorts, & skipna=False) ... but as in the comment above, I'm a bit unclear on the syntax in xr.groupby.mean, having thought that flox would be automatic?

Thomas-Moore-Creative added a commit to Thomas-Moore-Creative/Climatology-generator-demo that referenced this issue May 2, 2024
dcherian (Collaborator) commented May 2, 2024

> but as in the comment above, I'm a bit unclear on the syntax in xr.groupby.mean, having thought that flox would be automatic?

Yes, unclear syntax. See https://flox.readthedocs.io/en/latest/engines.html. Basically there are two levels to flox:

  1. vectorized groupby algorithms for numpy arrays
  2. optimized graphs for dask arrays

engine controls the strategy for (1); method controls the strategy for (2).
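
A sketch of setting both knobs explicitly, via flox's xarray interface:

```python
from flox.xarray import xarray_reduce

result = xarray_reduce(
    oisst,
    oisst.time.dt.month,   # group by calendar month
    func="nanmean",
    method="cohorts",      # (2) how the dask graph is built
    engine="flox",         # (1) which kernel runs on each block
)
```

The same knobs pass straight through xarray: oisst.groupby("time.month").mean(method="cohorts", engine="flox").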

By setting engine="flox" you're opting in to flox's internal vectorized algo. This is a super great idea when your groups are sorted. However, I have now realized that my current heuristic for choosing this assumes numpy arrays; we can do better for dask arrays (like the one you're working with).

  • This is something to fix, so thanks for taking the time to write this up :)

Installing numbagg should then get you faster than the default, though there is a (small) cost to compiling.

Setting skipna=False if you don't have NaNs is always a good idea. It avoids some extra memory copies.
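
E.g., for this NaN-free array:

```python
# skipna=False dispatches to mean() instead of nanmean(),
# avoiding the NaN-masking machinery and its memory copies
clim = oisst.groupby("time.month").mean(skipna=False, engine="flox")
```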

dcherian (Collaborator) commented May 2, 2024

> I did try to push the size of the array further, to reach a point where "not-flox" failed and "flox" completed, but in this simple case I couldn't seem to get there with the array-size changes I was making.

Nice. One of the "challenges" is that dask tends to improve with time, so this envelope keeps shifting (and sometimes regresses, hehe).

dcherian (Collaborator) commented May 2, 2024

I'll note that my major goal here is to get decent perf with zero thinking :)

Hence my excitement that we are automatically choosing method="cohorts", so you needn't set that.

Clearly, I need to think more about how to set engine so that this pain goes away.

dcherian (Collaborator) commented May 2, 2024

You might try a daily climatology or an hourly climatology to see how things shape up.
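
E.g. (sketch):

```python
# daily climatology: 366 groups instead of 12
oisst.groupby("time.dayofyear").mean(method="cohorts")

# hourly climatology, for sub-daily data: 24 groups
oisst.groupby("time.hour").mean(method="cohorts")
```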
