[Performance] Sub-optimal performance on means of strided axes #2771

Open
spectre-ns opened this issue Jan 28, 2024 · 0 comments

Means taken over a non-contiguous axis in xtensor appear to be slower than they could be. The benchmarks below compare xtensor against a more optimized approach that computes the mean in "groups" along the reduction axis rather than striding through memory, which coalesces memory accesses and improves cache hit rates. Would there be a way to implement this in xtensor to get the roughly factor-of-2 speedup?

See reference implementation here: https://github.com/spectre-ns/xtensor-benchmark/blob/bb2404641cfd632c459d4e91c3881ebd601b2a62/include/reduction.hpp#L14
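
To make the access-pattern difference concrete, here is a minimal sketch of the two strategies. It is not the code from the linked repository and not xtensor's internal reducer. It assumes a row-major tensor of shape (n0, n1, n2) stored in a flat `std::vector` with the mean taken over axis 1; the function names `mean_axis1_strided` and `mean_axis1_rowwise` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; not part of xtensor or the
// linked benchmark. Input: row-major tensor of shape (n0, n1, n2).
// Output: row-major tensor of shape (n0, n2) holding the mean over axis 1.

// Strided version: for each output element, walk the reduction axis.
// Consecutive reads are n2 elements apart, so most of each cache line is wasted.
template <class T>
std::vector<T> mean_axis1_strided(const std::vector<T>& in,
                                  std::size_t n0, std::size_t n1, std::size_t n2)
{
    std::vector<T> out(n0 * n2, T(0));
    for (std::size_t i = 0; i < n0; ++i)
    {
        for (std::size_t k = 0; k < n2; ++k)
        {
            T sum = T(0);
            for (std::size_t j = 0; j < n1; ++j)
            {
                sum += in[(i * n1 + j) * n2 + k];
            }
            out[i * n2 + k] = sum / static_cast<T>(n1);
        }
    }
    return out;
}

// Row-wise version: add one contiguous row of length n2 at a time into a
// contiguous accumulator row. The innermost loop is unit-stride for both
// reads and writes, which is friendlier to the cache and to vectorization.
template <class T>
std::vector<T> mean_axis1_rowwise(const std::vector<T>& in,
                                  std::size_t n0, std::size_t n1, std::size_t n2)
{
    std::vector<T> out(n0 * n2, T(0));
    for (std::size_t i = 0; i < n0; ++i)
    {
        T* acc = out.data() + i * n2;
        for (std::size_t j = 0; j < n1; ++j)
        {
            const T* row = in.data() + (i * n1 + j) * n2;
            for (std::size_t k = 0; k < n2; ++k)
            {
                acc[k] += row[k];
            }
        }
        for (std::size_t k = 0; k < n2; ++k)
        {
            acc[k] /= static_cast<T>(n1);
        }
    }
    return out;
}
```

Both functions perform the same number of additions; only the order of memory accesses differs. Because floating-point addition is not associative in general, the two results may differ in the last bits for arbitrary inputs.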

Benchmark                                    Time             CPU   Iterations
xtensor_mean_on_second_axis<float>/8          561 ns          516 ns      1000000
xtensor_mean_on_second_axis<float>/64      128057 ns       129395 ns         6400
xtensor_mean_on_second_axis<double>/8         573 ns          551 ns      1445161
xtensor_mean_on_second_axis<double>/64     132998 ns       124512 ns         6400
native_mean_on_second_axis<float>/8           398 ns          392 ns      1792000
native_mean_on_second_axis<float>/64        41268 ns        41433 ns        16593
native_mean_on_second_axis<double>/8          334 ns          300 ns      2240000
native_mean_on_second_axis<double>/64       75037 ns        72545 ns        11200