Auto-vectorize count_if, count #4653

Closed
AlexGuteniev opened this issue May 4, 2024 · 0 comments
Labels
performance Must go faster

Comments

@AlexGuteniev
Contributor

AlexGuteniev commented May 4, 2024

This is a rephrasing of #4456 that incorporates all progress made so far.
 
count and count_if can be auto-vectorized as follows:

  • For sizeof(difference_type) == sizeof(T), they are already auto-vectorized.
  • For sizeof(difference_type) < sizeof(T), we can use an approach similar to Help the compiler vectorize std::iota #4627.
  • For sizeof(difference_type) > sizeof(T), we can also use an approach similar to Help the compiler vectorize std::iota #4627, but it will not cover some large array sizes. To cover those, we can additionally split the range into smaller subranges, such that within each subrange T is wide enough to represent the count.

For count_if, this would be the only feasible way to vectorize: predicates cannot be used in a separately compiled implementation, and we don't want complex manual vectorization with intrinsics in headers, for compiler throughput reasons.

For count, this can still be an alternative to manual vectorization. When compiling with /arch:AVX2, auto-vectorization seems to perform not much worse than the existing manual vectorization for large ranges, albeit significantly worse for small ranges with large tails (auto-vectorization doesn't use masking to handle the tail). So we can either:

  • Add auto-vectorization as an alternative to manual vectorization, for cases where the latter is not available
    (ARM64, or opting out via _USE_STD_VECTOR_ALGORITHMS)
  • Use auto-vectorization as the only implementation (losing some performance on tails, but gaining a unified vectorization implementation)
@StephanTLavavej StephanTLavavej added the performance Must go faster label May 8, 2024
@AlexGuteniev AlexGuteniev closed this as not planned Won't fix, can't repro, duplicate, stale May 23, 2024