vectorize search #4654

Open
wants to merge 13 commits into
base: main

AlexGuteniev
Contributor

@AlexGuteniev AlexGuteniev commented May 5, 2024

Resolves #2453

Before:

---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
c_strstr                                      184 ns          119 ns      8960000
ranges_search<std::uint8_t>                  6276 ns         3899 ns       344615
ranges_search<std::uint16_t>                 6358 ns         3625 ns       331852
ranges_search<std::uint32_t>                 5652 ns         3692 ns       431622
ranges_search<std::uint64_t>                 7510 ns         4269 ns       373333
search_default_searcher<std::uint8_t>        1720 ns          921 ns      1120000
search_default_searcher<std::uint16_t>       2497 ns         1438 ns      1000000
search_default_searcher<std::uint32_t>       2224 ns         1001 ns      1280000
search_default_searcher<std::uint64_t>       2762 ns         1297 ns      1000000

After:

---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
c_strstr                                      183 ns          102 ns     10000000
ranges_search<std::uint8_t>                   678 ns          355 ns      4266667
ranges_search<std::uint16_t>                 1285 ns          698 ns      1723077
ranges_search<std::uint32_t>                 2317 ns         1111 ns      1280000
ranges_search<std::uint64_t>                 4736 ns         2537 ns       597333
search_default_searcher<std::uint8_t>         669 ns          353 ns      4072727
search_default_searcher<std::uint16_t>       1293 ns          582 ns      1906383
search_default_searcher<std::uint32_t>       2261 ns         1400 ns       814545
search_default_searcher<std::uint64_t>       4756 ns         2511 ns       497778

strstr is included in the benchmark as a reference; it is not affected by the optimization.

It may be impossible to match strstr performance, because strstr uses pcmpistri (and reads beyond the last element, since pcmpistri is not very useful otherwise). We could try pcmpestri for the 8-bit and 16-bit cases, but it may still be less efficient than strstr. I'd prefer to try this additional optimization in a follow-up PR, though.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner May 5, 2024 10:59
@AlexGuteniev
Contributor Author

The difference in the "before" results between ranges::search and search(..., default_searcher) is due to their different implementations. search(..., default_searcher) doesn't try to use memcmp on every iteration; it compares elements directly. Apparently, this is faster.
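To make the two strategies concrete, here is a minimal sketch of a naive search that calls memcmp at every candidate position next to one that compares elements directly. The names search_memcmp and search_direct are hypothetical and exist only for this illustration; this is not the STL's actual implementation of either ranges::search or default_searcher, and it assumes element types for which bytewise equality matches value equality (such as the unsigned integers used in the benchmark).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: tries memcmp at every candidate position.
// Only valid for types where equal values have equal bytes.
template <class T>
const T* search_memcmp(const T* first, const T* last, const T* nfirst, const T* nlast) {
    assert(first <= last && nfirst <= nlast); // precondition of this sketch
    const std::size_t hay = static_cast<std::size_t>(last - first);
    const std::size_t ndl = static_cast<std::size_t>(nlast - nfirst);
    if (ndl == 0) return first; // an empty needle matches at the start
    if (hay < ndl) return last; // the needle cannot fit
    for (const T* p = first; p <= last - ndl; ++p) {
        if (std::memcmp(p, nfirst, ndl * sizeof(T)) == 0) return p;
    }
    return last;
}

// Hypothetical helper: compares element by element and stops at the
// first mismatch, with no library call per candidate position.
template <class T>
const T* search_direct(const T* first, const T* last, const T* nfirst, const T* nlast) {
    assert(first <= last && nfirst <= nlast); // precondition of this sketch
    const std::size_t hay = static_cast<std::size_t>(last - first);
    const std::size_t ndl = static_cast<std::size_t>(nlast - nfirst);
    if (ndl == 0) return first;
    if (hay < ndl) return last;
    for (const T* p = first; p <= last - ndl; ++p) {
        std::size_t i = 0;
        while (i < ndl && p[i] == nfirst[i]) ++i;
        if (i == ndl) return p;
    }
    return last;
}
```

Both functions return a pointer to the first match, or last when there is none. Which one is faster depends on the data and on the compiler; the sketch only shows the structural difference, not the measured one.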

It is curious that, for the 64-bit type, search(..., default_searcher) was faster before vectorization (2762 ns) than after it (4756 ns).

I would like someone else to confirm these results, and the maintainers to decide what to do about this.

@AlexGuteniev
Contributor Author

AlexGuteniev commented May 6, 2024

One possible way to handle this is to remove the 32-bit and 64-bit optimization/vectorization attempts entirely, so that ranges::search (along with the usual std::search) would behave the same as search(..., default_searcher).
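A rough sketch of what such a size-based dispatch could look like, assuming C++17. The names use_vectorized_search, vectorized_search_stub and dispatch_search are hypothetical; the stub merely forwards to std::search so that the example runs, and stands in for whatever vectorized routine would actually be used. This is not the code in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical trait: take the vectorized path only for 8-bit and 16-bit elements.
template <class T>
constexpr bool use_vectorized_search = sizeof(T) <= 2;

// Hypothetical stand-in for a vectorized routine. It only forwards to
// std::search here, so the sketch stays runnable without intrinsics.
template <class T>
const T* vectorized_search_stub(const T* first, const T* last, const T* nfirst, const T* nlast) {
    return std::search(first, last, nfirst, nlast);
}

// Picks the path at compile time based on the element size.
template <class T>
const T* dispatch_search(const T* first, const T* last, const T* nfirst, const T* nlast) {
    assert(first <= last && nfirst <= nlast); // precondition of this sketch
    if constexpr (use_vectorized_search<T>) {
        return vectorized_search_stub(first, last, nfirst, nlast);
    } else {
        // Wider elements use the plain element-by-element searcher.
        return std::search(first, last, std::default_searcher<const T*>(nfirst, nlast));
    }
}
```

With this shape, the 32-bit and 64-bit cases go through the same code as search(..., default_searcher), and only the narrow element types reach the vectorized path.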

@AlexGuteniev
Contributor Author

And if we are to keep vectorization only for 8-bit and 16-bit elements, we may drop the current implementation and not review/commit it in the first place, if it looks like SSE4.2 pcmpestri would be faster.

@StephanTLavavej StephanTLavavej changed the title vectorize search vectorize search May 7, 2024
@StephanTLavavej StephanTLavavej added the performance Must go faster label May 7, 2024
@StephanTLavavej StephanTLavavej self-assigned this May 7, 2024
Labels
performance Must go faster
Projects
Status: Initial Review
Development

Successfully merging this pull request may close these issues.

<algorithm>: vectorize search
2 participants