Support gather for different sizes of types on data and indices #751

Open
Yuhta opened this issue May 20, 2022 · 2 comments
Comments

@Yuhta
Contributor

Yuhta commented May 20, 2022

We have recently been using xsimd to make our Velox query evaluation engine portable. One of the gaps we found is that xsimd does not support gather when the data and index types have different sizes. For example, we gather int64 data with int32 indices, which can be implemented on AVX2 using __m256i as the data register and __m128i as the index register. Is there a way to solve this? You can refer to our implementation for some ideas. If you agree with our approach, we can even help integrate the implementation into xsimd.
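
For reference, a minimal sketch of the raw AVX2 operation this corresponds to; the wrapper name gather_int64_int32 is purely illustrative (not from xsimd or Velox), _mm256_i32gather_epi64 is the underlying intrinsic:

#include <immintrin.h>
#include <cstdint>

// Illustrative wrapper: gather 4 x int64 values with 4 x int32 indices on AVX2,
// i.e. a 256-bit data register addressed by a 128-bit index register.
inline __m256i gather_int64_int32(const int64_t* base, __m128i indices)
{
    // scale = sizeof(int64_t): the indices are element offsets, not byte offsets.
    return _mm256_i32gather_epi64(reinterpret_cast<const long long*>(base),
                                  indices, sizeof(int64_t));
}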

In our project we implemented a HalfBatch type that resolves to a batch backed by __m128i on AVX2, and we use it for the index registers. The details can be found here: https://github.com/facebookincubator/velox/blob/main/velox/common/base/SimdUtil.h#L76-L132

Our gather and maskGather implementation: https://github.com/facebookincubator/velox/blob/main/velox/common/base/SimdUtil.h#L134-L268

I am happy to answer any questions you have. And thank you for creating this library; it really helps us rewrite our SIMD code in a portable and readable manner.

@serge-sans-paille
Contributor

I'm not sure about the HalfBatch, but if I were to implement it in xsimd, I would make it a type adaptor, something like

xsimd::half_batch<B>::type

instead of introducing new batch types. It would map batch<float, avx2> to batch<float, sse4.2>.

But the fact that it doesn't have any specialization for SSE disturbs me.
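
A minimal sketch of what such a type adaptor could look like; half_batch here is a hypothetical trait, not existing xsimd API (only batch, avx2, and sse4_2 are existing xsimd names):

#include <xsimd/xsimd.hpp>

// Hypothetical adaptor: map a batch to the batch of the same value type
// on the architecture with half the register width.
template <class B>
struct half_batch;

// e.g. batch<T, avx2> -> batch<T, sse4_2>
template <class T>
struct half_batch<xsimd::batch<T, xsimd::avx2>>
{
    using type = xsimd::batch<T, xsimd::sse4_2>;
};

template <class B>
using half_batch_t = typename half_batch<B>::type;

// usage: half_batch_t<xsimd::batch<float, xsimd::avx2>> is xsimd::batch<float, xsimd::sse4_2>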

@amyspark
Contributor

This is what I did to hand-optimize two cases we use at Krita:

// gather: handmade conversions
// Gather doubles with 32-bit indices, then narrow the result to float.
template <class A, class V, detail::enable_sized_integral_t<V, 4> = 0>
inline batch<float, A> gather(batch<float, A> const&, double const* src,
                              batch<V, A> const& index,
                              requires_arch<avx2>) noexcept
{
    // Split the 8 x int32 index register into two 4 x int32 halves and gather 4 doubles with each.
    const batch<double, A> low(_mm256_i32gather_pd(src, _mm256_castsi256_si128(index.data), sizeof(double)));
    const batch<double, A> high(_mm256_i32gather_pd(src, _mm256_extractf128_si256(index.data, 1), sizeof(double)));
    // Narrow each 4 x double half to 4 x float and merge the two 128-bit halves into one 256-bit batch.
    return detail::merge_sse(_mm256_cvtpd_ps(low.data), _mm256_cvtpd_ps(high.data));
}

// Gather doubles with 32-bit indices, then convert the result to int32.
template <class A, class V, detail::enable_sized_integral_t<V, 4> = 0>
inline batch<int32_t, A> gather(batch<int32_t, A> const&, double const* src,
                                batch<V, A> const& index,
                                requires_arch<avx2>) noexcept
{
    const batch<double, A> low(_mm256_i32gather_pd(src, _mm256_castsi256_si128(index.data), sizeof(double)));
    const batch<double, A> high(_mm256_i32gather_pd(src, _mm256_extractf128_si256(index.data, 1), sizeof(double)));
    return detail::merge_sse(_mm256_cvtpd_epi32(low.data), _mm256_cvtpd_epi32(high.data));
}

Instead of using separate batch types, I would suggest using SFINAE on the size of the index batch.
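
A minimal sketch of such an SFINAE constraint, assuming a hypothetical free function gather_half_index (not existing xsimd API) and restricting to the 64-bit-data / 32-bit-index case on AVX2:

#include <immintrin.h>
#include <xsimd/xsimd.hpp>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: this overload participates in resolution only when the
// index element type is half the width of the data element type (here, 32-bit
// indices addressing 64-bit integer data).
template <class T, class V,
          typename std::enable_if<std::is_integral<T>::value && std::is_integral<V>::value
                                      && sizeof(T) == 8 && sizeof(V) == 4,
                                  int>::type = 0>
xsimd::batch<T, xsimd::avx2> gather_half_index(T const* src,
                                               xsimd::batch<V, xsimd::avx2> const& index)
{
    // Only the low 4 x int32 lanes of the index register are needed to address
    // the 4 x 64-bit lanes of the 256-bit result.
    __m128i idx = _mm256_castsi256_si128(index.data);
    return xsimd::batch<T, xsimd::avx2>(
        _mm256_i32gather_epi64(reinterpret_cast<long long const*>(src), idx, sizeof(int64_t)));
}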
