Notes on SIMD programming

Current state of intrinsics code in ViSP:

only x86 SSE is used (no AVX, AVX2, ARM NEON, ...)
SSE headers must be included in the .cpp file, guarded by preprocessor checks that detect at compilation time whether the compiler supports generating the corresponding intrinsics code:

// SSE2 is the baseline: the SSE3 / SSSE3 checks are nested inside it
#if defined __SSE2__ || defined _M_X64 || (defined _M_IX86_FP && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define VISP_HAVE_SSE2 1

#if defined __SSE3__ || (defined _MSC_VER && _MSC_VER >= 1500)
#include <pmmintrin.h>
#define VISP_HAVE_SSE3 1
#endif // SSE3
#if defined __SSSE3__ || (defined _MSC_VER && _MSC_VER >= 1500)
#include <tmmintrin.h>
#define VISP_HAVE_SSSE3 1
#endif // SSSE3
#endif // SSE2

use the CMake options to enable SSE2 / SSE3 / SSSE3; they add the necessary compiler flags (e.g. -msse2)
use vpCPUFeatures::checkSSE2() to check at run time whether the CPU supports the SSE2 instruction set
this is necessary to avoid problems when, for example, ViSP is built with SSSE3 support but runs on a computer that does not support SSSE3
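The combination of a compile-time guard and a run-time check described above can be sketched as follows. This is only an illustration, not ViSP code: `cpuHasSSE2()` is a hypothetical stand-in for `vpCPUFeatures::checkSSE2()`, implemented here with the GCC/Clang builtin `__builtin_cpu_supports`, and `EXAMPLE_HAVE_SSE2`, `sumScalar`, `sumSSE2` and `sumDispatch` are names invented for the example.

```cpp
#include <cstddef>

// Compile-time guard, same pattern as the VISP_HAVE_SSE2 check above
// (EXAMPLE_HAVE_SSE2 is a hypothetical macro used only in this sketch).
#if defined __SSE2__ || defined _M_X64 || (defined _M_IX86_FP && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define EXAMPLE_HAVE_SSE2 1
#endif

// Hypothetical stand-in for vpCPUFeatures::checkSSE2().
// Assumption: GCC or Clang on x86; other toolchains report "no SSE2" here.
static bool cpuHasSSE2()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("sse2") != 0;
#else
  return false;
#endif
}

// Portable fallback, always available.
static double sumScalar(const double *data, std::size_t n)
{
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    total += data[i];
  }
  return total;
}

#ifdef EXAMPLE_HAVE_SSE2
// SSE2 path: accumulates two doubles per iteration.
static double sumSSE2(const double *data, std::size_t n)
{
  __m128d acc = _mm_setzero_pd();
  std::size_t i = 0;
  for (; i + 2 <= n; i += 2) {
    acc = _mm_add_pd(acc, _mm_loadu_pd(data + i)); // unaligned load
  }
  double lanes[2];
  _mm_storeu_pd(lanes, acc);
  double total = lanes[0] + lanes[1];
  for (; i < n; ++i) { // remaining element when n is odd
    total += data[i];
  }
  return total;
}
#endif

// The SSE2 path is taken only if it was compiled in AND the CPU supports it.
double sumDispatch(const double *data, std::size_t n)
{
#ifdef EXAMPLE_HAVE_SSE2
  if (cpuHasSSE2()) {
    return sumSSE2(data, n);
  }
#endif
  return sumScalar(data, n);
}
```

Callers only use `sumDispatch()`. The same binary then runs on a CPU without SSE2 by falling back to the scalar loop instead of executing an unsupported instruction.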

AVX2 has been available since the Haswell architecture (2013). The correct way to support AVX2, AVX512, ... would be:

SSE and AVX2 code must be placed in separate compilation units
source files that contain SSE code will be compiled with only SSE flags (e.g. -msse2) and source files that contain AVX2 code with the AVX2 flag (e.g. -mavx2, or /arch:AVX2 for MSVC), see CPU dispatcher topics
when packaging ViSP for Linux distributions, the best approach is to have:
- one option to enable baseline intrinsics (e.g. SSE2 or SSE3); regular source files and files that contain SSE code will have the SSE flags added
- one option to add dispatched intrinsics (e.g. AVX2, AVX512, ...); source files that contain AVX2 code will have the -mavx2 flag added
- this way, we assume that we target at minimum SSE2 or SSE3 CPUs; source files with no intrinsics code will also be compiled with the -msse2 or -msse3 flag (so the compiler may be able to generate SSE code even if no SSE intrinsics code is written, e.g. with the -O3 or -march=native compiler flags)
- users with recent CPUs will be able to benefit from code written with AVX2 intrinsics
be aware of the SSE-AVX transition penalty when mixing SSE and AVX code
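A single-file approximation of the baseline-plus-dispatched layout is sketched below. It does not use separate compilation units with per-file flags as recommended above. Instead it swaps in the GCC/Clang `target("avx2")` function attribute, so that one function is compiled for AVX2 while the rest of the file keeps the baseline flags. All names (`EXAMPLE_X86_DISPATCH`, `addScalar`, `addAVX2`, `addDispatch`) are hypothetical and the sketch assumes GCC or Clang on x86.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GCC or Clang on x86. Elsewhere only the scalar path is built.
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define EXAMPLE_X86_DISPATCH 1
#endif

// Baseline path, compiled with the default (baseline) flags.
static void addScalar(const std::int32_t *a, const std::int32_t *b,
                      std::int32_t *out, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}

#ifdef EXAMPLE_X86_DISPATCH
// Dispatched path: only this function is compiled with AVX2 enabled,
// which plays the role of a source file built with -mavx2.
__attribute__((target("avx2")))
static void addAVX2(const std::int32_t *a, const std::int32_t *b,
                    std::int32_t *out, std::size_t n)
{
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) { // eight 32-bit integers per 256-bit register
    const __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(a + i));
    const __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(b + i));
    _mm256_storeu_si256(reinterpret_cast<__m256i *>(out + i), _mm256_add_epi32(va, vb));
  }
  for (; i < n; ++i) { // tail elements
    out[i] = a[i] + b[i];
  }
}
#endif

// Run-time dispatcher: AVX2 code is reached only on CPUs that report AVX2.
void addDispatch(const std::int32_t *a, const std::int32_t *b,
                 std::int32_t *out, std::size_t n)
{
#ifdef EXAMPLE_X86_DISPATCH
  if (__builtin_cpu_supports("avx2")) {
    addAVX2(a, b, out, n);
    return;
  }
#endif
  addScalar(a, b, out, n);
}
```

With separate compilation units, `addAVX2` would instead live in its own source file built with -mavx2 (or /arch:AVX2 for MSVC), and the dispatcher would stay in a file built with baseline flags only.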

Some additional references:
