Notes on SIMD programming

s-trinh edited this page Jan 17, 2019 · 1 revision

Current state of intrinsics code in ViSP:

  • only x86 SSE (no AVX, AVX2, ARM NEON, ...)
  • SSE headers must be included in the .cpp file, guarded by preprocessor checks that detect at compilation time whether the compiler can generate the corresponding intrinsics code:
#if defined __SSE2__ || defined _M_X64 || (defined _M_IX86_FP && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define VISP_HAVE_SSE2 1

#if defined __SSE3__ || (defined _MSC_VER && _MSC_VER >= 1500)
#include <pmmintrin.h>
#define VISP_HAVE_SSE3 1
#endif
#if defined __SSSE3__ || (defined _MSC_VER && _MSC_VER >= 1500)
#include <tmmintrin.h>
#define VISP_HAVE_SSSE3 1
#endif
#endif
  • use the CMake options to enable SSE2 / SSE3 / SSSE3; this adds the necessary compiler flags (e.g. -msse2)
  • use vpCPUFeatures::checkSSE2() to check at run time whether the CPU supports the SSE2 instruction set
  • this check is necessary to avoid issues when, for example, ViSP is built with SSSE3 support but is run on a computer that does not support SSSE3
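The two checks above (compile time and run time) can be combined as in the following minimal sketch. It is not ViSP code: EXAMPLE_HAVE_SSE2, exampleCheckSSE2() and exampleAdd() are hypothetical names, and exampleCheckSSE2() only stands in for vpCPUFeatures::checkSSE2(), here assumed to be implementable with the GCC/Clang builtin __builtin_cpu_supports().

```cpp
#include <cassert>
#include <cstddef>

// Compile-time detection, same condition as the SSE2 test shown above.
#if defined __SSE2__ || defined _M_X64 || (defined _M_IX86_FP && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define EXAMPLE_HAVE_SSE2 1 // hypothetical macro, plays the role of VISP_HAVE_SSE2
#endif

// Hypothetical stand-in for vpCPUFeatures::checkSSE2() (run-time detection).
static bool exampleCheckSSE2()
{
#if defined(EXAMPLE_HAVE_SSE2) && (defined(__GNUC__) || defined(__clang__))
  return __builtin_cpu_supports("sse2") != 0; // GCC/Clang builtin on x86
#elif defined(EXAMPLE_HAVE_SSE2) && defined(_M_X64)
  return true; // SSE2 is part of the x86-64 baseline
#else
  return false; // conservative fallback: use the scalar path
#endif
}

// Element-wise sum of two arrays of doubles; out must hold n values.
void exampleAdd(const double *a, const double *b, double *out, std::size_t n)
{
  assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
  std::size_t i = 0;
#if defined(EXAMPLE_HAVE_SSE2)
  // The SSE2 code is compiled only if the compiler can generate it,
  // and executed only if the CPU reports SSE2 support.
  if (exampleCheckSSE2()) {
    for (; i + 2 <= n; i += 2) {
      const __m128d va = _mm_loadu_pd(a + i);
      const __m128d vb = _mm_loadu_pd(b + i);
      _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
  }
#endif
  // Scalar path: handles the remaining elements, or all of them without SSE2.
  for (; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
```

Calling exampleAdd() on two arrays of five doubles processes four values with SSE2 (when available) and the last one with the scalar loop; the result is the same on a CPU without SSE2.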

AVX2 was introduced with the Haswell architecture (2013). A correct way to support AVX2, AVX512, ... would be:

  • SSE and AVX2 code must be placed in separate compilation units
  • source files that contain SSE code are compiled with SSE flags only (e.g. -msse2) and source files that contain AVX2 code with the AVX2 flag (e.g. -mavx2, or /arch:AVX2 for MSVC), see CPU dispatcher topics
  • when packaging ViSP for Linux distributions, the best approach is to have (see also):
    • one option to enable baseline intrinsics (e.g. SSE2 or SSE3): regular source files and files that contain SSE code will have the SSE flags added
    • one option to add dispatched intrinsics (e.g. AVX2, AVX512, ...): only source files that contain AVX2 code will have the -mavx2 flag added
    • this way, we assume that we target at minimum SSE2 or SSE3 CPUs; source files with no intrinsics code will also be compiled with the -msse2 or -msse3 flags (so the compiler may be able to generate SSE code even if no SSE intrinsics code is written, see for instance this example with the -O3 or -march=native compiler flags)
    • users with a recent CPU will be able to benefit from code written with AVX2 intrinsics
  • be aware of the SSE-AVX transition penalty, which can occur when legacy SSE and AVX instructions are mixed
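The baseline / dispatched split described above can be sketched in a single file as follows. This is an illustration under assumptions, not the ViSP implementation: all names (EXAMPLE_DISPATCH_AVX2, sumBaseline, sumAVX2, exampleSum) are hypothetical, and the GCC/Clang function attribute target("avx2") is swapped in here for the separate compilation unit built with -mavx2, only so that the example stays self-contained. In a real build, sumAVX2() would live in its own source file.

```cpp
#include <cstddef>
#include <cstdint>

// The dispatched path is only built with GCC/Clang on x86 in this sketch.
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define EXAMPLE_DISPATCH_AVX2 1 // hypothetical macro
#endif

// Baseline implementation: would sit in a file compiled with baseline flags only.
static std::int32_t sumBaseline(const std::int32_t *data, std::size_t n)
{
  std::int32_t s = 0;
  for (std::size_t i = 0; i < n; ++i) {
    s += data[i];
  }
  return s;
}

#if defined(EXAMPLE_DISPATCH_AVX2)
// Dispatched implementation: would sit in its own file compiled with -mavx2.
// The target attribute replaces that per-file flag in this single-file sketch.
static __attribute__((target("avx2"))) std::int32_t sumAVX2(const std::int32_t *data, std::size_t n)
{
  __m256i acc = _mm256_setzero_si256();
  std::size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    const __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(data + i));
    acc = _mm256_add_epi32(acc, v); // 8 x 32-bit integer additions
  }
  // Horizontal sum of the 8 lanes, then the scalar tail.
  std::int32_t lanes[8];
  _mm256_storeu_si256(reinterpret_cast<__m256i *>(lanes), acc);
  std::int32_t s = 0;
  for (int k = 0; k < 8; ++k) {
    s += lanes[k];
  }
  for (; i < n; ++i) {
    s += data[i];
  }
  return s;
}
#endif

// Dispatcher: picks the AVX2 version only if the CPU supports it at run time.
// Inputs are assumed small enough that the 32-bit sum does not overflow.
std::int32_t exampleSum(const std::int32_t *data, std::size_t n)
{
#if defined(EXAMPLE_DISPATCH_AVX2)
  if (__builtin_cpu_supports("avx2")) {
    return sumAVX2(data, n);
  }
#endif
  return sumBaseline(data, n);
}
```

With this layout, only sumAVX2() may contain AVX2 instructions, so the rest of the code still runs on a baseline CPU; the caller always goes through exampleSum() and gets the same result on either path.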

Some additional references: