Failing Kokkos::HIP Unit Tests #6968

Open
pvelesko opened this issue Apr 28, 2024 · 5 comments

Using Kokkos tag 4.3.00, the following unit tests fail:

85% tests passed, 7 tests failed out of 47

Total Test time (real) = 137.74 sec

The following tests FAILED:
	  4 - Kokkos_CoreUnitTest_HIP (Subprocess aborted)
	 20 - Kokkos_IncrementalTest_HIP (Failed)
	 25 - Kokkos_ContainersUnitTest_HIP (Failed)
	 26 - Kokkos_ContainersPerformanceTest_HIP (Subprocess aborted)
	 27 - Kokkos_UnitTest_Sort (Subprocess aborted)
	 32 - Kokkos_AlgorithmsUnitTest_StdSet_D (Subprocess aborted)
	 33 - Kokkos_AlgorithmsUnitTest_StdSet_E (Failed)
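
For anyone triaging a summary like the one above, the failing test names can be pulled out of saved ctest output with a small filter. This is only a sketch: `extract_failed` is a hypothetical helper name, and it assumes the summary keeps the `N - Name (Status)` layout shown in this report.

```shell
# Print the names of failed tests from saved ctest output.
# Reads the files given as arguments, or stdin when none are given.
# Assumes the "The following tests FAILED:" summary layout shown above.
extract_failed() {
  awk '
    # Start collecting once the summary header has been seen.
    /The following tests FAILED:/ { in_list = 1; next }
    # Summary rows look like "<number> - <name> (<status>)".
    in_list && /^[ \t]*[0-9]+ - / { print $3 }
  ' "$@"
}

# Example (hypothetical log file name):
#   ctest 2>&1 | tee ctest.log
#   extract_failed ctest.log
```

ctest also has a `--rerun-failed` option that reruns only the tests that failed in the previous run, which pairs well with `--output-on-failure`.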

Please include the following for a minimal reproducer

  1. Compilers (with versions)
╭─pvelesko@cupcake ~/kokkos-build/kokkos/build_hip ‹4.3.00●›
╰─$ hipcc -v                                                                                                        130 ↵
AMD clang version 17.0.0 (https://github.com/RadeonOpenCompute/llvm-project roc-6.1.0 24103 7db7f5e49612030319346f900c08f474b1f9023a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-6.1.0/llvm/bin
Configuration file: /opt/rocm-6.1.0/lib/llvm/bin/clang++.cfg
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/11
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found HIP installation: /opt/rocm-6.1.0/lib/llvm/bin/../../.., version 6.1.40091
  1. Kokkos release or commit used (i.e., the sha1 number)
git checkout 4.3.00
  1. Platform, architecture and backend
0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 [Radeon VII] (rev c1)

Linux cupcake 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr  4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  32
  On-line CPU(s) list:   0-31
Vendor ID:               GenuineIntel
  Model name:            13th Gen Intel(R) Core(TM) i9-13900K
    CPU family:          6
    Model:               183
    Thread(s) per core:  2
    Core(s) per socket:  24
    Socket(s):           1
    Stepping:            1
    CPU max MHz:         3000.0000
    CPU min MHz:         800.0000
    BogoMIPS:            5990.40
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr
                         sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good
                         nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl
                         vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer
                         aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced
                         tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
                         rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect
                         avx_vnni dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku
                         ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr
                         ibt flush_l1d arch_capabilities
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   896 KiB (24 instances)
  L1i:                   1.3 MiB (24 instances)
  L2:                    32 MiB (12 instances)
  L3:                    36 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-31
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                 Not affected
  Tsx async abort:       Not affected
  1. CMake configure command
export KOKKOS_DIR=~/kokkos-build/kokkos
export KOKKOS_KERNELS_DIR=~/kokkos-build/kokkos-kernels
export KOKKOS_VER=4.3.00
export HIP_VER=6.1.0
export PREFIX=/space/pvelesko/install/kokkos/${KOKKOS_VER}/hipamd/$HIP_VER
module load HIP/amd/${HIP_VER}

rm -rf ${KOKKOS_DIR}/build_hip && mkdir -p ${KOKKOS_DIR}/build_hip && cd ${KOKKOS_DIR}/build_hip && rm -f CMakeCache.txt
git checkout HEAD -f && git checkout ${KOKKOS_VER}
cmake \
-DKokkos_ENABLE_HIP=ON \
-DCMAKE_CXX_COMPILER=hipcc \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DKokkos_ENABLE_TESTS=ON \
-DCMAKE_INSTALL_PREFIX=${PREFIX} ..
ninja install

  1. Output from CMake configure command
HEAD is now at 486cc745c Merge pull request #6908 from ndellingwood/master-release-4.3.00
-- Setting default Kokkos CXX standard to 17
-- The CXX compiler identification is Clang 17.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/bin/hipcc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Kokkos version: 4.3.0
-- The project name is: Kokkos
-- Using internal gtest for testing
-- Configured git information in /home/pvelesko/kokkos-build/kokkos/build_hip/generated/Kokkos_Version_Info.cpp
-- Compiler Version: 6.1.40091
-- SERIAL backend is being turned on to ensure there is at least one Host space. To change this, you must enable another host execution space and configure with -DKokkos_ENABLE_SERIAL=OFF or change CMakeCache.txt
-- Using -std=gnu++17 for C++17 extensions as feature
-- Setting Kokkos_ARCH_VEGA906=ON
-- Built-in Execution Spaces:
--     Device Parallel: Kokkos::HIP
--     Host Parallel: NoTypeDefined
--       Host Serial: SERIAL
--
-- Architectures:
--  VEGA906
-- Found TPLLIBDL: /usr/include
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
-- Using internal desul_atomics copy
-- Found Python3: /usr/bin/python3.10 (found version "3.10.12") found components: Interpreter
-- Sources TestHIP.cpp
-- Kokkos Backends: SERIAL;HIP
-- Configuring done
-- Generating done
-- Build files have been written to: /home/pvelesko/kokkos-build/kokkos/build_hip
  1. Minimum, complete code needed to reproduce the bug
    run tests
  2. Command line needed to reproduce the bug
    ctest
  3. KokkosCore_config.h header file (generated during the build)
╭─pvelesko@cupcake ~/kokkos-build/kokkos/build_hip ‹4.3.00●›
╰─$ cat ./KokkosCore_config.h

#if !defined(KOKKOS_MACROS_HPP) || defined(KOKKOS_CORE_CONFIG_H)
#error \
    "Do not include KokkosCore_config.h directly; include Kokkos_Macros.hpp instead."
#else
#define KOKKOS_CORE_CONFIG_H
#endif

// KOKKOS_VERSION % 100 is the patch level
// KOKKOS_VERSION / 100 % 100 is the minor version
// KOKKOS_VERSION / 10000 is the major version
#define KOKKOS_VERSION 40300
#define KOKKOS_VERSION_MAJOR 4
#define KOKKOS_VERSION_MINOR 3
#define KOKKOS_VERSION_PATCH 0

/* Execution Spaces */
#define KOKKOS_ENABLE_SERIAL
/* #undef KOKKOS_ENABLE_OPENMP */
/* #undef KOKKOS_ENABLE_OPENACC */
/* #undef KOKKOS_ENABLE_OPENMPTARGET */
/* #undef KOKKOS_ENABLE_THREADS */
/* #undef KOKKOS_ENABLE_CUDA */
#define KOKKOS_ENABLE_HIP
/* #undef KOKKOS_ENABLE_HPX */
/* #undef KOKKOS_ENABLE_SYCL */
/* #undef KOKKOS_IMPL_SYCL_DEVICE_GLOBAL_SUPPORTED */

/* General Settings */
#define KOKKOS_ENABLE_CXX17
/* #undef KOKKOS_ENABLE_CXX20 */
/* #undef KOKKOS_ENABLE_CXX23 */
/* #undef KOKKOS_ENABLE_CXX26 */

/* #undef KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE */
/* #undef KOKKOS_ENABLE_CUDA_UVM */
/* #undef KOKKOS_ENABLE_CUDA_LAMBDA */
/* #undef KOKKOS_ENABLE_CUDA_CONSTEXPR */
#define KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC
/* #undef KOKKOS_ENABLE_HIP_RELOCATABLE_DEVICE_CODE */
/* #undef KOKKOS_ENABLE_HIP_MULTIPLE_KERNEL_INSTANTIATIONS */
/* #undef KOKKOS_ENABLE_IMPL_HPX_ASYNC_DISPATCH */
/* #undef KOKKOS_ENABLE_DEBUG */
/* #undef KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK */
/* #undef KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK */
/* #undef KOKKOS_ENABLE_TUNING */
#define KOKKOS_ENABLE_DEPRECATED_CODE_4
#define KOKKOS_ENABLE_DEPRECATION_WARNINGS
/* #undef KOKKOS_ENABLE_LARGE_MEM_TESTS */
#define KOKKOS_ENABLE_COMPLEX_ALIGN
/* #undef KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION */
/* #undef KOKKOS_ENABLE_AGGRESSIVE_VECTORIZATION */
/* #undef KOKKOS_ENABLE_IMPL_MDSPAN */
/* #undef KOKKOS_ENABLE_ATOMICS_BYPASS */

/* TPL Settings */
/* #undef KOKKOS_ENABLE_HWLOC */
#define KOKKOS_ENABLE_LIBDL
/* #undef KOKKOS_ENABLE_LIBQUADMATH */
/* #undef KOKKOS_ENABLE_ONEDPL */
#define KOKKOS_ENABLE_ROCTHRUST

/* #undef KOKKOS_ARCH_ARMV80 */
/* #undef KOKKOS_ARCH_ARMV8_THUNDERX */
/* #undef KOKKOS_ARCH_ARMV81 */
/* #undef KOKKOS_ARCH_ARMV8_THUNDERX2 */
/* #undef KOKKOS_ARCH_A64FX */
/* #undef KOKKOS_ARCH_AVX */
/* #undef KOKKOS_ARCH_AVX2 */
/* #undef KOKKOS_ARCH_AVX512XEON */
/* #undef KOKKOS_ARCH_ARM_NEON */
/* #undef KOKKOS_ARCH_KNC */
/* #undef KOKKOS_ARCH_AVX512MIC */
/* #undef KOKKOS_ARCH_POWER7 */
/* #undef KOKKOS_ARCH_POWER8 */
/* #undef KOKKOS_ARCH_POWER9 */
/* #undef KOKKOS_ARCH_RISCV_SG2042 */
/* #undef KOKKOS_ARCH_INTEL_GEN */
/* #undef KOKKOS_ARCH_INTEL_DG1 */
/* #undef KOKKOS_ARCH_INTEL_GEN9 */
/* #undef KOKKOS_ARCH_INTEL_GEN11 */
/* #undef KOKKOS_ARCH_INTEL_GEN12LP */
/* #undef KOKKOS_ARCH_INTEL_XEHP */
/* #undef KOKKOS_ARCH_INTEL_PVC */
/* #undef KOKKOS_ARCH_INTEL_GPU */
/* #undef KOKKOS_ARCH_KEPLER */
/* #undef KOKKOS_ARCH_KEPLER30 */
/* #undef KOKKOS_ARCH_KEPLER32 */
/* #undef KOKKOS_ARCH_KEPLER35 */
/* #undef KOKKOS_ARCH_KEPLER37 */
/* #undef KOKKOS_ARCH_MAXWELL */
/* #undef KOKKOS_ARCH_MAXWELL50 */
/* #undef KOKKOS_ARCH_MAXWELL52 */
/* #undef KOKKOS_ARCH_MAXWELL53 */
/* #undef KOKKOS_ARCH_PASCAL */
/* #undef KOKKOS_ARCH_PASCAL60 */
/* #undef KOKKOS_ARCH_PASCAL61 */
/* #undef KOKKOS_ARCH_VOLTA */
/* #undef KOKKOS_ARCH_VOLTA70 */
/* #undef KOKKOS_ARCH_VOLTA72 */
/* #undef KOKKOS_ARCH_TURING75 */
/* #undef KOKKOS_ARCH_AMPERE */
/* #undef KOKKOS_ARCH_AMPERE80 */
/* #undef KOKKOS_ARCH_AMPERE86 */
/* #undef KOKKOS_ARCH_ADA89 */
/* #undef KOKKOS_ARCH_HOPPER */
/* #undef KOKKOS_ARCH_HOPPER90 */
/* #undef KOKKOS_ARCH_AMD_ZEN */
/* #undef KOKKOS_ARCH_AMD_ZEN2 */
/* #undef KOKKOS_ARCH_AMD_ZEN3 */
#define KOKKOS_ARCH_AMD_GFX906
/* #undef KOKKOS_ARCH_AMD_GFX908 */
/* #undef KOKKOS_ARCH_AMD_GFX90A */
/* #undef KOKKOS_ARCH_AMD_GFX940 */
/* #undef KOKKOS_ARCH_AMD_GFX942 */
/* #undef KOKKOS_ARCH_AMD_GFX1030 */
/* #undef KOKKOS_ARCH_AMD_GFX1100 */
#define KOKKOS_ARCH_AMD_GPU
#define KOKKOS_ARCH_VEGA // deprecated
#define KOKKOS_ARCH_VEGA906 // deprecated
/* #undef KOKKOS_ARCH_VEGA908 */
/* #undef KOKKOS_ARCH_VEGA90A */
/* #undef KOKKOS_ARCH_NAVI */
/* #undef KOKKOS_ARCH_NAVI1030 */
/* #undef KOKKOS_ARCH_NAVI1100 */

/* #undef KOKKOS_IMPL_32BIT */
  1. Please provide any additional relevant error logs
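
The version encoding described in the `KokkosCore_config.h` comments above can be checked with a little shell arithmetic. This is a minimal sketch; `decode_kokkos_version` is a hypothetical helper name and is not part of Kokkos.

```shell
# Decode a KOKKOS_VERSION integer into major.minor.patch, following the
# comments in KokkosCore_config.h:
#   major = V / 10000, minor = V / 100 % 100, patch = V % 100
decode_kokkos_version() {
  v=$1
  echo "$((v / 10000)).$((v / 100 % 100)).$((v % 100))"
}

decode_kokkos_version 40300   # prints 4.3.0
```

For the value in this report, 40300 decodes to 4.3.0, which matches the `Kokkos version: 4.3.0` line in the CMake configure output.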
@masterleinad
Contributor

How do these unit tests fail for you? What do they print to the console if you run them individually?
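
One way to collect that per-test output is sketched below. It assumes the test names from the Kokkos 4.3.00 summary in the original report and a build directory where `ctest` can be run; `failing_regex` and `rerun_failing` are hypothetical helper names. Only the standard `ctest` options `-R` and `--output-on-failure` are used.

```shell
# Failing test names copied from the ctest summary in the original report.
FAILING_TESTS="Kokkos_CoreUnitTest_HIP
Kokkos_IncrementalTest_HIP
Kokkos_ContainersUnitTest_HIP
Kokkos_ContainersPerformanceTest_HIP
Kokkos_UnitTest_Sort
Kokkos_AlgorithmsUnitTest_StdSet_D
Kokkos_AlgorithmsUnitTest_StdSet_E"

# Build one anchored regex that selects all of them with `ctest -R`.
failing_regex() {
  printf '^(%s)$\n' "$(printf '%s' "$FAILING_TESTS" | tr '\n' '|')"
}

# Rerun each failing test on its own and save its console output to
# <test name>.log in the current directory.
# $1: build directory (assumption: the directory where ctest was first run).
rerun_failing() {
  build_dir=$1
  printf '%s\n' "$FAILING_TESTS" | while IFS= read -r name; do
    # Anchoring the name avoids matching other tests with a similar prefix.
    (cd "$build_dir" && ctest -R "^${name}\$" --output-on-failure) \
      > "${name}.log" 2>&1 < /dev/null
  done
}

# Example, using the build directory from the report:
#   rerun_failing ~/kokkos-build/kokkos/build_hip
#   ctest -R "$(failing_regex)" --output-on-failure
```

Writing one log per test keeps the output of the aborted subprocesses separate from the ordinary assertion failures, which makes the logs easier to attach to the issue.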

@ajpowelsnl added the Question label (For Kokkos internal and external contributors and users) on Apr 29, 2024
@pvelesko
Author

@pvelesko
Author

pvelesko commented May 13, 2024

@masterleinad @Rombur any debug steps I should take?

@Rombur
Member

Rombur commented May 13, 2024

@pvelesko It's not obvious what the problem is, and I no longer have access to these old GPUs. You could try an older version of Kokkos (maybe 4.0) and see if that helps.

@pvelesko
Author

pvelesko commented May 14, 2024

@Rombur

Using HIPAMD 5.6.0 and Kokkos 4.0.00, the tests still fail (results below). Note that I have a dependency on Kokkos 4.3.00.

79% tests passed, 8 tests failed out of 38

Total Test time (real) = 188.40 sec

The following tests FAILED:
	  4 - KokkosCore_UnitTest_HIP (Subprocess aborted)
	 20 - KokkosCore_IncrementalTest_HIP (Failed)
	 27 - KokkosCore_PerformanceTest_Mempool (Subprocess aborted)
	 30 - KokkosContainers_UnitTest_HIP (Failed)
	 31 - KokkosContainers_PerformanceTest_HIP (Subprocess aborted)
	 32 - KokkosAlgorithms_UnitTest_RandomAndSort (Subprocess aborted)
	 36 - KokkosAlgorithms_UnitTest_StdSet_D (Subprocess aborted)
	 37 - KokkosAlgorithms_UnitTest_StdSet_E (Failed)
