Nightly Sycl unit test failures with intel/2023.1.0, intel/2024.1.0 on Intel Ponte Vecchio #1961

Open
ndellingwood opened this issue Aug 31, 2023 · 2 comments

Comments

@ndellingwood

ndellingwood commented Aug 31, 2023

Testing with the Sycl backend on Intel Ponte Vecchio on the new Blake showed several failing sub-tests (failure output is listed below each failing executable), depending on which environment variables are set:

Default (ZES_ENABLE_SYSMAN unset)

The following tests FAILED:
   13 - sparse_sycl (Failed)
[  FAILED  ] sycl_test.sparse_coo2crs
[  FAILED  ] sycl_test.sparse_spgemm_jacobi_double_int_size_t_TestExecSpace
[  FAILED  ] sycl_test.sparse_spgemm_double_int_size_t_TestExecSpace
[  FAILED  ] sycl_test.sparse_par_ilut_double_int_size_t_TestExecSpace
[  FAILED  ] sycl_test.sparse_par_ilut_precond_double_int_size_t_TestExecSpace

   14 - blocksparse_sycl (Failed)
[  FAILED  ] sycl_test.sparse_bsr_gauss_seidel_rank1_double_int_size_t_TestExecSpace
[  FAILED  ] sycl_test.sparse_bsr_gauss_seidel_rank2_double_int_size_t_TestExecSpace
[  FAILED  ] sycl_test.sparse_block_spgemm_double_int_size_t_TestExecSpace

   22 - wiki_spgemm (Subprocess aborted)
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: No memory modules for the SYCL backend found. Make sure that ZES_ENABLE_SYSMAN=1 is set at run time!
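
A guard along the following lines could catch the missing variable before the tests are launched. This is only a sketch: `require_sysman` is a hypothetical helper name, not something provided by Kokkos or Kokkos Kernels, and it checks nothing beyond the condition named in the error message above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kokkos Kernels): fail early unless
# ZES_ENABLE_SYSMAN is set to 1, the condition named in the error above.
require_sysman() {
  if [ "${ZES_ENABLE_SYSMAN:-}" != "1" ]; then
    echo "ZES_ENABLE_SYSMAN=1 is not set; the SYCL backend may find no memory modules" >&2
    return 1
  fi
  return 0
}
```

It would typically be chained in front of the test run, for example `require_sysman && ctest --output-on-failure`.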

ZES_ENABLE_SYSMAN=1

The following tests FAILED:
13/27 Test #13: sparse_sycl ......................Subprocess aborted***Exception:  45.76 sec
[==========] Running 48 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 48 tests from sycl_test
[ RUN      ] sycl_test.sparse_coo2crs
/home/ndellin/kokkos-kernels/test_common/KokkosKernels_TestUtils.hpp:159: Failure
Expected: ((double)AT1::abs(val1 - val2)) <= ((double)AT3::abs(tol)), actual: 4.34205 vs 3.75255e-10
row: 17, crs_col_ids_ref(504) = 20 mismatched values!
Begin arguments for above failure...
RandCooMat<N6Kokkos7complexIdEE, N6Kokkos10LayoutLeftE, N6Kokkos12Experimental4SYCLE130...): rand seed: 3072659895
scalar: N6Kokkos7complexIdEE
layout: N6Kokkos10LayoutLeftE
m: 130, n: 130
...end arguments for above failure.
...
[  FAILED  ] sycl_test.sparse_coo2crs (20842 ms)
[ RUN      ] sycl_test.sparse_spgemm_jacobi_double_int_size_t_TestExecSpace
terminate called after throwing an instance of 'std::runtime_error'
  what():  There was a synchronous SYCL error:
Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)

14/27 Test #14: blocksparse_sycl .................***Failed   33.50 sec
...
[ RUN      ] sycl_test.sparse_block_spgemm_double_int_size_t_TestExecSpace
nentries_actual:1564 nentries_reference:2423
/home/ndellin/kokkos-kernels/sparse/unit_test/Test_Sparse_bspgemm.hpp:235: Failure
Value of: is_identical
  Actual: false
Expected: true
SPGEMM_KK
...

Reproducer (Blake PV queue):
SHAs:
kokkos/kokkos@7e299b4
acdd896

module load cmake intel-oneapi-compilers/2023.1.0

$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=/projects/x86-64-icelake-rocky8/compilers/intel-oneapi-compilers/2023.1.0/gcc/8.5.0/base/6g2jkiv/compiler/2023.1.0/linux/bin-llvm/clang++ --cxxflags="-fp-model=precise" --shared --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF

Edit: Added the SHAs used in the testing.

@ndellingwood changed the title from "Sycl unit test failures with intel/2023.1.0 on Intel Ponte Vecchio" to "Nightly Sycl unit test failures with intel/2023.1.0 on Intel Ponte Vecchio" on Nov 14, 2023
@ndellingwood

Updating the issue with failures as of SHA 32aa75a

Configuration 1 (no TPLs):

salloc -N 1 -p PV

source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load cmake intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 git

# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1

# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=/projects/x86-64-icelake-rocky8/compilers/intel-oneapi-compilers/2023.1.0/gcc/8.5.0/base/6g2jkiv/compiler/2023.1.0/linux/bin-llvm/clang++ --cxxflags="-fp-model=precise" --shared --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF -kokkos-path=$KOKKOS_PATH

Test failures on PVC:

23:43:24 The following tests FAILED:
23:43:24 	 15 - sparse_sycl (SEGFAULT)
23:43:24 	 16 - blocksparse_sycl (Failed)

Configuration 2 (oneMKL):

salloc -N 1 -p PV

source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load git cmake intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 intel-oneapi-mkl/2023.1.0 intel-oneapi-tbb/2021.9.0

# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1

# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=icpx --cxxflags="-fp-model=precise" --shared --with-tpls=mkl --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF -kokkos-path=$KOKKOS_PATH

Test failures on PVC:

05:49:17 The following tests FAILED:
05:49:17 	  9 - blas_sycl (Failed)
05:49:17 	 15 - sparse_sycl (Subprocess aborted)
05:49:17 	 16 - blocksparse_sycl (Failed)
05:49:17 	 26 - wiki_spadd (Subprocess aborted)

@ndellingwood

Joe installed Intel oneAPI 2024.1.0 on Blake, and I tested it with the MKL configuration above.

Test failures:

15/32 Test #15: sparse_sycl ......................***Failed  194.78 sec
...
[  PASSED  ] 47 tests.
[  FAILED  ] 4 tests, listed below:
[  FAILED  ] sycl_test.sparse_spgemm_jacobi_double_int_int_TestDevice
[  FAILED  ] sycl_test.sparse_spgemm_double_int_int_TestDevice
[  FAILED  ] sycl_test.sparse_spmv_double_int_int_TestDevice
[  FAILED  ] sycl_test.sparse_par_ilut_double_int_int_TestDevice

16/32 Test #16: blocksparse_sycl .................***Failed   29.87 sec
...
[==========] 7 tests from 1 test case ran. (29406 ms total)
[  PASSED  ] 6 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] sycl_test.sparse_block_spgemm_double_int_int_TestDevice

Configuration (Sycl backend, intel/2024.1.0 with mkl/2024.0.0):

source /projects/x86-64-icelake-rocky8/spack-config/blake-setup-user-module-env.sh
module purge
module load cmake intel-oneapi-compilers/2024.1.0 intel-oneapi-dpl/2022.5.0 intel-oneapi-tbb/2021.12.0 intel-oneapi-mkl/2024.0.0
module list

# Required for the hashmap accumulator
export ZES_ENABLE_SYSMAN=1

# Configuration
$KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-sycl --arch=INTEL_PVC --compiler=icpx --cxxflags="-fp-model=precise -Wno-pass-failed" --shared --with-tpls=mkl --kokkos-path=$KOKKOS_PATH

make -j16

# Unit tests
export ONEAPI_DEVICE_SELECTOR=ext_oneapi_level_zero:gpu
ctest --output-on-failure
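
To collect the failing sub-test names from a run like the one above, a small filter over the GoogleTest output can help. This is a sketch only: `list_failed_subtests` is a hypothetical helper name, and it assumes the `[  FAILED  ]` lines appear unprefixed at the start of a line, as in the output quoted in this issue (timestamp-prefixed CI logs would need the pattern adjusted).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read GoogleTest output on stdin and print each failed
# sub-test name once, in order of first appearance.
# Summary lines such as "[  FAILED  ] 4 tests, listed below:" are skipped
# because they do not contain a "suite.test" name in the fourth field.
list_failed_subtests() {
  awk '/^\[  FAILED  \] [A-Za-z0-9_]+\.[A-Za-z0-9_]+/ && !seen[$4]++ { print $4 }'
}
```

One way to use it is `ctest --output-on-failure 2>&1 | list_failed_subtests`, which makes it easier to compare the failing set between compiler versions.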

@ndellingwood changed the title from "Nightly Sycl unit test failures with intel/2023.1.0 on Intel Ponte Vecchio" to "Nightly Sycl unit test failures with intel/2023.1.0, intel/2024.1.0 on Intel Ponte Vecchio" on Jun 5, 2024