Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail to run kernels with the JIT compiler #1432

Open
AuroraPerego opened this issue Apr 7, 2024 · 4 comments
Open

Fail to run kernels with the JIT compiler #1432

AuroraPerego opened this issue Apr 7, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@AuroraPerego
Copy link

Bug summary
A simple application compiles but fails to run with the JIT compiler. AOT compilation with CUDA results in a working executable.
I'm afraid I've missed something when building acpp, but I can't figure out what's wrong.
The CPU through OpenCL works, while the CUDA backend, the Intel GPU (with both OpenCL and Level Zero), and the host device fail.

CUDA output:

Running on Tesla T4
[AdaptiveCpp Error] from /data/user/aperego/AdaptiveCpp/src/runtime/cuda/cuda_code_object.cpp:96 @ build_cuda_module_from_ptx(): cuda_executable_object: could not load module (error code = CU:218)
[AdaptiveCpp Error] from /data/user/aperego/AdaptiveCpp/src/runtime/cuda/cuda_queue.cpp:692 @ submit_sscp_kernel_from_code_object(): cuda_queue: Code object construction failed
============== hipSYCL error report ==============
hipSYCL has caught the following undhandled asynchronous errors:

   0. from /data/user/aperego/AdaptiveCpp/src/runtime/cuda/cuda_code_object.cpp:96 @ build_cuda_module_from_ptx(): cuda_executable_object: could not load module (error code = CU:218)
   1. from /data/user/aperego/AdaptiveCpp/src/runtime/cuda/cuda_queue.cpp:692 @ submit_sscp_kernel_from_code_object(): cuda_queue: Code object construction failed
The application will now be terminated.
terminate called without an active exception
Aborted (core dumped)

Intel GPU (OpenCL):

Running on Intel(R) Data Center GPU Flex 170
[AdaptiveCpp Error] from /data/user/aperego/AdaptiveCpp/src/runtime/ocl/ocl_code_object.cpp:90 @ ocl_executable_object(): ocl_code_object: Building CL program failed. Build log: error : unresolved external symbol puts at offset 244 in instructions segment #0 (aka kernel : _Z21__hipsycl_sscp_kernelIN7hipsycl4glue15__sscp_dispatch20ndrange_parallel_forIZZ4mainENK3$_0clERNS0_4sycl7handlerEEUlNS5_7nd_itemILi1EEEE_Li1EEEEvRKT_)
error : unresolved external symbol puts at offset 356 in instructions segment #0 (aka kernel : _Z21__hipsycl_sscp_kernelIN7hipsycl4glue15__sscp_dispatch20ndrange_parallel_forIZZ4mainENK3$_0clERNS0_4sycl7handlerEEUlNS5_7nd_itemILi1EEEE_Li1EEEEvRKT_)
 (error code = CL:-42)
[AdaptiveCpp Error] from /data/user/aperego/AdaptiveCpp/src/runtime/ocl/ocl_code_object.cpp:90 @ ocl_executable_object(): ocl_code_object: Building CL program failed. Build log: error : unresolved external symbol puts at offset 244 in instructions segment #0 (aka kernel : _Z21__hipsycl_sscp_kernelIN7hipsycl4glue15__sscp_dispatch20ndrange_parallel_forIZZ4mainENK3$_0clERNS0_4sycl7handlerEEUlNS5_7nd_itemILi1EEEE_Li1EEEEvRKT_)
error : unresolved external symbol puts at offset 356 in instructions segment #0 (aka kernel : _Z21__hipsycl_sscp_kernelIN7hipsycl4glue15__sscp_dispatch20ndrange_parallel_forIZZ4mainENK3$_0clERNS0_4sycl7handlerEEUlNS5_7nd_itemILi1EEEE_Li1EEEEvRKT_)
 (error code = CL:-42)
[AdaptiveCpp Error] from /data/user/aperego/AdaptiveCpp/src/runtime/ocl/ocl_queue.cpp:513 @ submit_sscp_kernel_from_code_object(): ocl_queue: Code object construction failed
============== hipSYCL error report ==============
hipSYCL has caught the following undhandled asynchronous errors:

   0. from /data/user/aperego/AdaptiveCpp/src/runtime/ocl/ocl_code_object.cpp:90 @ ocl_executable_object(): ocl_code_object: Building CL program failed. Build log: error : unresolved external symbol puts at offset 244 in instructions segment #0 (aka kernel : _Z21__hipsycl_sscp_kernelIN7hipsycl4glue15__sscp_dispatch20ndrange_parallel_forIZZ4mainENK3$_0clERNS0_4sycl7handlerEEUlNS5_7nd_itemILi1EEEE_Li1EEEEvRKT_)
error : unresolved external symbol puts at offset 356 in instructions segment #0 (aka kernel : _Z21__hipsycl_sscp_kernelIN7hipsycl4glue15__sscp_dispatch20ndrange_parallel_forIZZ4mainENK3$_0clERNS0_4sycl7handlerEEUlNS5_7nd_itemILi1EEEE_Li1EEEEvRKT_)
 (error code = CL:-42)
   1. from /data/user/aperego/AdaptiveCpp/src/runtime/ocl/ocl_code_object.cpp:90 @ ocl_executable_object(): ocl_code_object: Building CL program failed. Build log: error : unresolved external symbol puts at offset 244 in instructions segment #0 (aka kernel : _Z21__hipsycl_sscp_kernelIN7hipsycl4glue15__sscp_dispatch20ndrange_parallel_forIZZ4mainENK3$_0clERNS0_4sycl7handlerEEUlNS5_7nd_itemILi1EEEE_Li1EEEEvRKT_)
error : unresolved external symbol puts at offset 356 in instructions segment #0 (aka kernel : _Z21__hipsycl_sscp_kernelIN7hipsycl4glue15__sscp_dispatch20ndrange_parallel_forIZZ4mainENK3$_0clERNS0_4sycl7handlerEEUlNS5_7nd_itemILi1EEEE_Li1EEEEvRKT_)
 (error code = CL:-42)
   2. from /data/user/aperego/AdaptiveCpp/src/runtime/ocl/ocl_queue.cpp:513 @ submit_sscp_kernel_from_code_object(): ocl_queue: Code object construction failed
The application will now be terminated.
terminate called without an active exception
Aborted (core dumped)

Host:

Running on hipSYCL OpenMP host device
[AdaptiveCpp Warning] [LoopSplitterInlining] puts is not defined!
/cvmfs/cms.cern.ch/el9_amd64_gcc11/external/gcc/11.4.1-30ebdc301ebd200f2ae0e3d880258e65/bin/ld: cannot find crtbeginS.o: No such file or directory
/cvmfs/cms.cern.ch/el9_amd64_gcc11/external/gcc/11.4.1-30ebdc301ebd200f2ae0e3d880258e65/bin/ld: cannot find -lstdc++: No such file or directory
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
[AdaptiveCpp Error] from /data/user/aperego/AdaptiveCpp/include/hipSYCL/glue/llvm-sscp/jit.hpp:265 @ compile(): jit::compile: Encountered errors:
0: LLVMToHost: clang invocation failed with exit code 1

[AdaptiveCpp Error] from /data/user/aperego/AdaptiveCpp/src/runtime/omp/omp_queue.cpp:494 @ submit_sscp_kernel_from_code_object(): omp_queue: Code object construction failed
============== hipSYCL error report ==============
hipSYCL has caught the following undhandled asynchronous errors:

   0. from /data/user/aperego/AdaptiveCpp/include/hipSYCL/glue/llvm-sscp/jit.hpp:265 @ compile(): jit::compile: Encountered errors:
0: LLVMToHost: clang invocation failed with exit code 1

   1. from /data/user/aperego/AdaptiveCpp/src/runtime/omp/omp_queue.cpp:494 @ submit_sscp_kernel_from_code_object(): omp_queue: Code object construction failed
The application will now be terminated.
terminate called without an active exception
Aborted (core dumped)

In this case, it cannot find crtbeginS.o and -lstdc++ (at least the first is in the path set with the --gcc-install-dir flag).
To Reproduce
test.cpp:

#include <sycl/sycl.hpp>

int main(int argc, char** argv) {
  static const std::vector<sycl::device> devices =
      sycl::device::get_devices(sycl::info::device_type::all);

  std::cerr << "Found " << devices.size() << " SYCL devices:" << std::endl;
  for (auto const &device : devices)
    std::cerr << "  - " << device.get_info<sycl::info::device::name>() << " ["
              << device.get_info<sycl::info::device::driver_version>() << "]"
              << std::endl;
  std::cerr << std::endl;
  if (argc < 2) {
    std::cout << "Provide the device number N: ./test N" << std::endl;
    return 1;
  }
  auto q = sycl::queue{devices[atoi(argv[1])]};
  std::cout << "Running on "
            << q.get_device().get_info<sycl::info::device::name>() << std::endl;

  q.submit([&](sycl::handler &cgh) {
     cgh.parallel_for(sycl::nd_range<1>(1, 1),
                      [=](sycl::nd_item<1> item) { printf("Running the kernel\n"); });
   }).wait();

  return 0;
}

compile and run:

acpp --gcc-install-dir=/cvmfs/cms.cern.ch/el9_amd64_gcc11/external/gcc/11.4.1-30ebdc301ebd200f2ae0e3d880258e65/lib/gcc/x86_64-redhat-linux-gnu/11.4.1 -std=c++17 --acpp-targets="generic" -O2  -o test test.cpp
./test N # where N is the device's position in the list of available devices

Expected behavior
It should run without errors

Describe your setup

  • AdaptiveCpp cloned from github: develop@57773099 + PR 1031 and built with:
    cmake \
     -DCMAKE_INSTALL_PREFIX=$PWD \
     -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
     -DWITH_CUDA_BACKEND=ON \
     -DROCM_PATH=/opt/rocm \
     -DWITH_ROCM_BACKEND=ON \
     -DWITH_LEVEL_ZERO_BACKEND=ON \
     -DOpenCL_LIBRARY=/opt/intel/oneapi/compiler/2024.0/lib/libOpenCL.so \
     -DWITH_OPENCL_BACKEND=ON \
     -DLLVM_DIR=/cvmfs/cms.cern.ch/el9_amd64_gcc11/external/llvm/16.0.3-d5387186335b0dd85e1c294a1fd64dd0/lib64/cmake/llvm \
     ..
    
  • clang version 16.0.3
    Target: x86_64-redhat-linux-gnu
    Thread model: posix
    InstalledDir: /cvmfs/cms.cern.ch/el9_amd64_gcc11/external/llvm/16.0.3-d5387186335b0dd85e1c294a1fd64dd0/bin
    Found CUDA installation: /usr/local/cuda-12.2, version
    Found HIP installation: /opt/rocm, version 6.0.32831
    
  • gcc (GCC) 11.4.1 20230601

Optional additional diagnostic information

  • from syclcc --hipsycl-version I see:
    AdaptiveCpp version: 24.02.0+git.dbaf585a.20240330.branch.develop.dirty
    Installation root: /data/user/aperego/AdaptiveCpp/build
    Plugin LLVM version: 16, can accelerate CPU: True
    Available runtime backends:
       librt-backend-omp.so
       librt-backend-cuda.so
       librt-backend-ze.so
       librt-backend-hip.so
       librt-backend-ocl.so
@AuroraPerego AuroraPerego added the bug Something isn't working label Apr 7, 2024
@illuhad
Copy link
Collaborator

illuhad commented Apr 8, 2024

unresolved external symbol puts at offset 356 in instructions segment #0

This error message is quite clear: Apparently the kernel calls into puts() from the C standard library which is not supported on device. I can't say why puts ends up in kernel code without seeing your code, but my guess is that you are using cout or similar.

EDIT: Missed your code snippet. Using printf inside kernels is illegal in SYCL.

@fodinabor - probably host JIT does not take into account --gcc-install-dir given at compile time, right? Do we even still need -lstdc++? libc should be enough at JIT time, right?

@AuroraPerego
Copy link
Author

EDIT: Missed your code snippet. Using printf inside kernels is illegal in SYCL.

Thanks, without the printf everything works apart from the host device which continues to fail.
There is another related reason for which I've opened this issue which is a failure in compiling kernels with any_of_group/shift_group_[left/right] due to a JIT session error: Symbols not found.
As an example, this:

#include <sycl/sycl.hpp>

void kernel(sycl::nd_item<1> item){
  bool more = true;
  while ( (item.barrier(), sycl::any_of_group(item.get_group(), more)) ) {
    more = false;
  }
  item.barrier();
}

int main(int argc, char** argv){
    int threadsPerBlock = 64;
    int blocks = 1;
    static const std::vector<sycl::device> devices = sycl::device::get_devices(sycl::info::device_type::all);

    if (argc < 2) {
      std::cout << "Provide the device number N: ./any_of_group N" << std::endl;
      return 1;
    }
    auto stream = sycl::queue{devices[atoi(argv[1])]};
    std::cerr << "stream offload to " << stream.get_device().get_info<sycl::info::device::name>() << " ["
                  << stream.get_device().get_info<sycl::info::device::driver_version>() << "]" << std::endl;
    stream.submit([&](sycl::handler &cgh) {
    cgh.parallel_for(
      sycl::nd_range<1>( blocks * threadsPerBlock, threadsPerBlock),
      [=](sycl::nd_item<1> item) {
        kernel(item);
      });
    });
}

fails with the error (Intel GPU - OpenCL, the other backends just say that the kernel could not be constructed):

[AdaptiveCpp Error] from /data/user/aperego/AdaptiveCpp/src/runtime/ocl/ocl_code_object.cpp:90 @ ocl_executable_object(): ocl_code_object: Building CL program failed. Build log: error : unresolved external symbol _ZN7hipsycl4sycl6detail13sscp_builtins22__hipsycl_any_of_groupILi1EEEbNS0_5groupIXT_EEEb at offset 436 in instructions segment #0 (aka kernel : _Z21__hipsycl_sscp_kernelIN7hipsycl4glue15__sscp_dispatch20ndrange_parallel_forIZZ4mainENK3$_0clERNS0_4sycl7handlerEEUlNS5_7nd_itemILi1EEEE_Li1EEEEvRKT_)
error : unresolved external symbol _ZN7hipsycl4sycl6detail13sscp_builtins22__hipsycl_any_of_groupILi1EEEbNS0_5groupIXT_EEEb at offset 516 in instructions segment #0 (aka kernel : _Z21__hipsycl_sscp_kernelIN7hipsycl4glue15__sscp_dispatch20ndrange_parallel_forIZZ4mainENK3$_0clERNS0_4sycl7handlerEEUlNS5_7nd_itemILi1EEEE_Li1EEEEvRKT_)
 (error code = CL:-42)

@illuhad
Copy link
Collaborator

illuhad commented Apr 12, 2024

Thanks, without the printf everything works apart from the host device which continues to fail.
There is another related reason for which I've opened this issue which is a failure in compiling kernels with any_of_group/shift_group_[left/right] due to a JIT session error: Symbols not found.

This is expected; group algorithms other than group_barrier are not yet implemented for the generic JIT compiler.

@illuhad
Copy link
Collaborator

illuhad commented Apr 12, 2024

AdaptiveCpp assumes the existence of a correctly configured clang, e.g by using clang configuration files and building AdaptiveCpp against a wrapper clang that automatically correctly sets necessary flags as described here: https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/install-llvm.md#gcc-toolchainc-standard-library-is-in-a-non-standard-location

acpp --gcc-install-dir is probably not the correct way to handle these things because --gcc-install-dir will not be available at JIT time.

I strongly suspect that a correct clang installation with clang configuration files will resolve the remaining issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants