SYCL_EXTERNAL + GENERIC #1242

Kiguli · 2023-11-13T15:30:27Z

Hello,

I have been working on a project that I want to be able to run in parallel. It has been working nicely on the CPU, but when I built it for the GPU for the first time I got the following compilation error:

acpp robot.cpp ../../../classes/X.cpp ../../../classes/Y.cpp -O3 --acpp-targets="cuda:sm_70;omp" -lnlopt -lm -I/usr/include/hdf5/serial -L/usr/lib/x86_64-linux-gnu/hdf5/serial -lhdf5 -lglpk -lgsl -lgslcblas -DH5_USE_110_API -larmadillo -o robot
clang: warning: CUDA version is newer than the latest supported version 11.5 [-Wunknown-cuda-version]
ptxas fatal : Unresolved extern function 'strlen'
clang: error: ptxas command failed with exit code 255 (use -v to see invocation)
Debian clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-14/bin
clang: note: diagnostic msg:

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/robot2D-9253eb.cu
clang: note: diagnostic msg: /tmp/robot2D-d2ff2f/robot2D-sm_70.cu
clang: note: diagnostic msg: /tmp/IMDP-8fbc9c.cu
clang: note: diagnostic msg: /tmp/IMDP-d45238/IMDP-sm_70.cu
clang: note: diagnostic msg: /tmp/MDP-378841.cu
clang: note: diagnostic msg: /tmp/MDP-171347/MDP-sm_70.cu
clang: note: diagnostic msg: /tmp/robot2D-9253eb.sh
clang: note: diagnostic msg:

make: *** [Makefile:13: robot2D] Error 255

Having been reading around I understand this is to do with linking and a commonish issue with GPU code. Could I get some advice on how to go about adapting my project for GPU?

The structure of my project is some objects with .h and .cpp files where some functions then do computations in parallel by starting a sycl queue and storing the results in the object variables. I understand one way could be using the extern SYCL_EXTERNAL for the functions in the header files of the objects I create. But then I would need to compile using the acpp-targets=generic and how would I then run this compiled file over my NVIDIA GPU? I know I can't use acpp-targets=cuda:sm_70 as I would like from reading some of the old issues mentioned on this GitHub.

A few pointers would be greatly appreciated!

P.S. strlen as far as I can tell only appears as one function in the Armadillo library and I don't believe I even use it.

illuhad · 2023-11-13T16:24:03Z

Maintainer

> Having been reading around I understand this is to do with linking and a commonish issue with GPU code. Could I get some advice on how to go about adapting my project for GPU?

The error occurs because something in your device code calls strlen(), which is not defined on the GPU.

> strlen as far as I can tell only appears as one function in the Armadillo library and I don't believe I even use it.

The error means that strlen() is encountered somewhere in device code. If you don't use it, it might be a bug in Armadillo. Perhaps they try to do something special if they detect CUDA support from the compiler or similar.
I would recommend looking at the device code IR to understand where this is coming from. (-emit-llvm and/or -S are your friends).

> I understand one way could be using the extern SYCL_EXTERNAL for the functions in the header files of the objects I create. But then I would need to compile using the acpp-targets=generic and how would I then run this compiled file over my NVIDIA GPU? I know I can't use acpp-targets=cuda:sm_70 as I would like from reading some of the old issues mentioned on this GitHub.

SYCL_EXTERNAL allows your kernels to call functions that are defined in separate translation units (i.e. other .cpp files). To use it, the function must be marked as SYCL_EXTERNAL, both where it is declared and where it is defined:

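// file1.cpp
#include <sycl/sycl.hpp>

// Declaration only; the definition lives in file2.cpp.
SYCL_EXTERNAL int f();

int main() {
  sycl::queue q;
  // The kernel calls f(), which is defined in another translation unit.
  q.single_task([](){ f(); });
  q.wait();
}

// file2.cpp
#include <sycl/sycl.hpp>

// Definition, also marked SYCL_EXTERNAL so that device code is generated for it.
SYCL_EXTERNAL int f() {
  return 2;
}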

SYCL_EXTERNAL is supported by the omp and generic targets, so you would just compile with --acpp-targets="omp;generic". That's all you need to change. If your program then constructs a sycl::queue on an NVIDIA GPU and submits kernels to it, the generic code is automatically JIT-compiled for your NVIDIA GPU behind the scenes.

1 reply

Kiguli Nov 13, 2023
Author

Thanks for the helpful pointers 😃
