Replies: 7 comments 13 replies
-
Hi @csccva, AdaptiveCpp has two CPU compilation flows: a library-only OpenMP flow and a compiler-accelerated flow. Last time I checked, DPC++ still used the Intel OpenCL CPU runtime under the hood, which performs similar optimizations to our compiler-supported flow, but on top employs its own whole-function vectorizer, which we cannot ship similarly. In theory, our approach is compatible with LLVM's VPlan native path, which employs an outer-loop vectorizer, or with the out-of-tree Region Vectorizer (RV). You could test whether this is the deciding factor by forcing the Intel runs to not be vectorized. Additionally, in my experience, Intel's OpenCL CPU runtime tends to compile the kernels with more aggressive options, such as fast-math. But I don't remember how to get the runtime to tell you what its current compile-time options are...
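As a rough sketch, the two CPU flows mentioned above can be selected at compile time via AdaptiveCpp's target specification (target names as in current AdaptiveCpp conventions; check the documentation of your installed version):

```sh
# Library-only OpenMP flow: kernels run through a pure library implementation,
# relying entirely on the host compiler's optimizer.
acpp -O3 --acpp-targets=omp.library-only -o app_lib app.cpp

# Compiler-accelerated OpenMP flow: AdaptiveCpp's compiler support kicks in
# (mainly benefits kernels that use group barriers).
acpp -O3 --acpp-targets=omp.accelerated -o app_acc app.cpp
```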
-
Hello, I checked my setup.
Regarding the installation, I am not sure; I installed AdaptiveCpp using spack. Regarding oneAPI, I set the suggested environment variable to disable vectorization.
The kernel execution time is now similar to hipSYCL. Your suspicion about vectorization must be correct. Cristian
-
I changed the spack recipe and added the suggested option.
-
Some additional pointers:
Note that AdaptiveCpp can also use the exact same Intel OpenCL CPU runtime that DPC++ uses for CPU execution. For this, you need to build AdaptiveCpp with the OpenCL backend enabled. In practice, it is almost impossible for DPC++ to outperform AdaptiveCpp on CPU, since AdaptiveCpp can use the exact same paths and more, so you have more options to try. EDIT: Your code does not contain barriers, so it's unlikely that the compiler-accelerated CPU mode affects performance here.
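A hedged sketch of enabling the OpenCL backend when building AdaptiveCpp from source (the CMake option follows AdaptiveCpp's `WITH_<BACKEND>_BACKEND` naming convention; verify against your version's install documentation):

```sh
# Configure AdaptiveCpp with the OpenCL backend so it can target
# the Intel OpenCL CPU runtime at run time, then build and install.
cmake -S AdaptiveCpp -B build -DWITH_OPENCL_BACKEND=ON
cmake --build build --target install
```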
-
I did the installation via spack and tried different configurations. When I compile the code I get this:
There are some warnings; I assumed that they refer to the inner loop. I checked the libraries, and the libomp is the correct one:
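For reference, one common way to check which OpenMP runtime a binary actually resolves at load time (the binary name `app` is a placeholder):

```sh
# List the shared libraries the dynamic linker resolves and filter for OpenMP;
# the printed path shows which libomp/libgomp the binary will actually load.
ldd ./app | grep -i omp
```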
I will try on LUMI tomorrow. Cristian |
-
This is with current AdaptiveCpp on 2x AMD EPYC 7713, which is probably similar to your hardware. I cannot reproduce your performance observation:
-
Depending on the exact system topology, it might also be feasible to first zero the buffers with a parallel kernel, so that the memory pages are first-touched by the threads that will later work on them.
Host pointer here means passing a pointer to the buffer constructor.
-
Hello,
I am testing some SYCL codes on a machine with an AMD EPYC 7H12 64-core processor and NVIDIA GPUs. When using the NVIDIA GPU, AdaptiveCpp and oneAPI give very similar results:
and
But when I try to run the code using a CPU core, there is a significant difference:
vs.
I compile the code using:
Below is the code:
Is there any way to improve the AdaptiveCpp CPU performance?