jjfumero changed the title from "Pass all method parameters through the call stack and improve code cache strategy" to "[Proposal] Pass all method parameters through the call stack and improve code cache strategy" on Apr 9, 2021
Currently, we have two different strategies for caching compilation results, one for each backend (PTX, OpenCL).
For the PTX backend, we rely on the identity of the function parameters passed to the task. The issue with this is that a recompilation will be triggered every time a parameter is changed.
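A minimal sketch of what identity-based caching implies, written as a plain Java simulation rather than actual backend code. The class `IdentityKeyedCache`, its `launch` method, and the `"copy"` method name are hypothetical and exist only for illustration; the sketch assumes the cache compares argument objects by reference.

```java
import java.util.ArrayList;
import java.util.List;

public class IdentityKeyedCache {
    // Hypothetical cache entry: the method name plus the exact argument objects.
    static final class Entry {
        final String method;
        final Object[] args;

        Entry(String method, Object[] args) {
            this.method = method;
            this.args = args;
        }

        boolean matches(String otherMethod, Object[] otherArgs) {
            if (!method.equals(otherMethod) || args.length != otherArgs.length) {
                return false;
            }
            for (int i = 0; i < args.length; i++) {
                if (args[i] != otherArgs[i]) { // reference identity, not equals()
                    return false;
                }
            }
            return true;
        }
    }

    static final List<Entry> entries = new ArrayList<>();
    static int compilations = 0;

    // Returns true if a (simulated) compilation was needed for this launch.
    static boolean launch(String method, Object... args) {
        for (Entry e : entries) {
            if (e.matches(method, args)) {
                return false; // same method and same objects: cache hit
            }
        }
        entries.add(new Entry(method, args.clone()));
        compilations++;
        return true;
    }

    public static void main(String[] args) {
        int[] in = new int[1024];
        int[] out = new int[1024];
        launch("copy", in, out);            // first launch compiles
        launch("copy", in, out);            // same objects: no recompilation
        launch("copy", new int[1024], out); // same shape, new object: compiles again
        System.out.println(compilations);   // prints 2
    }
}
```

In this simulation the third launch recompiles even though the generated code would be identical, which is the cost described above.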
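For the OpenCL backend, the key to the code cache is `scheduleNo.taskNo-methodName`. This can cause conflicts when multiple task schedules with the same name are created in different scopes. For example, consider two scopes that each create a task schedule with the same name, containing tasks `t0` and `t1`, over input arrays of different lengths.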
The OpenCL backend will not recompile for the first task `t0` in the second scope and will therefore use the wrong inlined array length (`in.length`) value in the kernel. The reason the second task `t1` is recompiled is a side effect from here.

The PTX backend will trigger 4 compilations in total.
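The collision can be reproduced in a small, self-contained Java simulation. This is only a sketch: `CodeCacheCollision`, `Kernel`, `nameKey`, and `lookupOrCompile` are hypothetical names, and the "kernel" merely records the array length that would have been inlined at compile time. The key format follows the `scheduleNo.taskNo-methodName` scheme described above.

```java
import java.util.HashMap;
import java.util.Map;

public class CodeCacheCollision {
    // Hypothetical stand-in for a compiled kernel: it only remembers the
    // array length that was inlined when it was "compiled".
    static final class Kernel {
        final int inlinedLength;

        Kernel(int inlinedLength) {
            this.inlinedLength = inlinedLength;
        }
    }

    static final Map<String, Kernel> cache = new HashMap<>();
    static int compilations = 0;

    // Name-only key: nothing about the task's arguments is part of it.
    static String nameKey(String schedule, String task, String method) {
        return schedule + "." + task + "-" + method;
    }

    static Kernel lookupOrCompile(String key, int[] in) {
        Kernel cached = cache.get(key);
        if (cached == null) {
            cached = new Kernel(in.length); // length is baked in at compile time
            cache.put(key, cached);
            compilations++;
        }
        return cached;
    }

    public static void main(String[] args) {
        String key = nameKey("s0", "t0", "copy");
        lookupOrCompile(key, new int[1024]);                 // scope 1
        Kernel second = lookupOrCompile(key, new int[2048]); // scope 2, same names
        System.out.println(compilations);         // prints 1
        System.out.println(second.inlinedLength); // prints 1024, stale for a 2048-element input
    }
}
```

Because the key carries no information about the arguments, the second scope silently reuses the kernel specialized for the first scope's array.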
I think the way to solve this is to pass all the task parameters (primitives and object references) through the call stack.
We also might need to stop inlining array lengths in the `@Parallel` annotated loops (`for(;i_3 < 1024;)`).
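One way to picture the proposal is the following Java simulation, offered as a rough sketch and not as the backend design. All names (`StackPassedParameters`, `Kernel`, `compileCopy`, `launchCopy`) are hypothetical, and the `Object[]` array merely stands in for the call stack. The loop bound is read from the parameters at launch time instead of being inlined, so a single compiled kernel can serve inputs of different lengths.

```java
import java.util.HashMap;
import java.util.Map;

public class StackPassedParameters {
    // Hypothetical kernel signature: every task parameter arrives at launch time.
    interface Kernel {
        void run(Object[] callStack);
    }

    static final Map<String, Kernel> cache = new HashMap<>();
    static int compilations = 0;

    // Simulated compilation of a copy kernel with no inlined array length.
    static Kernel compileCopy() {
        compilations++;
        return stack -> {
            int[] in = (int[]) stack[0];
            int[] out = (int[]) stack[1];
            int n = (Integer) stack[2]; // loop bound read at launch, not inlined
            for (int i = 0; i < n; i++) {
                out[i] = in[i];
            }
        };
    }

    static void launchCopy(int[] in, int[] out) {
        Kernel k = cache.get("copy"); // keyed by method only
        if (k == null) {
            k = compileCopy();
            cache.put("copy", k);
        }
        // References and primitives are all passed through the simulated call stack.
        k.run(new Object[] { in, out, in.length });
    }

    public static void main(String[] args) {
        int[] a = { 1, 2, 3, 4 };
        int[] b = { 5, 6, 7, 8, 9, 10, 11, 12 };
        int[] outA = new int[a.length];
        int[] outB = new int[b.length];
        launchCopy(a, outA);
        launchCopy(b, outB); // different length, same compiled kernel
        System.out.println(compilations); // prints 1
        System.out.println(outB[7]);      // prints 12
    }
}
```

The trade-off in a real backend would be that the compiler can no longer specialize the loop for a constant bound, which this simulation does not attempt to measure.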