```chapel
record myRec {
  var x: int;
}

on here.gpus[0] {
  var Source = [1, 2, 3, 4, 5];
  writeln(Source);
  var SourceRec = [i in Source] new myRec(i);
  writeln(SourceRec);
}
```
I understand the reason, and I think we can address it, but not quickly enough for me to handle today. I'll note what I know here:
The loop looks like the following before GPU transforms:

```chapel
var initTemp: myRec;
foreach arrItem, i ... {  // this will turn into a kernel
  init(initTemp, i);
  arrItem = initTemp;  // copy semantics makes this OK
}
```
Note that the compiler hoists initTemp out of the loop sometime before GPU transforms. This may be an optimization that assumes the ensuing loop is serial, which has historically been the case, but is not for foreach loops.
What happens next is that GPU transforms sees initTemp as an outer variable, which it has no special handling for (we can't handle outer records in general today, because they live in CPU memory). The course of action it takes is to offload the variable into GPU memory and use it as a singleton. This is the approach we take for array records, which do not change during the course of a kernel, so it is safe there.
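To illustrate the distinction, here is my own minimal sketch (not from the issue): an outer array record is only read inside a kernel, whereas the hoisted temp is written by every thread, so sharing a singleton is only safe in the first case:

```chapel
var A: [1..5] int;
foreach i in 1..5 do
  A[i] = i;      // A's descriptor is an outer record, but the kernel
                 // only reads it, so a shared singleton copy is safe

var tmp: myRec;
foreach i in 1..5 {
  tmp.x = i;     // every thread writes the same shared singleton: a race
  // ... read tmp back ...
}
```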
I believe the behavior is nondeterministic: all GPU threads use the same record temp for initialization, so what gets assigned to the array elements can change across configurations.
Note that this is GPU-driven-comm-adjacent, but even if we could handle arbitrary records in GPU kernels, this one is a special case: we definitely need a per-thread copy of that init temp. A stopgap measure could be to handle such cases separately by putting the DefExpr into the kernel to make it thread-private. But I don't like that solution. If we call a loop order-independent, we shouldn't be able to move that DefExpr outside of the loop in the first place. In other words, while this shows up as a GPU bug today, I suspect we'd suffer from the same problem if the code in question ran in a vectorized way. So my ideal solution would be to fix this before GPU transforms happen.
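As a source-level analogy of that stopgap (not the actual AST transform), the user-visible equivalent of moving the DefExpr into the kernel is declaring the temporary inside the loop body, so each thread gets its own copy:

```chapel
var SourceRec: [1..5] myRec;
foreach i in 1..5 {
  var initTemp = new myRec(i);  // declared inside the loop: thread-private
  SourceRec[i] = initTemp;      // no shared temp, no race
}
```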
Summary of Problem
Description:
The program above prints one result outside the `on here.gpus[0]` block, a different result inside the `on here.gpus[0]` block with CPU-as-device, and yet another on an NVIDIA GPU node.
Is this a blocking issue with no known work-arounds?
It just came up in a test, but it's a correctness issue.