[Bug]: Incorrect result of loop expression that creates records on a GPU #24989

DanilaFe · 2024-05-03T18:19:36Z

Summary of Problem

Description:
The following program:

record myRec {
    var x : int;
}

on here.gpus[0] {
    var Source = [1,2,3,4,5];
    writeln(Source);

    var SourceRec = [i in Source] new myRec(i);
    writeln(SourceRec);
}

Prints this outside on here.gpus[0]:

(x = 1) (x = 2) (x = 3) (x = 4) (x = 5)
6 7 8 9 10

This in here.gpus[0] with CPU-as-device:

(x = 1) (x = 2) (x = 3) (x = 4) (x = 5)
6 7 8 9 10

And this on an Nvidia GPU node:

(x = 1) (x = 1) (x = 1) (x = 1) (x = 1)
6 6 6 6 6

Is this a blocking issue with no known work-arounds?
It just came up in a test, but it's a correctness issue.

The text was updated successfully, but these errors were encountered:

e-kayrakli · 2024-05-03T19:24:09Z

I understand the reason, I think we can address it, but not quickly enough for me to handle today. I'll note what I know here:

The loop looks like the following before GPU transforms:

var initTemp: myRec;
foreach arrItem, i .... { // this will turn into a kernel
  init(initTemp, i);
  arrItem = initTemp;  // copy semantics makes this OK

Note that the compiler moves initTemp outside of the loop sometime before GPU transforms. This may be an optimization with the assumption that the ensuing loop is serial, which has been the case historically, but not so much with foreach loops.

What happens next is that GPU transforms sees initTemp as an outer variable, for which it doesn't have any special instructions to handle (we can't handle outer records in general today, because they are in CPU memory). The course of action it takes is to offload that into GPU memory and use it as a singleton. This is the approach we take for array records, which do not change during the course of a kernel, so it is safe there.

I believe the behavior is indeterministic. All GPU threads use the same record temp for initializing, and what gets assigned to array elements can change in different configurations.

Note that this is GPU-driven-comm-adjacent, but even if we can handle any record on GPU kernels, this one's a special case. We definitely need a per-thread copy of that init temp. A stopgap measure could be to handle such things separately, where we put the DefExpr into the kernel to make it thread-private. But I don't like that solution. If we call a loop order-independent, we shouldn't be able to move that DefExpr outside of the loop in the first place. IOW, while this shows as a GPU bug today, if the code in question runs in vectorized way, I suspect we'll suffer from the same thing. So, my ideal solution would be to fix this before GPU transforms happen.

DanilaFe added type: Bug area: GPU Support labels May 3, 2024

e-kayrakli added the area: Compiler label May 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Incorrect result of loop expression that creates records on a GPU #24989

[Bug]: Incorrect result of loop expression that creates records on a GPU #24989

DanilaFe commented May 3, 2024

e-kayrakli commented May 3, 2024

[Bug]: Incorrect result of loop expression that creates records on a GPU #24989

[Bug]: Incorrect result of loop expression that creates records on a GPU #24989

Comments

DanilaFe commented May 3, 2024

Summary of Problem

e-kayrakli commented May 3, 2024