Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Incorrect result of loop expression that creates records on a GPU #24989

Open
DanilaFe opened this issue May 3, 2024 · 1 comment
Open

Comments

@DanilaFe
Copy link
Contributor

DanilaFe commented May 3, 2024

Summary of Problem

Description:
The following program:

record myRec {
    var x : int;
}

on here.gpus[0] {
    var Source = [1,2,3,4,5];
    writeln(Source);

    var SourceRec = [i in Source] new myRec(i);
    writeln(SourceRec);
}

Prints this outside on here.gpus[0]:

(x = 1) (x = 2) (x = 3) (x = 4) (x = 5)
6 7 8 9 10

This in here.gpus[0] with CPU-as-device:

(x = 1) (x = 2) (x = 3) (x = 4) (x = 5)
6 7 8 9 10

And this on an Nvidia GPU node:

(x = 1) (x = 1) (x = 1) (x = 1) (x = 1)
6 6 6 6 6

Is this a blocking issue with no known work-arounds?
It just came up in a test, but it's a correctness issue.

@e-kayrakli
Copy link
Contributor

I understand the reason, I think we can address it, but not quickly enough for me to handle today. I'll note what I know here:

The loop looks like the following before GPU transforms:

var initTemp: myRec;
foreach arrItem, i .... { // this will turn into a kernel
  init(initTemp, i);
  arrItem = initTemp;  // copy semantics makes this OK

Note that the compiler moves initTemp outside of the loop sometime before GPU transforms. This may be an optimization with the assumption that the ensuing loop is serial, which has been the case historically, but not so much with foreach loops.

What happens next is that GPU transforms sees initTemp as an outer variable, for which it doesn't have any special instructions to handle (we can't handle outer records in general today, because they are in CPU memory). The course of action it takes is to offload that into GPU memory and use it as a singleton. This is the approach we take for array records, which do not change during the course of a kernel, so it is safe there.

I believe the behavior is indeterministic. All GPU threads use the same record temp for initializing, and what gets assigned to array elements can change in different configurations.


Note that this is GPU-driven-comm-adjacent, but even if we can handle any record on GPU kernels, this one's a special case. We definitely need a per-thread copy of that init temp. A stopgap measure could be to handle such things separately, where we put the DefExpr into the kernel to make it thread-private. But I don't like that solution. If we call a loop order-independent, we shouldn't be able to move that DefExpr outside of the loop in the first place. IOW, while this shows as a GPU bug today, if the code in question runs in vectorized way, I suspect we'll suffer from the same thing. So, my ideal solution would be to fix this before GPU transforms happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants