
[Proposal] Pass all method parameters through the call stack and improve code cache strategy #80

Open
gigiblender opened this issue Apr 9, 2021 · 0 comments

gigiblender (Member) commented Apr 9, 2021

Currently, we have two different strategies for caching compilation results, one for each backend (PTX, OpenCL).

For the PTX backend, we rely on the identity of the function parameters passed to the task. The issue with this is that a recompilation is triggered every time a parameter changes, even when the new parameters would produce the same code.

For the OpenCL backend, the key to the code cache is scheduleNo.taskNo-methodName. This can cause conflicts when multiple task schedules with the same name are created in different scopes.
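To make the collision concrete, here is a minimal sketch of that key scheme (`cacheKey` is a hypothetical helper mirroring the `scheduleNo.taskNo-methodName` format, not actual TornadoVM code). The task parameters never enter the key, so both scopes in the example further below produce the same cache entry:

```java
public class CacheKeyCollision {
    // Hypothetical helper mirroring the OpenCL backend's key format:
    // only the schedule, task and method names are used, so the actual
    // task parameters have no influence on the cache lookup.
    static String cacheKey(String schedule, String task, String method) {
        return schedule + "." + task + "-" + method;
    }

    public static void main(String[] args) {
        // First scope: arrays of length 1024.
        String firstScope = cacheKey("s0", "t0", "testMethod");
        // Second scope: arrays of length 512 -- the key is identical,
        // so the stale kernel compiled for length 1024 is reused.
        String secondScope = cacheKey("s0", "t0", "testMethod");
        System.out.println(firstScope.equals(secondScope)); // true: cache hit despite new sizes
    }
}
```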

For example, consider the code below:

    static class Data {
        int[] inTor;
        int[] outTor;
        int[] inSeq;
        int[] outSeq;

        public Data(int inTorSize, int outTorSize) {
            Random random = new Random();

            inTor = new int[inTorSize];
            outTor = new int[outTorSize];
            for (int i = 0; i < inTorSize; i++) {
                inTor[i] = random.nextInt();
            }
            for (int i = 0; i < outTorSize; i++) {
                outTor[i] = random.nextInt();
            }

            inSeq = inTor.clone();
            outSeq = outTor.clone();
        }
    }

    public static void testMethod(int[] in, int[] out) {
        for (@Parallel int i = 0; i < in.length; i++) {
            out[i] = in[i];
        }
    }

    public static void testMethod2(int[] in, int[] out) {
        for (@Parallel int i = 0; i < in.length; i++) {
            out[i] = in[i];
        }
    }

    public static void main(String[] args) {
        int N1 = 1024;

        // FIRST SCOPE
        {
            Data data = new Data(N1, N1 * N1);
            TaskSchedule ts = new TaskSchedule("s0")
                    .task("t0", Main::testMethod, data.inTor, data.outTor)
                    .task("t1", Main::testMethod2, data.inTor, data.outTor)
                    .streamOut(data.inTor, data.outTor);

            ts.execute();
        }

        // SECOND SCOPE
        {
            N1 = N1 / 2; // <-- use different input objects and sizes
            Data data = new Data(N1, N1 * N1);
            TaskSchedule ts = new TaskSchedule("s0")
                    .task("t0", Main::testMethod, data.inTor, data.outTor)
                    .task("t1", Main::testMethod2, data.inTor, data.outTor)
                    .streamOut(data.inTor, data.outTor);

            ts.execute();
        }
    }

The OpenCL backend will not recompile the first task t0 in the second scope and will therefore use the wrong inlined array length (the in.length value) in the kernel. The reason the second task t1 is recompiled is a side effect from here.
The PTX backend will trigger 4 compilations in total (one per task per scope, since the parameter objects change).

I think the way to solve this is to pass all the task parameters (primitives and object references) through the call stack.
We might also need to stop inlining array lengths in @Parallel-annotated loops (e.g. for (; i_3 < 1024; )).
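As a purely illustrative sketch of the cache-side half of this proposal (`ShapeAwareCacheKey` and its key format are hypothetical, not TornadoVM code), a key that folds in parameter types and array lengths would miss the cache when sizes change, while still hitting it when the shapes are identical:

```java
import java.util.StringJoiner;

public class ShapeAwareCacheKey {
    // Hypothetical key builder: in addition to the schedule/task/method
    // names, fold in each parameter's type and, for arrays, its length.
    // A size change then forces a recompilation instead of silently
    // reusing a kernel with a stale inlined length.
    static String cacheKey(String schedule, String task, String method, Object... params) {
        StringJoiner key = new StringJoiner("-");
        key.add(schedule + "." + task + "-" + method);
        for (Object p : params) {
            if (p instanceof int[]) {
                key.add("int[" + ((int[]) p).length + "]");
            } else {
                key.add(p.getClass().getSimpleName());
            }
        }
        return key.toString();
    }

    public static void main(String[] args) {
        String firstScope = cacheKey("s0", "t0", "testMethod", new int[1024], new int[1024 * 1024]);
        String secondScope = cacheKey("s0", "t0", "testMethod", new int[512], new int[512 * 512]);
        System.out.println(firstScope.equals(secondScope)); // false: size change misses the cache
    }
}
```

Note that passing the lengths through the call stack instead of inlining them would make even this unnecessary for pure size changes.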

gigiblender added the discussion and bug labels on Apr 9, 2021
jjfumero added the enhancement label and removed the bug label on Apr 9, 2021
jjfumero changed the title from "Pass all method parameters through the call stack and improve code cache strategy" to "[Proposal] Pass all method parameters through the call stack and improve code cache strategy" on Apr 9, 2021