Large errors using Intel OpenCL for CPUs #148

peastman · 2024-01-09T23:39:18Z

When running VkFFT under Intel's OpenCL for CPUs, I find that it gives very large errors. The results aren't totally wrong, but the accuracy is much worse than I expect.

The following C++ function illustrates the problem. It fills an array with random values, performs forward and backward 3D FFTs, and compares the output to the input. On most OpenCL implementations I've tested, the agreement is excellent. For example, on NVIDIA OpenCL the maximum error is 7.59959e-07, but on Intel it is hundreds of times larger: 0.000337124.

Any idea what could be causing this? Thank you for your help!

void test(cl::Device device, cl::Context context) {
    // Initialize VkFFT.

    int xsize = 25, ysize = 25, zsize = 25;
    VkFFTApplication app;
    app = {};
    VkFFTConfiguration config = {};
    config.FFTdim = 3;
    config.size[0] = zsize;
    config.size[1] = ysize;
    config.size[2] = xsize;
    config.device = &device();
    config.context = &context();
    config.inputBufferStride[0] = zsize;
    config.inputBufferStride[1] = ysize*zsize;
    config.inputBufferStride[2] = xsize*ysize*zsize;
    VkFFTResult result = initializeVkFFT(&app, config);
    cl::CommandQueue queue(context, device);

    // Generate the input data.

    default_random_engine generator;
    uniform_real_distribution<float> distribution(0.0, 1.0);
    vector<float> input(2*xsize*ysize*zsize);
    for (int i = 0; i < input.size(); i++)
        input[i] = distribution(generator);
    int bufferSize = input.size()*sizeof(float);
    cl::Buffer buffer(context, CL_MEM_READ_WRITE, bufferSize);
    queue.enqueueWriteBuffer(buffer, CL_TRUE, 0, bufferSize, input.data());

    // Perform the FFTs.

    VkFFTLaunchParams params = {};
    params.inputBuffer = &buffer();
    params.buffer = &buffer();
    params.commandQueue = &queue();
    result = VkFFTAppend(&app, -1, &params);  // forward transform
    result = VkFFTAppend(&app, 1, &params);   // inverse transform (unnormalized, hence the scale factor below)

    // Check the result.

    vector<float> output(input.size());
    queue.enqueueReadBuffer(buffer, CL_TRUE, 0, bufferSize, output.data());
    float maxError = 0;
    float scale = 1.0/(xsize*ysize*zsize);
    for (int i = 0; i < input.size(); i++)
        maxError = max(maxError, fabs(input[i]-scale*output[i]));
    printf("%g\n", maxError);
}

DTolm · 2024-01-10T09:13:55Z

Hello,

I have not yet managed to get this runtime working on my machine, but this issue resembles the low accuracy of the sincos functions on Intel iGPUs. There is a fix keyed on the Intel vendor ID that forces all twiddle factors to be precomputed in a LUT. My guess is that the vendor ID is different in this case, so the fix is not applied. Can you try setting config.useLUT = 1 and see if this fixes the issue?

Best regards,
Dmitrii
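
For readers unfamiliar with the idea behind useLUT, here is a minimal sketch of what a twiddle-factor lookup table is. It is not VkFFT's code (VkFFT generates device kernels); it is a plain host-side O(n^2) DFT, and the names makeTwiddleLUT, dftWithLUT, and roundTripError are hypothetical. The assumption being illustrated is that the table is built once in double precision and rounded to float, so the transform never depends on the accuracy of a device's single-precision sincos.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical helper: build the n twiddle factors exp(-2*pi*i*k/n) in double
// precision and round each one to float exactly once.
std::vector<cfloat> makeTwiddleLUT(std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<cfloat> lut(n);
    for (std::size_t k = 0; k < n; k++) {
        const double angle = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        lut[k] = cfloat(static_cast<float>(std::cos(angle)),
                        static_cast<float>(std::sin(angle)));
    }
    return lut;
}

// Naive O(n^2) DFT that reads its twiddles from the table instead of calling
// sin/cos in the inner loop. The inverse uses conjugated twiddles and is left
// unnormalized, so the caller divides by n.
std::vector<cfloat> dftWithLUT(const std::vector<cfloat>& x,
                               const std::vector<cfloat>& lut, bool inverse) {
    assert(lut.size() == x.size());
    const std::size_t n = x.size();
    std::vector<cfloat> out(n);
    for (std::size_t k = 0; k < n; k++) {
        cfloat sum(0.0f, 0.0f);
        for (std::size_t j = 0; j < n; j++) {
            const cfloat w = lut[(j * k) % n];
            sum += x[j] * (inverse ? std::conj(w) : w);
        }
        out[k] = sum;
    }
    return out;
}

// Forward + inverse round trip on deterministic inputs in [0, 1); returns the
// largest absolute difference between the input and the rescaled output.
float roundTripError(std::size_t n) {
    std::vector<cfloat> x(n);
    for (std::size_t j = 0; j < n; j++)
        x[j] = cfloat(static_cast<float>(j % 7) / 7.0f,
                      static_cast<float>(j % 5) / 5.0f);
    const std::vector<cfloat> lut = makeTwiddleLUT(n);
    const std::vector<cfloat> y = dftWithLUT(dftWithLUT(x, lut, false), lut, true);
    float maxError = 0.0f;
    for (std::size_t j = 0; j < n; j++)
        maxError = std::max(maxError, std::abs(x[j] - y[j] / static_cast<float>(n)));
    return maxError;
}
```

Calling roundTripError(25) mirrors the per-axis size used in the test function above. The trade-off of a table is extra memory and memory traffic in exchange for twiddle factors whose accuracy does not depend on the runtime's math library.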

peastman · 2024-01-10T17:45:26Z

That fixes it. Thank you so much!

DTolm · 2024-01-11T18:31:55Z

Great! However, I am not sure how to make this permanent, as the issue is specific to this particular runtime.

peastman · 2024-01-11T19:16:31Z

Would it make sense to enable useLUT by default whenever the platform vendor is Intel, whatever type of device it is?

DTolm · 2024-01-11T19:34:54Z

It is already on by default for vendor 0x8086 (Intel), though. Can you check the vendorID value for your device? The call is clGetDeviceInfo(device, CL_DEVICE_VENDOR_ID, sizeof(cl_uint), &vendorID, NULL); where vendorID is a cl_uint.

peastman · 2024-01-11T20:09:27Z

How very clever of them! :)

It's a little odd. They actually create two different platforms, each with a single device. The first one is called "Intel(R) FPGA Emulation Platform for OpenCL(TM)" and the device has vendor ID 0x1172. The second one is called "Intel(R) OpenCL" and the device has vendor ID 0x8086. The platform vendor for both of them is "Intel(R) Corporation".

DTolm · 2024-01-12T08:30:26Z

This vendor ID apparently belongs to Altera Corporation. I guess I can add a special check for CL_PLATFORM_VENDOR being "Intel(R) Corporation", as it is the common factor.
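
The check proposed above could look roughly like the following sketch. The function shouldForceLUT is hypothetical and is not part of VkFFT. It assumes the caller has already obtained the device vendor ID (via clGetDeviceInfo with CL_DEVICE_VENDOR_ID) and the platform vendor string (via clGetPlatformInfo with CL_PLATFORM_VENDOR), and it matches on the substring "Intel" rather than the full string, which is a design assumption rather than something settled in this thread.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not VkFFT code: decide whether to force the
// twiddle-factor LUT based on the device vendor ID and platform vendor string.
bool shouldForceLUT(std::uint32_t deviceVendorID, const std::string& platformVendor) {
    const std::uint32_t intelVendorID = 0x8086;  // existing check described in the thread
    if (deviceVendorID == intelVendorID)
        return true;
    // Fallback for devices that report another ID (such as 0x1172) but sit on
    // a platform whose vendor string names Intel.
    return platformVendor.find("Intel") != std::string::npos;
}
```

With the values reported in this thread, both the 0x1172 and the 0x8086 devices would enable the LUT, because both platforms report "Intel(R) Corporation" as the vendor.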

peastman · 2024-01-12T16:19:51Z

Altera is now part of Intel. They bought it some years back.
