Large errors using Intel OpenCL for CPUs #148

Open
peastman opened this issue Jan 9, 2024 · 8 comments

@peastman

peastman commented Jan 9, 2024

When running VkFFT under Intel's OpenCL for CPUs, I find that it gives very large errors. The results aren't totally wrong, but the accuracy is much worse than I expect.

The following C++ function illustrates the problem. It fills an array with random values, performs forward and backward 3D FFTs, and compares the output to the input. On most OpenCL implementations I've tested the agreement is excellent. For example on NVIDIA OpenCL the maximum error is 7.59959e-07. But on Intel it is hundreds of times larger: 0.000337124.

Any idea what could be causing this? Thank you for your help!

void test(cl::Device device, cl::Context context) {
    // Initialize VkFFT.

    int xsize = 25, ysize = 25, zsize = 25;
    VkFFTApplication app;
    app = {};
    VkFFTConfiguration config = {};
    config.FFTdim = 3;
    config.size[0] = zsize;
    config.size[1] = ysize;
    config.size[2] = xsize;
    config.device = &device();
    config.context = &context();
    config.inputBufferStride[0] = zsize;
    config.inputBufferStride[1] = ysize*zsize;
    config.inputBufferStride[2] = xsize*ysize*zsize;
    VkFFTResult result = initializeVkFFT(&app, config);
    cl::CommandQueue queue(context, device);

    // Generate the input data.

    default_random_engine generator;
    uniform_real_distribution<float> distribution(0.0, 1.0);
    vector<float> input(2*xsize*ysize*zsize);
    for (int i = 0; i < input.size(); i++)
        input[i] = distribution(generator);
    int bufferSize = input.size()*sizeof(float);
    cl::Buffer buffer(context, CL_MEM_READ_WRITE, bufferSize);
    queue.enqueueWriteBuffer(buffer, CL_TRUE, 0, bufferSize, input.data());

    // Perform the FFTs.

    VkFFTLaunchParams params = {};
    params.inputBuffer = &buffer();
    params.buffer = &buffer();
    params.commandQueue = &queue();
    result = VkFFTAppend(&app, -1, &params);
    result = VkFFTAppend(&app, 1, &params);

    // Check the result.

    vector<float> output(input.size());
    queue.enqueueReadBuffer(buffer, CL_TRUE, 0, bufferSize, output.data());
    float maxError = 0;
    float scale = 1.0/(xsize*ysize*zsize);
    for (int i = 0; i < input.size(); i++)
        maxError = max(maxError, fabs(input[i]-scale*output[i]));
    printf("%g\n", maxError);
}
@DTolm
Owner

DTolm commented Jan 10, 2024

Hello,

I have not yet managed to make this runtime work on my machine, but this issue resembles the low accuracy of the sincos functions on Intel iGPUs. There is a fix keyed on the Intel vendor ID that forces all twiddle factors to be precomputed in a LUT. My guess is that the vendor ID is different in this case, so the fix is not applied. Can you try setting config.useLUT = 1 and see if this fixes the issue?

Best regards,
Dmitrii
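
Editorial note: the idea behind a twiddle-factor LUT can be sketched on the host as follows. This is only an illustration of the general technique, not VkFFT's actual implementation; the function name `makeTwiddleLUT` is hypothetical. The assumption is that sines and cosines evaluated in double precision and rounded once to float are more accurate than what a low-precision device sincos would return.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of VkFFT: build a table of the n twiddle
// factors exp(-2*pi*i*k/n) for k = 0..n-1. The angle, cosine and sine are
// evaluated in double precision and rounded to float exactly once, so the
// table does not depend on the accuracy of the device's own sincos.
std::vector<std::complex<float>> makeTwiddleLUT(std::size_t n) {
    assert(n > 0);
    const double twoPi = 6.283185307179586476925286766559;
    std::vector<std::complex<float>> lut(n);
    for (std::size_t k = 0; k < n; k++) {
        double angle = -twoPi * static_cast<double>(k) / static_cast<double>(n);
        lut[k] = std::complex<float>(static_cast<float>(std::cos(angle)),
                                     static_cast<float>(std::sin(angle)));
    }
    return lut;
}
```

A kernel would then read `lut[k]` instead of calling sincos for each butterfly. This trades a little memory and bandwidth for accuracy that no longer depends on the runtime's math library.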

@peastman
Author

That fixes it. Thank you so much!

@DTolm
Owner

DTolm commented Jan 11, 2024

Great! However, I am not sure how to make this permanent as this is an issue of this particular specification.

@peastman
Author

Would it make sense to enable useLUT by default whenever the platform vendor is Intel, whatever type of device it is?

@DTolm
Owner

DTolm commented Jan 11, 2024

It is already on by default for vendor 0x8086 (Intel), though. Can you check the vendorID value for your device? The command is clGetDeviceInfo(device, CL_DEVICE_VENDOR_ID, sizeof(cl_uint), &vendorID, 0);

@peastman
Author

How very clever of them! :)

It's a little odd. They actually create two different platforms, each with a single device. The first one is called "Intel(R) FPGA Emulation Platform for OpenCL(TM)" and the device has vendor ID 0x1172. The second one is called "Intel(R) OpenCL" and the device has vendor ID 0x8086. The platform vendor for both of them is "Intel(R) Corporation".

@DTolm
Owner

DTolm commented Jan 12, 2024

The vendor ID 0x1172 apparently belongs to Altera Corporation. I guess I can add a special check for CL_PLATFORM_VENDOR being "Intel(R) Corporation", as that is the common factor.
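
Editorial note: the combined check proposed here might look like the sketch below. The function name `shouldForceLUT` and the substring match on "Intel" are assumptions made for illustration and are not taken from VkFFT's source. The vendor IDs and the platform vendor string are the ones reported in this thread, plus NVIDIA's 0x10DE as a negative case.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not VkFFT's actual code: decide whether to force
// the twiddle-factor LUT. The device vendor ID alone misses the device
// that reports Altera's ID (0x1172), so the platform vendor string is
// used as a fallback.
bool shouldForceLUT(std::uint32_t deviceVendorId, const std::string& platformVendor) {
    const std::uint32_t intelVendorId = 0x8086;
    if (deviceVendorId == intelVendorId)
        return true;
    // Assumption: any platform vendor string containing "Intel" qualifies.
    return platformVendor.find("Intel") != std::string::npos;
}
```

In real code the two arguments would come from clGetDeviceInfo with CL_DEVICE_VENDOR_ID and clGetPlatformInfo with CL_PLATFORM_VENDOR. With the values reported above, `shouldForceLUT(0x1172, "Intel(R) Corporation")` returns true even though the device ID is not Intel's.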

@peastman
Author

Altera is now part of Intel. They bought it some years back.
