
Multiple systems - one kernel - multiple outputs convolution #159

Open · CoffeeExterminator opened this issue Feb 29, 2024 · 10 comments

CoffeeExterminator commented Feb 29, 2024

Hi,
I'm trying to perform a line-by-line image convolution using VkFFT: I have an image of size N * M and a kernel of size N * 1, and I want to convolve each line with this same kernel. So I perform an FFT of the kernel, and then try to perform another FFT with a 1D convolution for the image, setting numberBatches to M. But the convolution is only applied to the first line. I've tried playing with other parameters like coordinateFeatures and numberKernels, but haven't found a way to get the result I want. Am I missing something, or is such a convolution not currently possible?

DTolm (Owner) commented Feb 29, 2024

Hello,

I believe this is the same request as vincefn/pyvkfft#33 (comment). It shouldn't be hard to add and seems really useful. I will try to add this functionality in the next couple of days.

Best regards,
Dmitrii

CoffeeExterminator (Author) commented

I believe it is indeed the same request; I hadn't seen it was already mentioned since it's on the pyvkfft tracker.
It would be a very useful addition, looking forward to it, thanks.

DTolm added a commit that referenced this issue Mar 2, 2024
-enabled through singleKernelMultipleBatches parameter
-kernel batching is controlled through coordinateFeatures
-number of input/output systems is controlled through numberBatches
-sample 53 shows the usage of this option
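Based on the commit notes above, a minimal configuration sketch for the new mode might look as follows. This is a hedged sketch, not the verbatim sample 53 code: it assumes the rest of the plan (buffers, kernel FFT application) is set up as in the other convolution samples, and only shows the fields the commit message names.

```c
VkFFTConfiguration conf = {0};
conf.FFTdim = 1;
conf.size[0] = width;                 /* length of one line / system          */
conf.performConvolution = 1;
conf.singleKernelMultipleBatches = 1; /* new option enabled by this commit    */
conf.coordinateFeatures = 1;          /* kernel batching: one shared kernel   */
conf.numberBatches = height;          /* number of input/output systems       */
conf.kernel = &kernel;                /* forward FFT of the shared kernel     */
conf.kernelSize = &kernelDstSize;
```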
DTolm (Owner) commented Mar 2, 2024

I have added a sample (53) that shows how the requested functionality can be implemented in the new develop version of VkFFT. I haven't fully tested it yet apart from the example, but the implementation didn't require any big modifications, so it should work. If more help is needed, I will be happy to answer any related questions.

CoffeeExterminator (Author) commented

Hi,
I have tested the new functionality for the example I mentioned earlier, and the C2C multiple-batch convolution worked fine. However, I have encountered some unexpected behavior with out-of-place R2C convolution (I'm not sure whether it's related to batching), which I hope you can clarify.

1. Not exactly related, but are there any requirements for dimension sizes when it comes to convolution? I haven't seen any mentioned, but I have encountered initialization of app_convolution failing for specific sizes, one example being
configuration.FFTdim = 1; 
configuration.size[0] = 449;
configuration.size[1] = 1;
configuration.size[2] = 1;

This can be reproduced by modifying any of the convolution samples accordingly (I have personally tried 50, 52 and 53) and doesn't seem to depend on the number of batches or coordinateFeatures.
2. As said, I was trying to perform a line-by-line image convolution. So I have a 1D kernel and a single-channel image, and the idea was to first perform a 1D R2C FFT of the kernel, then perform a 1D R2C FFT for each line of the image (setting the number of batches to the image height) with convolution (using the kernel FFT from the previous step), and finally perform an IFFT on the result.
Now, about the complications I've faced.
Here are the stage-by-stage results of FFT-IFFT for the image without any convolution (the original image in the top left corner, its FFT below, and the inverse FFT to the right). Everything is as expected.
[image: fft_no_convolution]
After that I enable convolution with a kernel (an identity kernel with 0.5 in place of 1.0, so I expect to get a darkened image in the end). I expected to see the original image, a somewhat similar picture for the FFT, and a darker version of the original image as the result of the IFFT. However, what I got was this:
[image: fft+convolution]
Notice how the result of the forward FFT with convolution doesn't look like the result of an FFT, but instead looks like something done to the original image. And the final result is not what I expected either.
I then tried to split the process: first perform the FFT of the image with no convolution, then perform a C2C convolution on the result (by initializing and using a different application), and then perform the IFFT. This gave the expected result at every stage.
[image: convolution_separated]
So, to my understanding, either the convolution happened in the wrong place, or my configuration parameters were very wrong. I'm providing the simplified code with all the main steps and configurations.

uint64_t srcSize = sizeof(float) * width * height;
uint64_t dstSize = sizeof(float) * 2 * (width / 2 + 1) * height;

uint64_t kernelSrcSize = width * sizeof(float);
uint64_t kernelDstSize = sizeof(float) * 2 * (width / 2 + 1);

// kernel fft
kernel_configuration.FFTdim = 1;
kernel_configuration.size[0] = width;
kernel_configuration.size[1] = 1;
kernel_configuration.size[2] = 1;

kernel_configuration.kernelConvolution = true;
kernel_configuration.coordinateFeatures = 1;
kernel_configuration.normalize = 1;

kernel_configuration.performR2C = 1;
kernel_configuration.isInputFormatted = 1;
kernel_configuration.inputBuffer = &kernelSrc;
kernel_configuration.inputBufferSize = &kernelSrcSize;
kernel_configuration.buffer = &kernel;
kernel_configuration.bufferSize = &kernelDstSize;
...
auto resFFT = initializeVkFFT(&app_kernel, kernel_configuration);
...
resFFT = VkFFTAppend(&app_kernel, -1, &launchParams);
...
// image fft and convolution
convolution_configuration = kernel_configuration;
convolution_configuration.kernelConvolution = false;
convolution_configuration.performConvolution = true;
convolution_configuration.numberBatches = height;

convolution_configuration.isInputFormatted = 1;
convolution_configuration.inputBuffer = &src;
convolution_configuration.buffer = &dst;
convolution_configuration.inputBufferSize = &srcSize;
convolution_configuration.bufferSize = &dstSize;
convolution_configuration.performR2C = 1;
convolution_configuration.inverseReturnToInputBuffer = 1;
convolution_configuration.kernel = &kernel;
convolution_configuration.singleKernelMultipleBatches = true;
convolution_configuration.kernelSize = &kernelDstSize;
...
resFFT = initializeVkFFT(&app_convolution, convolution_configuration);
...
resFFT = VkFFTAppend(&app_convolution, -1, &launchParams);
...
// result ifft
resFFT = VkFFTAppend(&app_convolution, 1, &launchParams);

DTolm (Owner) commented Mar 6, 2024

Not exactly related, but are there any requirements for dimension sizes when it comes to convolution?

Yes, there is an issue: the convolution optimization has not been implemented when the last axis it is merged into has to do anything other than a plain radix-2-13 C2C transform. This includes 1D R2C, R2R, and Rader and Bluestein sequences. I need to fix this to work for all cases, as not many people have used this functionality before (so the documentation on convolutions is also lacking). Sorry for the confusion.

CoffeeExterminator (Author) commented Mar 6, 2024

Yes, there is an issue: the convolution optimization has not been implemented when the last axis it is merged into has to do anything other than a plain radix-2-13 C2C transform.

Thanks for clarifying. Any hint what the sizes should be aligned to, then, to avoid such problems for now?

As for the second issue, I'm starting to suspect it might have something to do with buffer layout in my configuration, but I'm still looking into it.

DTolm (Owner) commented Mar 6, 2024

Any hint what the sizes should be aligned to, then, to avoid such problems for now?

Convolutions should be safe now for C2C sequences decomposable as a product of an arbitrary number of primes from 2 to 13.

But isn't the second issue related to 1D R2C convolutions, which are not currently supported?

CoffeeExterminator (Author) commented

Convolutions should be safe now for C2C sequences decomposable as a product of an arbitrary number of primes from 2 to 13.

Thanks again.

But isn't the second issue related to 1D R2C convolutions, which are not currently supported?

I must have missed it. So R2C convolutions aren't supported specifically in the 1D case? Because I think there's a 2D R2C convolution in sample 52.

DTolm (Owner) commented Mar 6, 2024

I must have missed it. So R2C convolutions aren't supported specifically in the 1D case? Because I think there's a 2D R2C convolution in sample 52.

Exactly. The last axis of the transform needs to be a simple C2C to merge the convolution into it. I will try to fix this in the future to make it more general.

CoffeeExterminator (Author) commented

That explains it then, thank you.
