Skip to content

Commit

Permalink
Improve the performance of convolve (and correlate)
Browse files Browse the repository at this point in the history
The exising implementation of `convolve` (which is also used for `correlate`) was pretty slow. The slowness was due to the fact that we were using regular tensor element access in the inner loop of the convolution, which was very expensive.

The solution was to ensure that the input tensors were contiguous (by cloning them if necessary) and then using `unsafe_raw_buf` for all the tensor element accesses, which is safe because we know that the indexes are all within the tensor boundaries.

On my system this change makes a large convolution go from ~1.5 seconds in release mode and ~0.25 seconds in danger mode down to ~65 usecs in both modes (i.e. a x23 reduction in release mode and a x3.8 reduction in danger mode)!
  • Loading branch information
AngelEzquerra committed May 1, 2024
1 parent 1448698 commit 8474811
Showing 1 changed file with 14 additions and 1 deletion.
15 changes: 14 additions & 1 deletion src/arraymancer/tensor/math_functions.nim
Original file line number Diff line number Diff line change
Expand Up @@ -294,12 +294,25 @@ proc convolveImpl[T: SomeNumber | Complex32 | Complex64](
# Initialize the result tensor
result = zeros[T](len_result)

# Ensure that the input tensors are contiguous so that they can be accessed
# efficiently using `unsafe_raw_buf` in the inner loop
let f = if f.isContiguous(): f else: f.clone()
let g = if g.isContiguous(): g else: g.clone()

# And perform the convolution
omp_parallel_blocks(block_offset, block_size, len_result):
for n in block_offset ..< block_offset + block_size:
let shift = n + offset
for m in max(0, shift - g.size + 1) .. min(f.size - 1, shift):
result[n] += f[m] * g[shift - m]
# We want to do the following operation:
# result[n] += f[m] * g[shift - m]
# In order to do it efficently, we want to avoid all the overhead of
# using regular `[]` access. Since we know that we are working with
# continuous, rank-1 tensors, that `n`, `m` and `shift-m` are within
# the boundaries of `result`, `f` and `g` (respectively) and since
# `T is KnownSupportsCopyMem`, it is safe (and way more efficient) to
# use `unsafe_raw_buf` to access the actual tensor elements here:
result.unsafe_raw_buf[n] += f.unsafe_raw_buf[m] * g.unsafe_raw_buf[shift - m]

proc convolve*[T: SomeNumber | Complex32 | Complex64](
t1, t2: Tensor[T],
Expand Down

0 comments on commit 8474811

Please sign in to comment.