
Improve the performance of convolve (and correlate)
The existing implementation of `convolve` (which is also used by `correlate`) was pretty slow. The slowness was due to the fact that the inner loop of the convolution accessed tensor elements through the regular (strided) indexing path, which is very expensive.

The solution was to ensure that the input tensors are contiguous (by cloning them if necessary) and then to use `unsafe_raw_buf` for all tensor element accesses, which is safe because we know that all the indexes stay within the tensor boundaries.
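The contiguity step can be illustrated with a small Python/NumPy analogue (a sketch, not Arraymancer code: `ensure_contiguous` is an illustrative name, and NumPy's contiguity flag plus `np.ascontiguousarray` stand in for Nim's `isContiguous`/`clone`):

```python
import numpy as np

def ensure_contiguous(t: np.ndarray) -> np.ndarray:
    """Rough analogue of `if t.isContiguous(): t else: t.clone()`."""
    # A strided view (e.g. t[::2]) is not contiguous in memory; copying it
    # lets the hot loop use plain flat indexing over the underlying buffer
    # instead of computing a strided offset on every access.
    if t.flags["C_CONTIGUOUS"]:
        return t
    return np.ascontiguousarray(t)
```

The copy costs O(n) once, while the convolution loop does O(n·m) accesses, so paying for a clone up front is a net win whenever the input is a non-contiguous slice.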

On my system this change makes a large convolution go from ~1.5 seconds in release mode and ~0.25 seconds in danger mode down to ~65 msecs in both modes (i.e. a x23 speedup in release mode and a x3.8 speedup in danger mode)!
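For reference, the direct convolution loop that the diff optimizes can be sketched in Python for the "full" mode, where the output offset is 0 (a sketch of the algorithm only; `convolve_full` is an illustrative name, not Arraymancer API):

```python
def convolve_full(f, g):
    """Direct 1-D "full" convolution, mirroring the loop in `convolveImpl`.

    In "full" mode len(result) == len(f) + len(g) - 1 and offset == 0,
    so shift == n. The inner-loop bounds keep both m and shift - m
    inside the valid index ranges of f and g respectively.
    """
    len_result = len(f) + len(g) - 1
    offset = 0  # other modes ("same", "valid") shift the output window
    result = [0] * len_result
    for n in range(len_result):
        shift = n + offset
        for m in range(max(0, shift - len(g) + 1), min(len(f) - 1, shift) + 1):
            result[n] += f[m] * g[shift - m]
    return result
```

The Nim version performs exactly this double loop; the commit's change is purely about how `result[n]`, `f[m]` and `g[shift - m]` are read and written, not about the loop structure itself.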
AngelEzquerra committed May 12, 2024
1 parent b202709 commit ab284a9
Showing 2 changed files with 17 additions and 1 deletion.
15 changes: 14 additions & 1 deletion src/arraymancer/tensor/math_functions.nim
@@ -294,12 +294,25 @@ proc convolveImpl[T: SomeNumber | Complex32 | Complex64](
# Initialize the result tensor
result = zeros[T](len_result)

# Ensure that the input tensors are contiguous so that they can be accessed
# efficiently using `unsafe_raw_buf` in the inner loop
let f = if f.isContiguous(): f else: f.clone()
let g = if g.isContiguous(): g else: g.clone()

# And perform the convolution
omp_parallel_blocks(block_offset, block_size, len_result):
for n in block_offset ..< block_offset + block_size:
let shift = n + offset
for m in max(0, shift - g.size + 1) .. min(f.size - 1, shift):
result[n] += f[m] * g[shift - m]
# We want to do the following operation:
# result[n] += f[m] * g[shift - m]
# In order to do it efficiently, we want to avoid all the overhead of
# using regular `[]` access. Since we know that we are working with
# contiguous, rank-1 tensors, that `n`, `m` and `shift - m` are within
# the boundaries of `result`, `f` and `g` (respectively) and since
# `T is KnownSupportsCopyMem`, it is safe (and way more efficient) to
# use `unsafe_raw_offset` to access the actual tensor elements here:
result.unsafe_raw_offset[n] += f.unsafe_raw_offset[m] * g.unsafe_raw_offset[shift - m]

proc convolve*[T: SomeNumber | Complex32 | Complex64](
t1, t2: Tensor[T],
3 changes: 3 additions & 0 deletions tests/tensor/test_math_functions.nim
@@ -191,6 +191,9 @@ proc main() =
check: convolve(a2, b, mode=ConvolveMode.same) == expected_same_a2b
check: convolve(a, b2, mode=ConvolveMode.same) == expected_same_ab2

# Check that convolution works with slices as well
check: convolve(a2[_..^2], b2[_..^2]) == expected_full

test "1-D correlation":
block:
let a = [2, 8, -8, -6, 4].toTensor
