Mitigate catastrophic cancellation in cross products and other code #435

mosra · 2020-04-21T21:15:55Z

Original article: https://pharr.org/matt/blog/2019/11/03/difference-of-floats.html

While this makes the precision of a 32-bit float cross product basically equivalent to a 64-bit calculation cast back to 32 bits, its speed ends up about halfway between the straightforward 32- and 64-bit implementations. Benchmark on Release:

Starting Magnum::Math::Test::VectorBenchmark with 9 test cases...
 BENCH [2]   0.98 ± 0.05   ns cross2Baseline<Float>()@24999x100000 (wall time)
 BENCH [3]   3.44 ± 0.11   ns cross2Baseline<Double>()@24999x100000 (wall time)
 BENCH [4]   1.97 ± 0.08   ns cross2()@24999x100000 (wall time)
 BENCH [5]   2.22 ± 0.11   ns cross3Baseline<Float>()@24999x100000 (wall time)
 BENCH [6]   4.69 ± 0.22   ns cross3Baseline<Double>()@24999x100000 (wall time)
 BENCH [7]   3.32 ± 0.15   ns cross3()@24999x100000 (wall time)
Finished Magnum::Math::Test::VectorBenchmark with 0 errors out of 450000 checks.
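
For reference, a minimal sketch of the FMA-based difference of products described in the linked article, applied to a 3D cross product. The `Vec3` type and both function names are hypothetical and exist only for illustration; they are not the Magnum API and the code in this PR may be structured differently.

```cpp
#include <cmath>

// Hypothetical minimal vector type, for illustration only
struct Vec3 { float x, y, z; };

// Computes a*b - c*d while compensating for the rounding error of c*d
inline float differenceOfProducts(float a, float b, float c, float d) {
    float cd = c*d;                     // rounded product
    float err = std::fma(-c, d, cd);    // cd - c*d, the rounding error of cd
    float dop = std::fma(a, b, -cd);    // a*b - cd with a single rounding
    return dop + err;                   // add the correction term back
}

// Cross product where every component is a compensated difference of products
inline Vec3 crossPrecise(const Vec3& a, const Vec3& b) {
    return {differenceOfProducts(a.y, b.z, a.z, b.y),
            differenceOfProducts(a.z, b.x, a.x, b.z),
            differenceOfProducts(a.x, b.y, a.y, b.x)};
}
```

Each component costs one multiplication, two FMAs and one addition instead of two multiplications and one subtraction, so the approach is only cheap where `std::fma()` maps to a hardware instruction.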

However, the speed stays in that range only on platforms that actually have an FMA instruction. For example, on Emscripten cross3() is about ten times slower than the baseline implementation (28.77 ns vs. 2.73 ns), which is not an acceptable tradeoff; there, simply using doubles to calculate the result is faster. And enabling the more precise variant only on some platforms doesn't seem like a good idea for portability. For the record, benchmark output on Chrome (node.js in the terminal gives similar results):

Starting Magnum::Math::Test::VectorBenchmark with 7 test cases...
 BENCH [2]   2.53 ± 0.34   ns cross2Baseline<Float>()@499x100000 (wall time)
 BENCH [3]   5.18 ± 1.30   ns cross2Baseline<Double>()@499x100000 (wall time)
 BENCH [4]   6.22 ± 0.46   ns cross2()@499x100000 (wall time)
 BENCH [5]   2.73 ± 0.35   ns cross3Baseline<Float>()@499x100000 (wall time)
 BENCH [6]   5.94 ± 0.61   ns cross3Baseline<Double>()@499x100000 (wall time)
 BENCH [7]  28.77 ± 2.40   ns cross3()@499x100000 (wall time)
Finished Magnum::Math::Test::VectorBenchmark with 0 errors out of 7000 checks.
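
The "simply using doubles" alternative mentioned above could look roughly like the following sketch. The function name `crossViaDouble` and the use of `std::array` are assumptions made for a self-contained example, not what the benchmarked `cross3Baseline<Double>()` necessarily does.

```cpp
#include <array>

// Hypothetical fallback: promote the inputs to double, evaluate the cross
// product there and round to float only once at the end
inline std::array<float, 3> crossViaDouble(const std::array<float, 3>& a,
                                           const std::array<float, 3>& b) {
    const double ax = a[0], ay = a[1], az = a[2];
    const double bx = b[0], by = b[1], bz = b[2];
    return {{float(ay*bz - az*by),
             float(az*bx - ax*bz),
             float(ax*by - ay*bx)}};
}
```

The product of two floats is exactly representable in a double, so the only rounding happens in the subtraction and in the final conversion back to float.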

Stashing this aside until I'm clearer what to do with this. Things to keep an eye on:

FMA in webassembly: https://github.com/WebAssembly/simd/issues/10
Similar optimizations for lerp(), as described at https://fgiesen.wordpress.com/2012/08/15/linear-interpolation-past-present-and-future/ , probably with very similar perf characteristics (okay on desktop, terrible on the web)
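
As a rough illustration of the lerp() item above, here is a sketch of one way to write the interpolation with two FMAs, assuming the (1 - t)*a + t*b form rearranged as (a - t*a) + t*b. The names `lerpBaseline` and `lerpFma` are hypothetical and not part of Magnum.

```cpp
#include <cmath>

// Common single-subtraction form; at t == 1 it can miss b exactly because
// b - a is rounded before being added back to a
inline float lerpBaseline(float a, float b, float t) {
    return a + t*(b - a);
}

// Two-FMA form: the inner FMA computes a - t*a, the outer one adds t*b
inline float lerpFma(float a, float b, float t) {
    return std::fma(t, b, std::fma(-t, a, a));
}
```

This form returns exactly `a` at `t == 0` and exactly `b` at `t == 1`, but like the cross product variant it depends on a fast `std::fma()` to be worthwhile.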



mosra added 4 commits April 21, 2020 22:02

Math: benchmark vector dot() and cross().

341a497

Have to do some precision improvements, so a baseline is needed. The debug perf is beyond awful, actually.

Math: make cross() 10x faster in Debug.

573125d

And the Vector3 version is 5% slower in Release, on GCC at least. FFS, what was I thinking with the gather() things. Nice in user code, extremely bad in library code.

Math: make dot() twice as fast in Debug.

fc3382e

mosra added this to TODO in Math and algorithms via automation Apr 21, 2020

mosra mentioned this pull request May 9, 2020

2020.06 release #411

Closed

87 tasks
