
Produce multiple random numbers efficiently #66

Open
wants to merge 3 commits into master

Conversation

OlivierSohn
Contributor

This is probably not mergeable as-is, because it's not in the spirit of the current API, but 'foldMUniforms' could be used to implement a new fold-like function in the Variate class, allowing N random values to be created and consumed by a monadic accumulating function.

Note that creating N numbers with 'foldMUniforms' requires N+2 reads and writes to the state vector, whereas creating N numbers with 'uniform' requires 3*N reads and writes. Benchmarks on my application show a speed-up, because random number generation is a bottleneck for me.
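To make the read/write accounting concrete, here is a minimal, self-contained sketch of the idea (hypothetical code, not the actual PR): `lcgStep` is a toy stand-in for the real MWC update, and an `IORef` stands in for the mutable state vector. The point is structural: one read up front, one write at the end, and N generated values in between, instead of a read/update/write round trip per number.

```haskell
import Data.IORef
import Data.Word (Word32)

-- Toy generator step standing in for the real MWC update
-- (a simple LCG with the classic Numerical Recipes constants).
lcgStep :: Word32 -> Word32
lcgStep s = 1664525 * s + 1013904223

-- Hypothetical fold: read the generator state once, feed n fresh numbers
-- to a monadic accumulator, and write the state back once at the end.
foldMUniforms :: Int -> (a -> Word32 -> IO a) -> a -> IORef Word32 -> IO a
foldMUniforms n f a0 ref = do
  s0 <- readIORef ref                          -- one read up front
  let go 0 acc s = writeIORef ref s >> pure acc  -- one write at the end
      go k acc s = do
        let s' = lcgStep s                     -- advance the state
        acc' <- f acc s'                       -- consume the fresh number
        go (k - 1) acc' s'
  go n a0 s0
```

For example, summing 1000 numbers would be `foldMUniforms 1000 (\acc w -> pure (acc + w)) 0 ref`, touching the `IORef` only twice regardless of the count.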

@Shimuuar
Collaborator

Shimuuar commented Apr 9, 2018

Essentially it's about avoiding two stores into the array that are then read back immediately. On the surface it looks really similar to deforestation, but I don't know whether it's possible to implement via rewrite rules. At the very least it seems hard.

On a completely unrelated note, I thought about another micro-optimization: replacing unboxed arrays with arrays from primitive. Indexing should be faster there, since those don't support slicing.

@OlivierSohn
Contributor Author

Thanks, I wasn't aware of the indexing overhead. I've never used primitive arrays, but I'll try them out when I have some time!

Deforestation is also a new subject for me, but I feel it would be complicated (and lead to complicated code?) to explain to the compiler how to fuse two operations. I might be wrong, though, and would definitely like to see how this could be done :)

@OlivierSohn
Contributor Author

OlivierSohn commented Apr 9, 2018

I see a 2.5% performance boost when using a Storable vector. Interesting!

@OlivierSohn
Contributor Author

OlivierSohn commented Apr 9, 2018

Note that it is a breaking change, since Seed now holds a Storable vector instead of an Unboxed one. Edit: not anymore (see below).

@Shimuuar
Collaborator

Indexing of unboxed arrays has an inherent slowdown because of slicing support: it's done as arr[off + i] instead of arr[i], so you pay an extra addition on every access. Actually, I had arrays from primitive in mind. I'll try to compare the performance of unboxed/storable/primitive.
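As an illustration of that extra addition, here is a toy model (not the real representation: a Haskell list stands in for the underlying ByteArray, and the types are made up). A sliceable vector must carry an offset into its payload, so every index goes through arr[off + i]; a non-sliceable array like primitive's PrimArray can index directly.

```haskell
import Data.Word (Word32)

-- Stand-in for a sliceable unboxed vector: an offset plus the payload.
data Sliced = Sliced Int [Word32]

-- Slicing-aware indexing pays an extra addition on every access: arr[off + i].
indexSliced :: Sliced -> Int -> Word32
indexSliced (Sliced off xs) i = xs !! (off + i)

-- A non-sliceable array (like PrimArray) indexes directly: arr[i].
indexDirect :: [Word32] -> Int -> Word32
indexDirect xs i = xs !! i
```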

Another approach to reducing the number of reads/writes to the state vector is to turn the generator into a monad over Word32 -> Word32 -> (# a, Word32, Word32 #) and let the GHC optimizer deal with it. It would be interesting to compare performance, but it's a complete API breakage. Maybe that's a good thing.
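A boxed-tuple sketch of what such a monad could look like (hypothetical names; the real version would return the unboxed tuple (# a, Word32, Word32 #) so GHC can keep everything in registers). The two Word32s threaded through every bind play the role of the index and carry:

```haskell
import Data.Word (Word32)

-- State-threading generator monad: each action receives the index and
-- carry and returns a result plus the updated index and carry.
newtype GenM a = GenM { runGenM :: Word32 -> Word32 -> (a, Word32, Word32) }

instance Functor GenM where
  fmap f (GenM m) = GenM $ \i c ->
    let (a, i', c') = m i c in (f a, i', c')

instance Applicative GenM where
  pure a = GenM $ \i c -> (a, i, c)
  GenM mf <*> GenM ma = GenM $ \i c ->
    let (f, i1, c1) = mf i c
        (a, i2, c2) = ma i1 c1
    in (f a, i2, c2)

instance Monad GenM where
  GenM m >>= k = GenM $ \i c ->
    let (a, i1, c1) = m i c in runGenM (k a) i1 c1

-- A made-up step so the sketch runs: returns i + c and advances both.
step :: GenM Word32
step = GenM $ \i c -> (i + c, i + 1, c + 1)
```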

@OlivierSohn
Contributor Author

OlivierSohn commented Apr 11, 2018

I tried arrays from primitive; I see a slight speedup, but it's unclear whether it's real or just noise...

I kept the Seed type as it was before, i.e. with an unboxed vector, so it is no longer a breaking change.

@Shimuuar
Collaborator

I finally got time to work on PRs.

I cherry-picked the changes to the Gen representation and ran benchmarks. Switching to primitive arrays improved performance by ~5% on my computer. A free 5% is nothing to sneeze at, but all this meddling with low-level arrays makes me nervous. I'm going to recheck everything and maybe write some tests; then I'll push them to master.

I also measured the generation time for Word32 and Word64. If we say that the time for Word32 is read + update + write, and for Word64 it's read + 2·update + write (read and write correspond to the index and carry), then both update and (read+write)/2 take about 4 ns. We have a big performance problem on our hands: we force GHC to store and re-read variables needlessly. The fold-based approach solves it for a special case. At the very least, we need to expose enough internals in a safe manner so that people can write such special-purpose folds when they need them.

@OlivierSohn
Contributor Author

I agree.

Another thing to consider (and maybe document for users) is that with the Gen representation change the array is pinned, so the GC won't be able to move it. That may or may not be a good thing, depending on the application, I guess...

@Shimuuar
Collaborator

I finally got around to measuring the impact of the different vector variants. Here is the distribution of run times:

[plot: distribution of run times for each vector variant]

Unboxed, Primitive, and PrimArray from primitive perform identically, and Storable is about 5% slower, probably because of the extra pointer chase through ForeignPtr. I thought a primitive vector would lead to faster builds, but unboxed turned out to be slightly faster: 15.9 s vs 16.1 s. So the current vector backend is likely optimal.

I'll get to the foldMUniforms tomorrow.

@Shimuuar
Collaborator

I rebased the PR onto current master. First of all, benchmarks: it does provide a nice ~25% speedup over replicateM_ n (uniform gen).

What isn't so good: it provides a very specific primitive, namely iterating a function N times. Could it be generalized? One thing that comes to mind is unfolds. Maybe there's something that could usefully generalize both?

There's the obvious option of turning the passing of i & c into a monad, but that turns the 25% speedup into a 10% slowdown, presumably from the state-carrying boilerplate. The loop should be generated inside the function so that GHC can optimize it well.
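For contrast, here is a self-contained sketch of the loop-inside-the-function shape (with a toy `lcgStep` standing in for the MWC update, not the library's code). GHC typically compiles the strict tail-recursive `go` into a tight loop with the state and accumulator kept in registers, with no per-iteration boxing of i & c:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Word (Word32)

-- Toy generator step standing in for the real MWC update.
lcgStep :: Word32 -> Word32
lcgStep s = 1664525 * s + 1013904223

-- Sum n freshly generated numbers starting from state s0.
-- The loop lives inside the function, so the state never round-trips
-- through memory between iterations.
sumN :: Int -> Word32 -> Word32
sumN n s0 = go n s0 0
  where
    go 0 _  !acc = acc
    go k !s !acc = let s' = lcgStep s in go (k - 1) s' (acc + s')
```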

@OlivierSohn
Contributor Author

Hello @Shimuuar, reading this MR brings back memories from when I was developing my little console game, which was a lot of fun! In the meantime I have moved on to other personal projects (music, convolution reverbs, etc.), so I won't pursue the initial goal of merging this MR, but I hope it will be useful, some parts of it at least!
Cheers,
Olivier

@Shimuuar
Collaborator

Well, a 25% speedup is nothing to sneer at :). I think I'll release 0.15 without this PR and start updating statistics. Then I'll revisit this PR; maybe I'll get some ideas in the meantime.
