Improving Haskell's Statistical Story #93

idontgetoutmuch · 2024-05-11T09:25:48Z

Random Fu	MWC Random
Bernoulli	bernoulli
Beta	beta
Binomial
Categorical	categorical
ChiSquare	chiSquare
Dirichlet	dirichlet
Exponential	exponential
StretchedExponential
	truncatedExp
	geometric
Gamma	gamma
Multinomial
Normal	normal
Pareto
Poisson
Rayleigh
Simplex
T
Triangular
Uniform
Weibull
Ziggurat

idontgetoutmuch · 2024-05-11T09:36:02Z

With random and the distributions in mwc-random and random-fu we can come up with something better and even comparable to what is available in R. The table above shows the samplers that are available. In R one also has the CDFs, PDFs and PMFs. I have tried to include this in random-fu. Conceivably, we could split off the distributions from mwc-random and the the CDFs etc from random-fu and statistics and create a new R-like package. I've also started comparing various implementations so for example

{-# LANGUAGE ImportQualifiedPost #-}

{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}

import Data.Random qualified as RF
import Data.Random.Distribution.Binomial qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (mkStdGen, newIOGenM)
import Criterion.Main
  ( Benchmark,
    bench,
    defaultConfig,
    defaultMainWith,
    nfIO,
  )
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (liftM, replicateM)


binomialFu :: IO Int
binomialFu = do g <- newIOGenM (mkStdGen 1729)
                x <- liftM sum $ replicateM 1000 $ RF.runRVar (RF.binomial 1400 (0.4 :: Double)) g
                return x

binomialMwc :: IO Int
binomialMwc = do g <- newIOGenM (mkStdGen 1729)
                 x <- liftM sum $ replicateM 1000 $ MWC.binomial 1400 0.4 g
                 return x

normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"

rawDAT :: FilePath
rawDAT = "raw.dat"

normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Binomial fu" $ nfIO binomialFu
                   , bench "Binomial mwc" $ nfIO binomialMwc
                   ]

main :: IO ()
main = do
  let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
  defaultMainWith configNormal normalBenchmarks

which gives

benchmarking Binomial fu
time                 5.706 ms   (5.680 ms .. 5.730 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 5.746 ms   (5.739 ms .. 5.762 ms)
std dev              28.68 μs   (13.47 μs .. 58.35 μs)

benchmarking Binomial mwc
time                 268.7 μs   (267.9 μs .. 269.9 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 267.7 μs   (267.3 μs .. 268.5 μs)
std dev              1.852 μs   (1.069 μs .. 3.189 μs)

So we know the new binomial sampler in mwc-random is better (CAVEAT: for one given set of parameters) by about x20 (and that is without "squeezing").

idontgetoutmuch · 2024-05-11T09:37:09Z

@Shimuuar I think this would be better as a discussion rather than an issue. I think you as admin for the repo on GitHub have to do something: https://docs.github.com/en/discussions/quickstart

Shimuuar · 2024-05-11T11:29:40Z

Yes new package could be useful. It would give chance to rethink API. statistics take on distribution I think isn't very successful

Discussions could be nice but I don't have admin rights, only commit rights

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improving Haskell's Statistical Story #93

Improving Haskell's Statistical Story #93

idontgetoutmuch commented May 11, 2024

idontgetoutmuch commented May 11, 2024 •

edited

idontgetoutmuch commented May 11, 2024

Shimuuar commented May 11, 2024

Improving Haskell's Statistical Story #93

Improving Haskell's Statistical Story #93

Comments

idontgetoutmuch commented May 11, 2024

idontgetoutmuch commented May 11, 2024 • edited

idontgetoutmuch commented May 11, 2024

Shimuuar commented May 11, 2024

idontgetoutmuch commented May 11, 2024 •

edited