Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving Haskell's Statistical Story #93

Open
idontgetoutmuch opened this issue May 11, 2024 · 3 comments
Open

Improving Haskell's Statistical Story #93

idontgetoutmuch opened this issue May 11, 2024 · 3 comments

Comments

@idontgetoutmuch
Copy link
Member

Random Fu MWC Random
Bernoulli bernoulli
Beta beta
Binomial
Categorical categorical
ChiSquare chiSquare
Dirichlet dirichlet
Exponential exponential
StretchedExponential
truncatedExp
geometric
Gamma gamma
Multinomial
Normal normal
Pareto
Poisson
Rayleigh
Simplex
T
Triangular
Uniform
Weibull
Ziggurat
@idontgetoutmuch
Copy link
Member Author

idontgetoutmuch commented May 11, 2024

With random and the distributions in mwc-random and random-fu we can come up with something better and even comparable to what is available in R. The table above shows the samplers that are available. In R one also has the CDFs, PDFs and PMFs. I have tried to include this in random-fu. Conceivably, we could split off the distributions from mwc-random and the the CDFs etc from random-fu and statistics and create a new R-like package. I've also started comparing various implementations so for example

{-# LANGUAGE ImportQualifiedPost #-}

{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}

import Data.Random qualified as RF
import Data.Random.Distribution.Binomial qualified as RF
import System.Random.MWC.Distributions qualified as MWC
import System.Random.Stateful (mkStdGen, newIOGenM)
import Criterion.Main
  ( Benchmark,
    bench,
    defaultConfig,
    defaultMainWith,
    nfIO,
  )
import Criterion.Types (Config (csvFile, rawDataFile))
import Control.Monad (liftM, replicateM)


binomialFu :: IO Int
binomialFu = do g <- newIOGenM (mkStdGen 1729)
                x <- liftM sum $ replicateM 1000 $ RF.runRVar (RF.binomial 1400 (0.4 :: Double)) g
                return x

binomialMwc :: IO Int
binomialMwc = do g <- newIOGenM (mkStdGen 1729)
                 x <- liftM sum $ replicateM 1000 $ MWC.binomial 1400 0.4 g
                 return x

normalSamplesCSV :: FilePath
normalSamplesCSV = "normal-samples.csv"

rawDAT :: FilePath
rawDAT = "raw.dat"

normalBenchmarks :: [Benchmark]
normalBenchmarks = [ bench "Binomial fu" $ nfIO binomialFu
                   , bench "Binomial mwc" $ nfIO binomialMwc
                   ]

main :: IO ()
main = do
  let configNormal = defaultConfig {csvFile = Just normalSamplesCSV, rawDataFile = Just rawDAT}
  defaultMainWith configNormal normalBenchmarks

which gives

benchmarking Binomial fu
time                 5.706 ms   (5.680 ms .. 5.730 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 5.746 ms   (5.739 ms .. 5.762 ms)
std dev              28.68 μs   (13.47 μs .. 58.35 μs)

benchmarking Binomial mwc
time                 268.7 μs   (267.9 μs .. 269.9 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 267.7 μs   (267.3 μs .. 268.5 μs)
std dev              1.852 μs   (1.069 μs .. 3.189 μs)

So we know the new binomial sampler in mwc-random is better (CAVEAT: for one given set of parameters) by about x20 (and that is without "squeezing").

@idontgetoutmuch
Copy link
Member Author

@Shimuuar I think this would be better as a discussion rather than an issue. I think you as admin for the repo on GitHub have to do something: https://docs.github.com/en/discussions/quickstart

@Shimuuar
Copy link
Collaborator

Yes new package could be useful. It would give chance to rethink API. statistics take on distribution I think isn't very successful

Discussions could be nice but I don't have admin rights, only commit rights

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants