Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

first benchmark is expensive #273

Open
jwaldmann opened this issue Mar 26, 2023 · 5 comments
Open

first benchmark is expensive #273

jwaldmann opened this issue Mar 26, 2023 · 5 comments

Comments

@jwaldmann
Copy link

jwaldmann commented Mar 26, 2023

I expect identical output for the three benchmarks, but the first one (consistently) needs double time. (Possibly related: #60)

import Criterion.Main

main = defaultMain
  [ bench "1"   $ nf length [1 .. 10^7 :: Int ]
  , bench "2"   $ nf length [1 .. 10^7 :: Int ]
  , bench "3"   $ nf length [1 .. 10^7 :: Int ]
  ]


$ cabal run
benchmarking 1
time                 38.37 ms   (38.26 ms .. 38.50 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 38.67 ms   (38.53 ms .. 38.97 ms)
std dev              387.9 μs   (201.4 μs .. 535.6 μs)

benchmarking 2
time                 20.21 ms   (20.13 ms .. 20.31 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 20.10 ms   (20.03 ms .. 20.17 ms)
std dev              175.3 μs   (144.9 μs .. 234.5 μs)

benchmarking 3
time                 20.77 ms   (20.43 ms .. 21.09 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 20.24 ms   (20.14 ms .. 20.41 ms)
std dev              306.0 μs   (225.8 μs .. 397.1 μs)

For criterion-1.6.0.0, with ghc-8.10.7 .. ghc-9.6.1

@treeowl
Copy link

treeowl commented Mar 26, 2023

My guess is that sharing is the problem. Do things improve if you write it like this?

main = defaultMain
  [ bench "1"   $ nf length [1 .. 10^7 + 3 :: Int ]
  , bench "2"   $ nf length [1 .. 10^7 + 2 :: Int ]
  , bench "3"   $ nf length [1 .. 10^7 + 1 :: Int ]
  ]

@jwaldmann
Copy link
Author

no change, see #274

@jwaldmann
Copy link
Author

jwaldmann commented Mar 26, 2023

Also, same behaviour when I write it like this

main = do
  let {-# noinline xs #-}
      xs = [1 .. 10^7 :: Int ]
  print $ sum xs -- to make sure it's evaluated
  defaultMain
    [ bench "1" $ nf length xs
    , bench "2" $ nf length xs
    , bench "3" $ nf length xs
    ]

@turion
Copy link

turion commented Jan 27, 2024

@treeowl I have the feeling that your analysis might be right. But why would your example reduce sharing? It still has 10^7 as a common subexpression, which might get shared.

@jwaldmann also, your example has the same expression nf length xs three times, couldn't it be that this common expression is shared?

(I might be totally wrong about the kind of optimizations that may be at play here.)

@turion
Copy link

turion commented Jan 27, 2024

I think I can reproduce, but the effect is not as strong as with your machine:

benchmarking 1
time                 44.11 ms   (43.78 ms .. 44.49 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 44.64 ms   (44.35 ms .. 45.42 ms)
std dev              897.5 μs   (361.0 μs .. 1.594 ms)

benchmarking 2
time                 39.32 ms   (38.54 ms .. 40.16 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 38.66 ms   (38.24 ms .. 39.24 ms)
std dev              1.025 ms   (750.5 μs .. 1.409 ms)

benchmarking 3
time                 38.11 ms   (37.18 ms .. 38.88 ms)
                     0.998 R²   (0.994 R² .. 0.999 R²)
mean                 40.92 ms   (39.81 ms .. 42.66 ms)
std dev              2.730 ms   (1.797 ms .. 3.797 ms)
variance introduced by outliers: 25% (moderately inflated)

I changed the code a bit to try and avoid the problem:

main = defaultMain
  [ bench "1"   $ nf length [1 * 10^7 .. 2 * 10^7 :: Int ]
  , bench "2"   $ nf length [3 * 10^7 .. 4 * 10^7 :: Int ]
  , bench "3"   $ nf length [5 * 10^7 .. 6 * 10^7 :: Int ]
  ]

But still the effect is visible:

benchmarking 1
time                 44.98 ms   (44.13 ms .. 45.82 ms)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 44.50 ms   (44.32 ms .. 44.82 ms)
std dev              456.8 μs   (272.8 μs .. 704.0 μs)

benchmarking 2
time                 38.68 ms   (37.49 ms .. 40.20 ms)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 38.19 ms   (37.84 ms .. 38.79 ms)
std dev              956.1 μs   (507.1 μs .. 1.306 ms)

benchmarking 3
time                 36.98 ms   (35.67 ms .. 37.71 ms)
                     0.997 R²   (0.989 R² .. 1.000 R²)
mean                 38.67 ms   (37.92 ms .. 40.42 ms)
std dev              2.141 ms   (829.3 μs .. 3.583 ms)
variance introduced by outliers: 19% (moderately inflated)

So I guess my comments about sharing are incomplete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants