Memory leak with simple vector code #60

Open
rpeszek opened this issue Nov 11, 2017 · 2 comments

rpeszek commented Nov 11, 2017

The vector package is heavily optimized with Core rewrite rules. In GHC, this code runs in milliseconds and in a small, constant amount of space:

import Data.Int (Int64)
import qualified Data.Vector as V

sq x = x * x
bigSumVec = V.sum $ V.map sq $ V.enumFromTo 1 (100000000 :: Int64)

But in Eta it runs forever and eventually produces an OutOfMemory exception. I have looked at the Eta patch for vector and did not see any aggressive removal of {-# RULES ... #-} pragmas.

Is this a case of something special that Eta currently does?

I believe that all of the vector package's optimization happens in the Core layer, so I am surprised at such a big difference.
Thank you for any help answering this question.

rahulmutt commented Nov 13, 2017

Thanks for the report!

I did a brief investigation by compiling this program with -ddump-stg and decompiling the generated class files. It turns out the optimizations are happening: the intermediate vectors have been fused away, leaving a nice tight loop that calculates the sum via an accumulator variable.
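
For readers who want a picture of what such a fused loop looks like, here is a minimal hand-written sketch. It is not the actual -ddump-stg output; the names `sumSquares` and `go` are made up for illustration, and it only approximates the shape described above: a counter, a strict accumulator, and no intermediate vector.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

import Data.Int (Int64)

-- Hypothetical stand-in for the fused code: sums i * i for i in [1 .. n]
-- using a strict accumulator, so no vector or list is ever allocated.
sumSquares :: Int64 -> Int64
sumSquares n = go 0 1
  where
    go :: Int64 -> Int64 -> Int64
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + i * i) (i + 1)

main :: IO ()
main = print (sumSquares 100000000)
```

As a sanity check, `sumSquares 10` evaluates to 385. A loop of this shape needs only constant space, which is why the large old generation reported next points to something outside the loop itself.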

What's interesting is that when I checked out VisualVM, the old generation for the GC was through the roof - 1GB! This means that there's a memory leak going on somewhere. The code is nice and small so it should be easy to investigate. Thank you for constructing a minimal example! I'll take a look at this.

rahulmutt commented

Some notes on the investigation for this:

  1. I did memory profiling in VisualVM and discovered that the entire list from 1 to 100,000,000 is retained, along with all the intermediate thunks used to construct it. This list was created as a result of fusing the vector operations and lifting the resulting list out to the top level, creating a CAF (constant applicative form), that is, a globally shared thunk.

    In Eta, globally shared thunks are referenced through a static field of the class that corresponds to the module in which the thunk was declared (either directly or indirectly). Thus, the head of the list is retained for the duration of the program, preventing the GC from collecting the front part of the list. The way around this is to make the static reference to the CAF weak, so that it can be GC'd when it's unused and re-created as necessary. See "Compile CAFs to WeakReferences" (eta#554).

    This problem doesn't affect GHC since its GC has special handling for top-level closures and thunks which clears out unused heap objects.

  2. I turned on -fno-full-laziness to see if I could prevent the optimizer from lifting the list out as a CAF. The optimizer did not lift the CAF this time, but unfortunately, even in this case, there is still head retention: the head of the list is retained by a local variable in the function that computes the initial [1..100,000,000] list. This problem can be fixed by "Aggressively clear local references" (eta#79).
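
To make the CAF point concrete, here is a simplified source-level sketch. It is an assumption-laden illustration, not Eta's or GHC's actual Core output: the names `numbers`, `sumFromCaf` and `sumLocal` are hypothetical, and the bound is reduced to 1,000,000 so the example runs quickly. `numbers` plays the role of the list that the optimizer floated to the top level; `sumLocal` builds its list from an argument, so that list cannot be floated out as a CAF.

```haskell
module Main where

import Data.Int (Int64)
import Data.List (foldl')

sq :: Int64 -> Int64
sq x = x * x

-- A top-level binding with no arguments is a CAF. Once it is evaluated,
-- anything that keeps a reference to `numbers` keeps the evaluated part
-- of the list reachable.
numbers :: [Int64]
numbers = [1 .. 1000000]

-- Consumes the shared top-level list.
sumFromCaf :: Int64
sumFromCaf = foldl' (+) 0 (map sq numbers)

-- The list depends on the argument `n`, so it is not a top-level CAF.
sumLocal :: Int64 -> Int64
sumLocal n = foldl' (+) 0 (map sq [1 .. n])

main :: IO ()
main = do
  print sumFromCaf
  print (sumLocal 1000000)
```

Both definitions compute the same value; the difference is only in what stays reachable while the list is being consumed. As point 2 notes, avoiding the CAF is not enough on its own in Eta, because a local variable can still hold on to the head of the list.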

Two bugs with one program, great work @rpeszek! :)
