Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leak on Info pointer from class TTheoryIntermediateCache #35

Open
eatdust opened this issue Jun 16, 2021 · 15 comments
Open

Leak on Info pointer from class TTheoryIntermediateCache #35

eatdust opened this issue Jun 16, 2021 · 15 comments

Comments

@eatdust
Copy link

eatdust commented Jun 16, 2021

Hi there,
tested under gfortran-10.3.0, this "Info" pointer is leaking at each Cls evaluation.

From Calculator_CAMB.f90:

``

subroutine CAMBCalc_GetNewTransferData(this, CMB,Info,Theory,error)
...
class(TTheoryIntermediateCache), pointer :: Info
...
allocate(Info, source=this%DefaultInstance)

``

A possible fix is to keep the same all along the runs with:

if (.not.associated(Info)) allocate(Info, source=this%DefaultInstance)

or, maybe, writing an explicit deallocation in the Clear procedure, which is currently empty?

In GeneralType.f90

``

subroutine TTheoryIntermediateCache_Clear(Info)
class(TTheoryIntermediateCache) Info

end subroutine TTheoryIntermediateCache_Clear

``

Dunno what is best!

@cmbant
Copy link
Owner

cmbant commented Jun 16, 2021

Probably https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94109 ? Have you tried other gcc versions?

@eatdust
Copy link
Author

eatdust commented Jun 16, 2021

I have just checked, gfortran-10.3.0 has the fix of #94109 included. Plus, I do not see any leak on CAMB, which suggests that #94109 is fixed for good, hopefully. I'll try to reduce it though, it is not very clear to me where it gets explicitly or implicitly freed.

@cmbant
Copy link
Owner

cmbant commented Jun 16, 2021

OK thanks. It should be freed in TTheoryIntermediateCache. CAMBdata stored in info only has allocatable components which should be freed automatically (in recent CAMB versions).

@eatdust
Copy link
Author

eatdust commented Jun 16, 2021

Making progress, Info is in fact already associated when entering the CAMBCalc_GetNewTransferData routine the first time. This could explain the leak, the explicit allocate then create a new copy of the Info pointer. So, if I remove completely the allocation, everything works fine as well.
Could it be that it is already allocated in CAMBcalc_SetCurrentPoint?

@cmbant
Copy link
Owner

cmbant commented Jun 16, 2021

Hmm not sure. The idea is that when sampling, you make a new point by coping the current one. This new point has a copy of the original data pointer. So in currentPoint, except at the start, Info should already be assigned to the same info as the old point. GetNewTransferData should then make a new instance for the new point's calculation. But it should then be freed consistently if the point is rejected. Do you find memory grows indefinitely?

@eatdust
Copy link
Author

eatdust commented Jun 16, 2021

Do you find memory grows indefinitely?

Exactly! And quite fast, after a thousand calls to CAMB, this is game-over :)

@cmbant
Copy link
Owner

cmbant commented Jun 16, 2021

Can you reproduce e.g. with ifort or other gfortran version? I would have though a general issue like that would have been raised long ago (though I've not been using cosmomc myself).

@eatdust
Copy link
Author

eatdust commented Jun 17, 2021

Yes, with gfortran-8.5.0. I was able to reproduce.
With the unstable branch of gfortran-11.1.1 20210612, that's actually untestable as I am getting a SIGSEGV at start. This is unrelated though, the SIGSEV error in __fileutils_MOD_writeitemtxt (../camb/forutils/FileUtils.f90:1572).

With ifort 2021.2.0 20210228, it looks fine, memory grows for a while and then it is freed.

I'll be happy to add an explicit deallocate() of the leaking pointer for gfortran, but it is really not clear where it should be freed!

@cmbant
Copy link
Owner

cmbant commented Jun 17, 2021

Hmm, I wonder if a new gfortran memory leak bug has been introduced.

The CAMB mem test passes with trunk GNU Fortran (GCC) 12.0.0 20210613
https://travis-ci.com/github/cmbant/CAMB/jobs/515550950
so probably not directly a problem with CAMB dynamic allocatables.

If it were a CosmoMC leak, it should presumably also be there with ifort.

@eatdust
Copy link
Author

eatdust commented Jun 17, 2021

Yes, I agree, I've passed CAMB under valgrind, it is clean with gfortran.

I'll try to understand in more details what is going on with this Info pointer.
Removing the allocate stops the leak and I do not see any difference in the behaviour of the chains, with or without. It seems that gfortran makes a spurious extra copy each time its encounter this allocate statement. Tough to solve.

@cmbant
Copy link
Owner

cmbant commented Jun 18, 2021

I certainly can't reproduce a gfortran issue on trunk in simple test cases. Perhaps count calls to deallocate and allocate of Info and see if getting imbalanced?

@cmbant
Copy link
Owner

cmbant commented Jun 18, 2021

I don't see any very obvious memory issue in 8.5.0 using defaults with test.ini changed to action=0.

@eatdust
Copy link
Author

eatdust commented Jun 19, 2021

Thank you for looking at it! I've just made the same test, with action=0 on test.ini and with 10.3.0, it looks also fine, the memory remains stable!

In my runs, I was using tensors and plik, so I've made another test with test.ini (and also test_planck.ini) having uncommented

compute_tensors = T
param[r] = 0.03 0 2 0.04 0.04

And there, I do see the memory growing slowly (compiled with -fopenmp and running on 48 cores, no MPI). It grows slow, but once used with -DMPI, it became rapidly problematic on mpirun -np 64.

Anyway, good catch, I'll try to narrow it down if this is really related to tensors.

@cmbant
Copy link
Owner

cmbant commented Jun 30, 2021

No sure why tensors could be leaking memory - did you find anything else?

Why 64 processes - sounds way too much, or is this using polychord plugin or something else?

@eatdust
Copy link
Author

eatdust commented Jul 5, 2021

I've been busy and did not have time to make more progress, but I will.

For 64, I am not looking for getting margestats only, I generate millions of sample to train a NN for other matters, and multiplying the number of chains, and running cosmomc a long time, is doing the job very well, provided I can keep the memory under control :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants