
Network size exceeds the DRAM capacity and program gets killed when exporting the network with nrnbbcore_write #943

Closed
HolyLow opened this issue Jan 28, 2021 · 7 comments · May be fixed by neuronsimulator/ringtest#18

@HolyLow

HolyLow commented Jan 28, 2021

I am trying to export a large network with nrnbbcore_write, but the program gets killed because it requires more memory than the machine's DRAM can provide.
If the network grows so large that it cannot be generated on a single machine, what should I do to support such a large network with NEURON?
In the simulation phase, I can use CoreNEURON to distribute the simulation across a number of machines. But in the network export phase (with nrnbbcore_write), is it possible to distribute the export procedure across different machines as well? How could I do that?

@nrnhines
Member

This is the original use case for CoreNEURON (i.e. the model is too large for NEURON to build at one time). CoreNEURON requires 7-fold less memory than NEURON for large models; at least that was the case a few years ago. Since then most of the effort has gone into performance improvements. @pramodk can speak to the most current memory usage results.

Anyway, the strategy is to have NEURON build a sequence of model subsets and, for each subset, generate the files, destroy the subset, and go on to the next subset in the sequence. It is up to you how many subsets to divide the model into. On a parallel machine, setup efficiency is best if the model is divided into at least nhost subsets, and load balance may be best served if the number of subsets is a multiple of nhost. This is a fairly straightforward NEURON programming problem, as most parallel models are already cell-gid based in terms of distribution on the machine, and a process generally only creates a model subset based on its list of gids. Whether you create subsets the size of a single cell or a million cells is up to you and your memory resources. The only issue that is a little out of the run of the mill is the destruction of the model after writing its files. The key is to first release all the gids with pc.gid_clear(), then destroy the NetCons, then the cells; a minimal sketch of that teardown order is below.
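
A minimal sketch of that teardown order (pc is your ParallelContext; netcons and cells stand for whatever Python lists your model-building code keeps, so the names are illustrative, not a fixed API):

def destroy_subset(pc, netcons, cells):
  pc.gid_clear()   # first release every gid registered for this subset
  netcons.clear()  # then drop the NetCons so nothing references the cells
  cells.clear()    # finally drop the cells; their sections go with them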

@pramodk
Member

pramodk commented Jan 28, 2021

If the network grows so large that it cannot be generated on a single machine, what should I do to support such a large network with NEURON?
In the simulation phase, I can use CoreNEURON to distribute the simulation across a number of machines.

@HolyLow: before going into details, just a naive question to clarify: you wrote that the model is too large to fit on a single machine, but that you could use CoreNEURON to distribute the simulation across a number of machines.

My question is: if you have multiple machines, you could also run NEURON on multiple machines to generate the model and then run CoreNEURON on the same number of machines. Is this how you are running now?

From the wording I got the impression that you run NEURON on a single machine and then run CoreNEURON on one or more machines. If you could clarify this, that would be helpful.

@HolyLow
Author

HolyLow commented Jan 29, 2021

@pramodk Yes, currently I am running NEURON on a single machine and CoreNEURON on multiple machines, for various reasons. So are you suggesting that the NEURON export procedure could also be carried out on multiple machines, and that if I ran it on multiple machines, the memory problem would be solved?
@nrnhines Could you kindly provide me with some material, such as a manual or an example, to guide me in dividing the model into sub-models and generating them one at a time?

@pramodk
Member

pramodk commented Jan 31, 2021

So are you suggesting that the NEURON export procedure could also be carried out on multiple machines, and that if I ran it on multiple machines, the memory problem would be solved?

Yes. Are you running NEURON with MPI already, or just with threads? Like CoreNEURON, you can also run NEURON on multiple compute nodes / machines, and then there will be more memory available to finish the model building step.

P.S. Michael mentions another option where only part of the model is set up at a time, but we don't have a public example of that yet. First you can try the multiple-machines option mentioned above; a minimal launch sketch follows.
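
For reference, a minimal sketch of what the MPI-enabled model-building script could look like (the script name model.py, the rank count, and ncell are placeholders):

# run e.g. with: mpiexec -n 8 nrniv -mpi -python model.py
from neuron import h

pc = h.ParallelContext()
rank = int(pc.id())      # this process's rank
nhost = int(pc.nhost())  # total number of ranks across all machines / nodes
# each rank creates only the cells whose gids it owns, e.g. round-robin:
# for gid in range(rank, ncell, nhost): ...
pc.barrier()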

@nrnhines
Member

nrnhines commented Feb 3, 2021

Is it the case that your model setup on an MPI cluster does not need global collective communication? I.e., can one envision building each subset of the model, writing the files, and destroying the subset, without requiring that the entire model exist at once? Anyway, one strategy is:

from neuron import h

pc = h.ParallelContext()
cvode = h.CVode()
cvode.cache_efficient(1)

def teardown():
  pc.gid_clear()
  # delete your NetCons list
  # delete your Cells list
  assert (h.List("NetCon").count() == 0)
  assert (len([s for s in h.allsec()]) == 0)

# nsubset, ncell, and build_subset come from your own model code
gidgroups = [h.Vector() for _ in range(nsubset)]  # used to write files.dat at end
for isubset in range(nsubset):
  gids = range(isubset, ncell, nsubset)  # round robin distribution, but use whatever you prefer
  build_subset(gids)  # just like a single rank on an nhost cluster
  pc.nrnbbcore_write("./coredat", gidgroups[isubset])
  teardown()

# write out the files.dat file (see the required format at
# https://neuronsimulator.github.io/nrn/py_doc/modelspec/programmatic/network/parcon.html#ParallelContext.nrnbbcore_write)

I did not execute this, so there may be syntax errors, but the idea is sound. I need to follow through with a complete example for the ringtest or some other standard example model to be sure I got it right.
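
To finish the sketch, one possible way to write files.dat from the collected gidgroups vectors is shown below. The exact header expected by your CoreNEURON version may differ, so check the documentation linked above before relying on it:

all_gids = [int(g) for v in gidgroups for g in v]  # flatten the h.Vector objects
with open("./coredat/files.dat", "w") as f:
  f.write("%d\n" % len(all_gids))  # assumed header: the number of gid groups
  for gid in all_gids:
    f.write("%d\n" % gid)          # one gid group id per line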

@alexsavulescu
Member

@nrnhines do we still need to merge neuronsimulator/ringtest#18? Looks like this issue can be closed following #964.

@nrnhines
Member

@alexsavulescu

still need to merge neuronsimulator/ringtest#18?

I believe we do. https://github.com/neuronsimulator/ringtest/pull/18/files contains test_submodel.py, whose line 39 is

    pc.nrncore_write("./coredat",  isubmodel != 0)

which makes use of #964. So it is basically a test of #964.
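
For context, the append argument lets each submodel write into the same data directory, so the whole model never has to exist at once. A minimal sketch, where build_submodel and teardown are your own helpers along the lines of the earlier example:

for isubmodel in range(nsubmodel):
  build_submodel(isubmodel)  # hypothetical helper: build just this subset of cells
  # append is False for the first submodel and True afterwards, so the data accumulates
  pc.nrncore_write("./coredat", isubmodel != 0)
  teardown()  # gid_clear, then drop the NetCons, then the cells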

HolyLow closed this as completed Dec 25, 2023