Maximum matrix sizes #4268
Comments
Are you using the "performance" examples or is this the regular |
I have tried with both, and I'm having issues in either case. Edit: But just to clarify, the particular example I gave with the 17 million unknowns was on the regular ex1p.
Can you step through on a debugger to see what is going on? My suspicion, if I understand correctly that the 17 million dof case is run on a single MPI rank, is that there is a quadrature data array being allocated (for example, for the element Jacobians) which might be larger than 2B entries and you are overflowing on the index. This happens for 3D Jacobians when
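For concreteness, a rough back-of-the-envelope check of the kind of index overflow described above might look like the following; the element count, rule size, and Jacobian layout are illustrative assumptions, not values taken from the reported run:

```cpp
#include <cstdint>
#include <cstdio>

// Estimate whether a per-quadrature-point Jacobian array could exceed the
// range of a 32-bit index. All values are hypothetical, for illustration only.
int main()
{
   const std::int64_t num_elements      = 70000000; // assumed element count
   const std::int64_t quad_pts_per_elem = 9;        // e.g. a 3x3 rule on quads
   const std::int64_t jac_components    = 2 * 2;    // dim x dim Jacobian in 2D

   const std::int64_t entries = num_elements * quad_pts_per_elem * jac_components;
   std::printf("entries = %lld, INT32_MAX = %d\n", (long long)entries, INT32_MAX);
   std::printf("overflows a 32-bit index: %s\n", entries > INT32_MAX ? "yes" : "no");
   return 0;
}
```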
Can you provide some more information, e.g. which initial mesh you are using, and how many MPI ranks? If I understand your configuration correctly, the serial mesh is refined until it has 10,000 elements, and then it is partitioned, and 7 parallel refinements are performed. For a 2D quad mesh, this will result in

Also, for better parallel load balancing, it is probably better to refine the mesh as much as possible in serial (before running out of memory on a single rank), and only then partition and switch to parallel refinements. The parallel refinements do not do any repartitioning or load balancing.
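For reference, each uniform refinement of a quad mesh multiplies the element count by 4, so starting from $N_0$ elements, $L$ parallel refinements give (using the 10,000-element starting point assumed in the comment above):

$$N_L = 4^L N_0, \qquad \text{e.g. } 4^7 \times 10{,}000 = 163{,}840{,}000 \text{ elements.}$$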
@Heinrich-BR, can you post the exact command line you use (and any modifications you made to ex1p)?
Hi everyone! Thank you for your support! Let me try to answer everything.
That's very interesting @sebastiangrimberg, and it does sound very likely. I've stepped through with a debugger before, which is how I found where the issue was happening in the first place, but I'll try again and keep an eye out for the quadrature data array.
I am using the
Not exactly: I set the number of serial refinements to -1 (i.e. no serial refinement at all), so all of the refinement is parallel. I don't think it makes any difference whether the refinement is serial or parallel in the case with 1 MPI rank anyway. The important part is that, starting from the original mesh, there were 7 refinements in total.
Of course! I'm using the latest version of MFEM (as of this week), so commit
With this, I ran the example with the command
Hopefully this helps you reproduce the error! Thank you everyone for your support and have a great weekend!
Just as an update regarding this, I've looked into it with a debugger again and retrieved some numbers:
Given this is almost twice as much as

But of course, I'm just speculating here, in case this is indeed the issue. It's quite possible I've missed something. Let me know what you find!
Hi @Heinrich-BR,

Sorry it took so long to get back to you. It is important to note that you are using the

Because of the way MFEM's full and partial assembly work, the geometric factors (including the

Note, however, that for higher-order problems, the ratio between the number of quadrature points and the number of degrees of freedom approaches 1. For
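To make the quadrature-points-to-dofs ratio concrete, here is a rough count for 2D tensor-product (quad) elements; the rule size of roughly p + 2 points per dimension and the asymptotic per-element dof count are assumptions for illustration, not MFEM's exact defaults:

```cpp
#include <cstdio>

// Rough ratio of stored quadrature points to unique degrees of freedom per
// element on a large 2D quad mesh. Asymptotically, each quad contributes
// about p*p unique dofs once shared vertices and edges are counted only once.
int main()
{
   for (int p = 1; p <= 8; p *= 2)
   {
      const double q        = p + 2;   // assumed 1D quadrature points
      const double quad_pts = q * q;   // quadrature points per element
      const double dofs     = p * p;   // unique dofs per element (asymptotic)
      std::printf("p = %d: quadrature points / dofs ~ %.2f\n", p, quad_pts / dofs);
   }
   return 0;
}
```

The ratio drops from about 9 at p = 1 toward 1 as the order grows, which is the effect described in the comment above.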
Hi @pazner, thank you for the response! Indeed it is the case that I'm using the

If I put together everything suggested here (and thank you everyone for the very helpful comments), by splitting the problem into many MPI tasks, increasing the order of the polynomials, and using partial assembly, I can run the problem on 4 GPUs with 80 GB of memory each, and get to about 57 million global unknowns on
Note that the default quadrature rule in example 1 uses

To change the default quadrature, you can replace the line

```cpp
a.AddDomainIntegrator(new DiffusionIntegrator(one));
```

with something like this:

```cpp
auto diff_integ = new DiffusionIntegrator(one);
const int integ_order = 2*fec->GetOrder()+1;
const Geometry::Type geom = pmesh.GetElementGeometry(0);
diff_integ->SetIntegrationRule(IntRules.Get(geom, integ_order));
a.AddDomainIntegrator(diff_integ);
```

There are ways to get around this limitation, e.g. by assembling the element matrices in batches; however, we have not had practical interest from users in running problems of such size per MPI rank. In many cases, the practical problem sizes per GPU are around a few million unknowns. Bigger problems are just solved on more MPI ranks, i.e. on more GPUs. If you are interested in pushing the limit, we can discuss ways to support bigger problems per GPU in MFEM.
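As a quick sanity check (not part of the original suggestion), one could also print the size of the rule selected above; this reuses the geom and integ_order variables from the snippet together with the standard IntegrationRule::GetNPoints() accessor:

```cpp
// Hypothetical follow-up: report how many quadrature points per element the
// chosen rule uses, e.g. to compare against the default rule's size.
const IntegrationRule &ir = IntRules.Get(geom, integ_order);
std::cout << "Quadrature points per element: " << ir.GetNPoints() << std::endl;
```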
By the way, computing in batches is something we want to incorporate in MFEM, since it will allow us to work with mixed meshes (meshes with different types of elements, e.g. triangles + quads) as well as FE spaces with variable polynomial orders.
Hello MFEM developers,
I'm testing MFEM's scaling on large clusters, and to that end I'm pushing some of the examples to see how big they can be made before they break, simply by parallel refining the mesh further and further. However, I'm noticing that, in general, they stop working about one or two orders of magnitude before I would expect memory issues or integer overflow to cause problems.
For instance, take the example ex1p. If you set the serial refinement level to -1 and the parallel refinement level to 7, the example ends up with about 17 million unknowns and segfaults in the mfem::internal::quadrature_interpolator::TensorDerivatives function. With a parallel refinement level of 6, the example runs normally to the end. I know that it is not a matter of running out of memory, since the cluster I am using has more than enough for problems much larger than this. I have tried building MFEM and its dependencies with 64-bit integers and with mixed integers, but nothing seems to allow me to go further than this. Splitting the problem into many MPI ranks does help me go one parallel refinement level further before breaking again, but it would require a very large number of ranks to reach the problem sizes I am interested in, and in principle it should be doable with only a few.

I would like to ask, then: is this a known limit for MFEM, or is there perhaps some build configuration that I'm missing?
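For readers trying to reproduce this: the refinement levels in ex1p are hard-coded in the source rather than exposed as command-line options, so the configuration above implies edits along these lines. This is a sketch assuming the stock refinement blocks in ex1p.cpp; the exact surrounding code may differ between MFEM versions.

```cpp
// Serial refinement block: setting the level to -1 means the loop never
// executes, so the initial mesh is partitioned without serial refinement.
{
   int ref_levels = -1;   // reporter's setting: no serial refinement
   for (int l = 0; l < ref_levels; l++)
   {
      mesh.UniformRefinement();
   }
}

// Parallel refinement block: 7 uniform refinements after partitioning, which
// multiplies the element count of a 2D quad mesh by 4^7 = 16384.
{
   int par_ref_levels = 7;   // reporter's setting
   for (int l = 0; l < par_ref_levels; l++)
   {
      pmesh.UniformRefinement();
   }
}
```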