Use OMPI without LSF integration on LSF #12556

Closed
robertsawko opened this issue May 17, 2024 · 14 comments

@robertsawko

I believe this should be relatively simple, but I am struggling to find the right combination of switches.

My target application is quite complex: OpenFOAM and ParaView + Catalyst V2 with OSMESA, both using OpenMPI v5. I've used Spack to build it on an x86 RHEL7 cluster. Unfortunately, the Spack OpenMPI package doesn't support the lsf-libdir option, which on this cluster is required to build OpenMPI with LSF integration correctly. So I ended up with my whole stack built but no LSF integration. Lawless land, it seems.

I've already tested my setup on small jobs and now I am about to launch a medium-size job: 512 ranks spanned over 16 nodes, 32 cores per node and 1 rank per physical core.

cat $LSB_DJOB_HOSTFILE | uniq | awk '{print $1 " slots=32 max_slots=32"}' > myhostfile
mpirun \
    -np 512 \
    --hostfile myhostfile \
    --map-by node \
    --rank-by slot \
    --bind-to core \
    --report-bindings \
    --display-map \
    -wdir $CASE_DIR \
    -x PATH \
    -x LIBRARY_PATH \
    -x LD_LIBRARY_PATH \
         hostname
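For reference, the generated myhostfile has one line per node in Open MPI's hostfile format (hostnames below are illustrative):

node001 slots=32 max_slots=32
node002 slots=32 max_slots=32
(and so on for all 16 nodes)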

Despite all this, I still end up seeing the following error.

prterun was unable to launch the specified application as it encountered an
error:

Error: system limit exceeded on number of files that can be open Node:
sqg6e31

when attempting to start process rank 0.

This can be resolved by setting the mca parameter
opal_set_max_sys_limits to 1, increasing your limit descriptor setting
(using limit or ulimit commands), asking the system administrator for
that node to increase the system limit, or by rearranging your
processes to place fewer of them on that node.

Please advise if there's anything I could improve in my mpirun invocation. The displayed mapping looks correct, but clearly something goes very wrong before binding(?).

Also, if you think it's impossible to call mpirun correctly for medium and large jobs without LSF integration, then I am happy to focus on fixing the Spack package instead. I've been meaning to do that for a while.

@rhc54
Contributor

rhc54 commented May 17, 2024

This has nothing to do with mpirun or binding - the error message is quite specific:

Error: system limit exceeded on number of files that can be open Node:
sqg6e31

You need to increase the limit on the number of files that can be open, just like it says:

This can be resolved by setting the mca parameter
opal_set_max_sys_limits to 1, increasing your limit descriptor setting
(using limit or ulimit commands), asking the system administrator for
that node to increase the system limit, or by rearranging your
processes to place fewer of them on that node.
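For concreteness, the first two suggestions amount to something like this in the job script, before mpirun (the value 4096 is purely illustrative and must stay within the hard limit):

ulimit -n          # show the current soft limit on open file descriptors
ulimit -Hn         # show the hard limit
ulimit -n 4096     # raise the soft limit for this shell (illustrative value)

mpirun --mca opal_set_max_sys_limits 1 -np 512 --hostfile myhostfile hostname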

@robertsawko
Author

I have tried setting --mca opal_set_max_sys_limits 1, but that results in exactly the same message. It just occurred to me that all 512 ranks are still trying to start on one node. On the same cluster, when I compiled OpenMPI manually with LSF integration, I didn't get this message, which is why I am going down this rabbit hole.

I am now trying to fix the LSF integration for OMPI in Spack, but strangely, after just adding schedulers=lsf, HDF5 no longer wants to compile, so I am now chasing that route independently as well.

@robertsawko
Author

robertsawko commented May 18, 2024

And I can now also confirm that after adding --with-lsf-libdir to the Spack OpenMPI package.py and compiling with schedulers=lsf, I can just run

...
#BSUB -n 512
#BSUB -R "span[ptile=32] affinity[core(1)]"
...
mpirun hostname

and it runs just fine - no need to set any MCA parameters or change system limits.

I am now trying to fix the packages that broke downstream. I suspect they broke because I haven't properly fixed the Spack package, so it's not clear whether I will succeed, as this breaks every downstream package that depends on MPI.

@wenduwan
Contributor

@robertsawko Please keep us posted on new issues

@robertsawko
Author

Absolutely, I do want to get to the bottom of it. I've only just got access to another LSF cluster - my main one is actually down until tomorrow, possibly later, so I couldn't look into it this week.

@robertsawko
Author

Ah, sorry, I should have said - I started a Spack issue on LSF_LIBDIR here, but last week my LSF cluster also went into a week-long maintenance, so I didn't have a computer to test on.

@rhc54
Contributor

rhc54 commented May 23, 2024

I confess to being puzzled as to how the LSF libdir can impact the MPI stack (outside of mpirun itself). Nothing in MPI depends on or integrates with LSF.

@robertsawko
Author

robertsawko commented May 24, 2024

Thanks, @rhc54 - you may be right; maybe LSF is a red herring... When I compile OMPI manually I add all sorts of switches:

--enable-shared --disable-static \
--enable-mpi-fortran=usempi \
--disable-libompitrace \
--enable-wrapper-rpath \
--with-lsf=${LSF_LIBDIR%%linux*} \
--with-lsf-libdir=${LSF_LIBDIR} \
--with-knem=${knem_dir} \
--with-mxm=/opt/mellanox/mxm \
--with-ucx=$CORE_DIR/ucx/1.4.0

and my Spack spec was pretty basic:

openmpi+internal-pmix fabrics=auto schedulers=lsf

So I need to test that - specifically, whether adding knem and mxm explicitly, rather than relying on auto, makes a difference.
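Something like this, pinning the fabrics explicitly instead of auto (a sketch - whether the openmpi package accepts exactly these variant values on my system still needs checking):

openmpi+internal-pmix fabrics=ucx,knem schedulers=lsf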

@robertsawko
Author

Yes, the LSF integration may be a red herring - I am sorry. It looks like the error is caused by adding the -wdir option; I mis-attributed something again. The actual application I am trying to run is OpenFOAM, and without -wdir I was getting

--> FOAM FATAL ERROR :
     Could not find mandatory etc entry (mode=ugo)
     'controlDict'

which I misread as the job being in the wrong directory. Now I can see clearly that all ranks were indeed starting in the correct working directory, and that this error has something to do with the environment. For instance, here they're discussing it in the context of a container and running as root. There are a few posts with people trying to run as root, but that's not the case for me, so I am not sure what I've done wrong here.

I am checking this more carefully now.

@robertsawko
Author

robertsawko commented May 28, 2024

Hmm... it looks like the source of my problem is some inconsistency in the environment. If I run without -x flags, non-launch nodes are unaware of Spack. If I add the basic ones like PATH and LD_LIBRARY_PATH, OpenFOAM thinks I am running as root or something equivalent. I am trying to devise a sensible wrapper...

@robertsawko
Author

After many trials and not so many tribulations, I managed to produce a wrapper which reproduces the Spack environment consistently across all nodes. I am happy for this to be closed, but could you please advise whether there's a better way to propagate the environment across all nodes? Maybe the LSF integration was doing just that?

When I was running this:

source /path/to/spack/share/spack/setup-env.sh
spack env activate openfoam_w_catalyst
mpirun \
    -np 512 \
    --hostfile myhostfile \
    --map-by node \
    --rank-by slot \
    --bind-to core \
        myApp

nothing but the launch node would know about my Spack environment. Naively adding PATH and LD_LIBRARY_PATH produced the confusion about input files, which led me to further confusion with the -wdir option and the supposed "too many open files" error.
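For completeness, the wrapper is essentially the following (a minimal sketch; the Spack path and environment name are the ones from my setup above, and myApp stands in for the real binary):

#!/bin/bash
# Recreate the Spack environment on this node, then exec the real
# application so the wrapper doesn't linger as an extra process.
source /path/to/spack/share/spack/setup-env.sh
spack env activate openfoam_w_catalyst
exec myApp "$@"

mpirun then launches the wrapper script in place of myApp.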

@robertsawko
Author

robertsawko commented May 28, 2024

Sorry, one more comment, as it is pertinent to my original question. I've run some more scripts and I can confirm that with LSF integration the launch node's environment is fully reproduced on all other nodes, whereas without LSF integration PATH et al. are set to system defaults. So this has been the source of my misery all along.

@rhc54
Contributor

rhc54 commented May 28, 2024

LSF automatically forwards your entire environment. However, ssh does not - so when launching via ssh, your environment will not get forwarded. Easiest way around that is to add the key envars to your login shell script (e.g., .bashrc).

Trying to forward the entire environment under ssh would be problematic as there are limits to the size of the overall ssh string. So the only alternative solution is to ask that the user specify which envars should be forwarded.
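With mpirun that means listing the variables via the -x flag, as in the invocations above, e.g.:

mpirun -np 512 --hostfile myhostfile -x PATH -x LD_LIBRARY_PATH myApp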

@robertsawko
Author

robertsawko commented May 29, 2024

I am going to close this, as it is really a solved problem (the wrapper), and fix the Spack package separately.
