MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1. #12503

Open
mikiefromhell opened this issue Apr 29, 2024 · 5 comments

Comments

@mikiefromhell

Background information

Hello! I am running CP2K (a molecular dynamics simulation software) on a shell connected remotely to a supercomputer. I tried submitting a job today, and it did not quite work.

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

3.1.1

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

This line is part of the sbatch file that I use to run jobs:

module load openmpi/3.1.1

The module is loaded from Discovery on Open OnDemand (Northeastern University's supercomputer).

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

/

Please describe the system on which you are running

  • Operating system/version: Windows 11 23H2
  • Computer hardware: 13th gen Intel Core CPU, 16 GB RAM
  • Network type: Home Wifi

Details of the problem

Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.

Note: If you include verbatim output (or a code block), please use a GitHub Markdown code block like below:

shell$ mpirun -n 2 ./hello_world
@mikiefromhell
Author

I found this thread and checked my PATH and LD_LIBRARY_PATH:
horovod/horovod#133

LD_LIBRARY_PATH is not set. However, I am not an admin on the server, and I do not think that is the whole problem, because I was able to run a different job a few days ago!
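
For anyone repeating this check, here is a minimal sketch of one way to inspect the two variables from the same shell (or sbatch script) after `module load openmpi/3.1.1`. The helper name `report_var` is made up for illustration, and the sketch treats an empty variable the same as an unset one.

```shell
#!/bin/sh
# report_var is a hypothetical helper: it prints whether the named
# environment variable is set (and non-empty) without failing if it is not.
report_var() {
    # $1 is the variable name, e.g. PATH or LD_LIBRARY_PATH
    eval "value=\${$1-}"
    if [ -n "$value" ]; then
        printf '%s is set: %s\n' "$1" "$value"
    else
        printf '%s is not set or empty\n' "$1"
    fi
}

report_var PATH
report_var LD_LIBRARY_PATH

# Show which mpirun (if any) is found first on PATH.
command -v mpirun || echo "mpirun not found on PATH"
```

Running it once before and once after the `module load` line shows what the module actually changed in the environment.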

@jsquyres
Member

The error message is telling you that your application decided to abort for some reason (i.e., it called the MPI_ABORT API function). I'm unfamiliar with CP2K, so I don't know why it would have done that. You might want to look through the output and see if there's other warning/error messages before the abort message.
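
As a rough sketch of that suggestion, the lines that precede the abort message can be pulled out of a job's output file. The function name `show_context_before_abort` and the demo log contents are invented for illustration; in practice the function would be pointed at the job's real output file (whatever name the scheduler gives it).

```shell
#!/bin/sh
# Print the first "MPI_ABORT was invoked" line of a log file together with
# up to N lines before it (N defaults to 20).
show_context_before_abort() {
    log_file=$1
    context=${2:-20}
    # Line number of the first MPI_ABORT message; empty if there is none.
    abort_line=$(sed -n '/MPI_ABORT was invoked/{=;q;}' "$log_file")
    if [ -z "$abort_line" ]; then
        echo "no MPI_ABORT message found in $log_file"
        return 1
    fi
    start=$((abort_line - context))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    sed -n "${start},${abort_line}p" "$log_file"
}

# Demo on a made-up log: the first two lines are placeholders, not real
# CP2K or Open MPI output.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
(hypothetical) step 1 finished
(hypothetical) application error: something went wrong
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 1.
EOF
show_context_before_abort "$demo_log" 1
rm -f "$demo_log"
```

If nothing useful appears before the abort line, the application may have written its own diagnostics to a separate output file instead.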

Also, Open MPI v3.1.1 is fairly ancient. At a bare minimum, I would suggest upgrading to the latest 3.1.x version (v3.1.6), because it contains bunches of bug fixes beyond 3.1.1.

That being said, 3.1.6 is from March of 2020, and is still pretty ancient. We are unlikely to ever make any more releases in the v3.1.x series.

The most recent version of Open MPI is v5.0.3 -- I'd suggest upgrading to that.

@mikiefromhell
Author

Hello @jsquyres Jeff, Thank you for your response!
That was actually the only message in the output and no error file was created. I understand that it is an ancient version, but this server is unfortunately not managed by me and the CP2K package relies on the 3.1.1 version: this is what comes up when I type
module show cp2k
[image: screenshot of the `module show cp2k` output]
Unfortunately, the most recent version of openmpi I have access to is 4.1.4.

I also tried running a different simulation and I got another MPI error, albeit a different one:

[[57845,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)
  Host: c0279

Another transport will be used instead, although this may result in
lower performance.

NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
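
The NOTE in that output names the MCA parameter that silences the openib warning. A minimal sketch of setting it follows, assuming the usual Open MPI convention that an MCA parameter can be supplied through an environment variable prefixed with `OMPI_MCA_`. This only hides the warning; it does not affect the MPI_ABORT that follows.

```shell
#!/bin/sh
# Set the MCA parameter named in the warning through the environment.
# In an sbatch file this would go before the line that launches the job.
export OMPI_MCA_btl_base_warn_component_unused=0

# Equivalent per-run form, left as a comment because the application
# command line is site-specific:
#   mpirun --mca btl_base_warn_component_unused 0 <application and arguments>

echo "btl_base_warn_component_unused=${OMPI_MCA_btl_base_warn_component_unused}"
```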

@jsquyres
Member

jsquyres commented May 7, 2024

With Open MPI v4.1.4, it looks like you got an additional warning but the same underlying error (i.e., the application invoked MPI_ABORT). The CP2K application has chosen to abort; you'll have to look at their docs and/or source code for more information on why the application chose to abort.

I'm afraid we can't help you with whatever environment NEU has set up to run CP2K, nor can we help with CP2K itself -- we're not involved in either of those organizations.

@mikiefromhell
Author

Hello Jeff,

I was able to run a few CP2K jobs from a tutorial website. The shell still outputs MPI errors, but no aborts. I am assuming, as you suggested, that the problem is with my input files and not with the MPI package. Thank you!
