Getting started with parallelization in Python with mpi4py

pikarpov-LANL/tutorial_mpi4py

This is a quick-start guide to mpi4py to run simple scripts, e.g. for making plots in parallel. Setup instructions along with some example scripts are provided in this repo. No sudo required.

Setup

Anaconda

To simplify installation and set up our workspace, let's install Anaconda. You can download it straight from the official Anaconda website.

For example, on Linux, you can download the May 2022 release by running:

wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh

To install it, run:

bash Anaconda3-2022.05-Linux-x86_64.sh

Go through the install process, and make sure to initialize conda. Afterwards, to activate conda, run:

source ~/.bashrc

Lastly, create a new virtual environment to experiment in:

conda create -n py310 python=3.10

The above creates an environment named py310 with Python 3.10. The last thing you need to do is activate it:

conda activate py310

To the left of your login name you should now see the name of the environment you are in, like so: (py310)login@host

Python

Make sure you are running Python 3.5+ (3.9+ recommended) for this tutorial. That said, the mpi4py releases current at the time of writing (3.x) should work on Python 2.7 just fine.
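To confirm that the active environment meets the version requirement above, a small check like the following can help. It is only a sketch: the function name classify_python and the status strings are made up for illustration, and CONDA_DEFAULT_ENV is the variable that conda sets on activation.

```python
import os
import sys

MINIMUM = (3, 5)      # lowest Python version this tutorial targets
RECOMMENDED = (3, 9)  # version this tutorial recommends


def classify_python(version):
    """Classify a (major, minor, ...) version tuple against the tutorial's needs."""
    major_minor = tuple(version[:2])
    if major_minor < MINIMUM:
        return "too old for this tutorial"
    if major_minor < RECOMMENDED:
        return "supported"
    return "recommended"


# CONDA_DEFAULT_ENV is set by `conda activate`; it is absent outside conda.
env_name = os.environ.get("CONDA_DEFAULT_ENV", "none active")

# str.format is used instead of f-strings so the check itself runs on old Pythons.
print("conda environment: {}".format(env_name))
print("Python {}.{}: {}".format(
    sys.version_info[0], sys.version_info[1], classify_python(sys.version_info)))
```

Run it with the same python executable you intend to pass to mpirun, since a different interpreter on your PATH may report a different version.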

OpenMPI

Install OpenMPI & mpi4py via conda from the conda-forge channel. It is a community-led conda channel that tends to have the most up-to-date package versions. At the time of writing, the main anaconda channel doesn't even support Python 3.9, which ships as standard with Anaconda's latest release.

conda install -c conda-forge openmpi=4.1.4=ha1ae619_100

You can also build OpenMPI from source by following their instructions.

⚠️ Note the specific build hash (ha1ae619_100, from July, 2022) when installing openmpi. The "stable" 4.1.3 & 4.1.4 fail to install essential libraries, making MPI unusable.

mpi4py

conda install -c conda-forge mpi4py

If you don't want to use conda, or would like to use MPICH/Microsoft MPI, follow mpi4py's Documentation.

numpy & matplotlib

You might need to install numpy & matplotlib into your virtual environment:

pip install numpy
pip install matplotlib
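To see whether anything still needs installing, you can ask Python which packages it can locate in the current environment. The snippet below is a minimal sketch; the helper name missing_packages is made up for illustration, while importlib.util.find_spec is part of the standard library.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(["numpy", "matplotlib", "mpi4py"])
if missing:
    print("Not found in this environment: " + ", ".join(missing))
else:
    print("numpy, matplotlib, and mpi4py were all found.")
```

Note that find_spec only locates a package without importing it, so a package can be found here and still fail at import time, for example if mpi4py cannot load the MPI library.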

Troubleshoot

During the workshop, folks on Mac had trouble installing OpenMPI from the conda-forge channel. If you experience a similar problem, you can fall back to Python 3.8 and run the following:

conda create -n py38 python=3.8
conda activate py38
conda install openmpi

Running Examples

Check the CPUs

To run an example in parallel, first you should check how many cores (not threads) are available on your machine. You can do that via

lscpu

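Look for the line labeled Core(s) per socket (roughly the 12th line from the top; the exact position varies between lscpu versions). On a machine with more than one socket, multiply this value by Socket(s) to get the total number of cores.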

⚠️ Cores and Threads are different; modern CPUs typically have 2 threads per core.
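The core and thread counts can also be worked out programmatically from the lscpu output. The following is a rough sketch rather than a robust parser: SAMPLE_LSCPU is a made-up excerpt used only for illustration, and read_field and count_cores are hypothetical helper names.

```python
import re

# Made-up excerpt of `lscpu` output, used only for illustration.
SAMPLE_LSCPU = """\
CPU(s):                8
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
"""


def read_field(lscpu_text, label):
    """Return the integer value of a 'Label:   value' line in lscpu output."""
    pattern = r"^\s*" + re.escape(label) + r":\s*(\d+)"
    match = re.search(pattern, lscpu_text, re.MULTILINE)
    if match is None:
        raise ValueError("{!r} not found in lscpu output".format(label))
    return int(match.group(1))


def count_cores(lscpu_text):
    """Return (physical cores, hardware threads) for the whole machine."""
    cores = (read_field(lscpu_text, "Core(s) per socket")
             * read_field(lscpu_text, "Socket(s)"))
    threads = cores * read_field(lscpu_text, "Thread(s) per core")
    return cores, threads


cores, threads = count_cores(SAMPLE_LSCPU)
print("physical cores: {}, hardware threads: {}".format(cores, threads))
```

On a real machine you could pass in the text returned by subprocess.run(["lscpu"], capture_output=True, text=True).stdout instead of the sample string. The first number is the one to use as an upper bound for mpirun -n.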

Run

To run in parallel, you need to tell mpirun how many processes to launch with -n; each process normally occupies one core. The following command will run the script as 4 processes, i.e. on 4 cores.

mpirun -n 4 python simple_demo.py
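For orientation, here is roughly what a minimal script launched this way can look like. This is a sketch and not the contents of simple_demo.py from the repo; the file name hello.py and the helper greeting are made up for illustration. Every process started by mpirun executes the same file and learns its own rank from the communicator.

```python
def greeting(rank, size):
    """Build the message printed by one process."""
    return "Hello from rank {} of {}".format(rank, size)


def main():
    try:
        from mpi4py import MPI
    except ImportError as err:
        # Reaching this branch means the installation steps did not succeed.
        print("mpi4py could not be imported: {}".format(err))
        return
    comm = MPI.COMM_WORLD    # communicator containing all launched processes
    rank = comm.Get_rank()   # this process's index, from 0 to size - 1
    size = comm.Get_size()   # total number of processes (the value given to -n)
    print(greeting(rank, size))


main()
```

Saved as hello.py and run with mpirun -n 4 python hello.py, it should print one line per process, in no guaranteed order. Seeing four lines that all report a size of 1 usually means mpirun and mpi4py were built against different MPI installations.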

Below is a table describing each demo provided in this tutorial.

| Demo | Description |
| --- | --- |
| simple_demo | basic test of correct installation |
| plot_demo | independent for loop parallelization with plotting |
| comm_demo | communication demo presenting bcast, scatter, gather, and Barrier |

If you want to parallelize a code with only some independent loops, use the comm_demo for guidance.
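The general pattern for a single independent loop can be sketched as follows. This is not the code of comm_demo; the names split_into_chunks, merge, and run_with_mpi and the toy workload (raising numbers to a power) are assumptions made for illustration. The mpi4py calls used (bcast, scatter, gather, Barrier) are the ones listed in the table above.

```python
def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous lists of nearly equal length."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        chunks.append(items[start:stop])
        start = stop
    return chunks


def merge(parts):
    """Flatten the per-rank result lists back into one list, in rank order."""
    return [value for part in parts for value in part]


def run_with_mpi(items, power=2):
    """Raise every item to `power`, spreading the loop over all MPI ranks."""
    from mpi4py import MPI  # imported here so the helpers above work without MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # bcast: rank 0 sends the same small parameter to every rank.
    power = comm.bcast(power if rank == 0 else None, root=0)
    # scatter: rank 0 hands each rank exactly one chunk of the work.
    chunks = split_into_chunks(items, size) if rank == 0 else None
    my_chunk = comm.scatter(chunks, root=0)

    my_result = [x ** power for x in my_chunk]  # the independent loop

    comm.Barrier()  # wait until every rank has finished its chunk
    # gather: rank 0 receives a list with one result list per rank.
    gathered = comm.gather(my_result, root=0)
    return merge(gathered) if rank == 0 else None


# Serial illustration of how 10 items would be divided among 4 ranks.
print(split_into_chunks(list(range(10)), 4))
```

To try the parallel part, add a call such as result = run_with_mpi(list(range(10))) at the bottom of the script and launch it with mpirun; only rank 0 receives the merged list, while the other ranks get None. Contiguous chunks are used so that the merged results come back in the original order.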

Additional Info

For the full list of commands, refer to the official mpi4py documentation.

I found mpitutorial.com to be quite useful. It is not specifically about mpi4py, but rather MPI in general. In particular, during the workshop I presented the diagrams from their tutorials on Barrier & Bcast and Scatter & Gather.
