You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The default dtype of the microbenchmark (OSU) is MPI_CHAR, so count = 32kB, and it triggers this conditional. As a result, it falls back to the basic linear algorithm, which performs very poorly at this larger scale. Because this code is in the redscat_allgather function, the algorithm change is triggered even if I force the algorithm selection using the environment variable.
I am happy to open a PR with a fix, but I not sure what the preferred solution is. I understand the need to take care regarding non-commutative operations, but the first half of the if that is causing my problem seems like it should be removed and replicated in the default algorithm selection logic (if at all). Otherwise, could we add an additional requirement that the count is below some value, to avoid scaling issues?
The text was updated successfully, but these errors were encountered:
The decision to call another algorithm under certain conditions is out of scope in a base function, because technically it shall be done by a higher level decision function (such as ompi_coll_tuned_reduce_scatter_block_intra_dec_dynamic).
Background information
I was investigating an Allreduce performance issue for 64k processes @ 32kB message size.
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v4.1.6 (reproduced w/ v5.0.3)
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built from tarball
If you are building/installing from a git clone, please copy-n-paste the output from
git submodule status
.N/A
Please describe the system on which you are running
Details of the problem
I traced it back to this conditional in the redscat_allgather algorithm:
ompi/ompi/mca/coll/base/coll_base_allreduce.c
Lines 993 to 1000 in 11e1d8e
The default dtype of the microbenchmark (OSU) is MPI_CHAR, so
count
= 32kB, and it triggers this conditional. As a result, it falls back to the basic linear algorithm, which performs very poorly at this larger scale. Because this code is in the redscat_allgather function, the algorithm change is triggered even if I force the algorithm selection using the environment variable.I am happy to open a PR with a fix, but I not sure what the preferred solution is. I understand the need to take care regarding non-commutative operations, but the first half of the
if
that is causing my problem seems like it should be removed and replicated in the default algorithm selection logic (if at all). Otherwise, could we add an additional requirement that the count is below some value, to avoid scaling issues?The text was updated successfully, but these errors were encountered: