Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dead coll tuned alltoall mca parameters #12547

Open
burlen opened this issue May 15, 2024 · 8 comments · Fixed by #12551
Open

dead coll tuned alltoall mca parameters #12547

burlen opened this issue May 15, 2024 · 8 comments · Fixed by #12551

Comments

@burlen
Copy link

burlen commented May 15, 2024

Background information

I'm tuning OpenMPI's alltoall on our cluster.

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Compiled from sources, on tag v5.0.3

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

33e93469e1e1f69904ff3e3827394719aa6b3671 3rd-party/openpmix (v5.0.2)
3a70fac9a21700b31c4a9f9958afa207a627f0fa 3rd-party/prrte (v3.0.5)
dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (heads/main)

Please describe the system on which you are running

  • Operating system/version: ubuntu 22.04 lts
  • Computer hardware: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
  • Network type: infiniband

Details of the problem

The following mca parameters relating to alltoall in coll tuned component are reported by ompi_info but are not used in the code.

coll_tuned_alltoall_small_msg
coll_tuned_alltoall_intermediate_msg
coll_tuned_alltoall_algorithm_segmentsize
coll_tuned_alltoall_algorithm_tree_fanout
coll_tuned_alltoall_algorithm_chain_fanout
coll_tuned_alltoall_large_msg                                                                                                        
coll_tuned_alltoall_min_procs

These may be remnants of a previous implementation, but currently setting these does nothing. Their presence complicates tuning. They should either be removed or at least the description string reported by ompi_info modified so that it's clear that these do nothing.

@janjust
Copy link
Contributor

janjust commented May 16, 2024

Can't remove in v5.0.x but can add a message, and those that are not used, to be removed in main
There is a PR coming in where a parameter or two is used.

@wenduwan
Copy link
Contributor

https://github.com/open-mpi/ompi/pull/12453/files Will use the fanout parameter(there are 2 fanouts).

@burlen
Copy link
Author

burlen commented May 28, 2024

@wenduwan would it make sense to use a different mca parameter name to avoid confusion with the original use of the mca fan out parameters? while these features are not very well documented, I assume it is documented somewhere and could thus be confusing with your new use. Assuming the two implementations are different

@wenduwan
Copy link
Contributor

@burlen I'm actually fuzzy on the difference between fanout vs radix. Do you mind providing clarity on the terminology?

@burlen
Copy link
Author

burlen commented May 30, 2024

@burlen I'm actually fuzzy on the difference between fanout vs radix. Do you mind providing clarity on the terminology?

@wenduwan The Bruck algorithm completes in logarithmic number of rounds. Radix refers to the base of the logarithm. This is the terminology the author of the PR you linked uses for function argument names, and is used in the paper I linked in the review. It is really not about radix vs fanout. The problem is fanout is already in use for a different purpose, which makes overloading it as done in the PR potentially confusing.

@wenduwan
Copy link
Contributor

@burlen I see.

From an engineering perspective, we also need to consider code maintainability. This issue arises from the introduction of additional parameters, which I believe had good reasons at some point - similar to this discussion.

For fan-out vs radix, do you see a case where both can be used together? Is it sufficient to simply document what the parameter means in different algorithms rather than creating an entirely new parameter?

@burlen
Copy link
Author

burlen commented May 31, 2024

For fan-out vs radix, do you see a case where both can be used together?

Since faninout is not even currently used in any of the alltoall implementations, I don't see how it could be

Is it sufficient to simply document what the parameter means in different algorithms rather than creating an entirely new parameter?

That would certainly be helpful. In my opinion it would be better to have the parameter name directly relate to the algorithm. for both maintainability and usability. the name itself can be part of the documentation (helps both users and maintainers). Such naming could even reduce the burden of writing documentation because you could cite the paper for more information

It's kind of a minor thing, I guess I'm a bit hung up on it (apologies!), because I'm finding these tuning features to be somewhat poorly documented and as a result rather difficult to use.

@wenduwan
Copy link
Contributor

wenduwan commented Jun 6, 2024

@burlen Thanks for the followup. In this case I suggest we postpone the decision to introduce a new parameter, and enrich the documentation for now. It is easier to add features than deprecate them afterwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants