Scaling issue run openmp on a cluster #12511

Open
AIBSCT80 opened this issue May 1, 2024 · 4 comments

@AIBSCT80

AIBSCT80 commented May 1, 2024

I am trying to run a hybrid MPI/OpenMP program on a cluster using

salloc -p compute-grantley -n 1 --exclusive mpirun --mca mtl psm ./test

Unfortunately, with 1 thread, I get

Multithread wall clock: 3.409818e-02 in threads: 0

and with 6 threads I get

Multithread wall clock : 2.346277e-02 in threads: 0
Multithread wall clock : 1.946653e-02 in threads: 3
Multithread wall clock : 3.420253e-02 in threads: 1
Multithread wall clock : 2.745595e-02 in threads: 4
Multithread wall clock : 3.047117e-02 in threads: 2
Multithread wall clock : 1.347380e-02 in threads: 5

It seems to me that the additional threads give no speedup in this case.
I suspect the salloc settings are not correct, but I am not sure
about this.
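
To put a number on the observation above, here is a small sketch that computes speedup and parallel efficiency from the reported timings. It assumes (this is not stated in the report) that each printed value is the time one thread spent in the parallel region, so the region as a whole takes as long as its slowest thread.

```python
# Wall-clock times copied from the output above (seconds).
t_single = 3.409818e-02
t_threads = {
    0: 2.346277e-02,
    3: 1.946653e-02,
    1: 3.420253e-02,
    4: 2.745595e-02,
    2: 3.047117e-02,
    5: 1.347380e-02,
}

# Assumption: the parallel region ends when the slowest thread ends.
t_parallel = max(t_threads.values())
n_threads = len(t_threads)

speedup = t_single / t_parallel
efficiency = speedup / n_threads

print(f"parallel time: {t_parallel:.6e} s")
print(f"speedup:       {speedup:.3f}")
print(f"efficiency:    {efficiency:.3f}")
```

Under that assumption the speedup is roughly 1, so the 6-thread run finishes no sooner than the single-thread run.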

@jsquyres
Member

jsquyres commented May 2, 2024

I'm afraid that there isn't enough information here to definitively say what is going on.

  • Your application might well be bound to a single core; running multiple threads bound to that one core may cause contention.
  • Your application may simply not be scalable -- i.e., the overhead of splitting the work up into multiple threads and then gathering all the results may overwhelm the benefit of having multiple workers.
  • You are testing over an incredibly small amount of work. Increase your workload to get more realistic results (e.g., if the overheads of splitting / gathering are [more or less] fixed, increasing the amount of work for each thread can dramatically improve the efficiency).
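
The last point can be illustrated with a toy model. This is only a sketch: it assumes the parallel time is the serial work divided evenly across threads plus a fixed split/gather overhead, and the 10 ms overhead and the workload sizes are hypothetical values chosen for illustration, not measurements from this issue.

```python
def parallel_efficiency(work_seconds, n_threads, overhead_seconds):
    """Efficiency under a simple model: T_p = work / p + fixed overhead."""
    t_parallel = work_seconds / n_threads + overhead_seconds
    speedup = work_seconds / t_parallel
    return speedup / n_threads

# Hypothetical numbers: 6 threads and a fixed 10 ms split/gather overhead.
N_THREADS = 6
OVERHEAD = 0.010

for work in (0.034, 0.34, 3.4, 34.0):
    eff = parallel_efficiency(work, N_THREADS, OVERHEAD)
    print(f"work = {work:6.3f} s  ->  efficiency = {eff:.2f}")
```

In this model the efficiency climbs toward 1 as the serial work grows, because the fixed overhead becomes a smaller share of the total time.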

@AIBSCT80
Author

AIBSCT80 commented May 3, 2024

@jsquyres Thanks. It seems to me that the threads are bound to one core, so there is no scalability.

@jsquyres
Member

jsquyres commented May 3, 2024

I'm not quite sure how to parse your reply.

You didn't provide answers to the templated questions in the GitHub issue, so I don't know what version you're running. If it's Open MPI v5, see https://docs.open-mpi.org for how to set process binding.

Does that resolve your question?

@rhc54
Contributor

rhc54 commented May 3, 2024

The problem lies in the salloc command line: you asked for 1 CPU, and that's what you were given.
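
One way to check this from inside a job is to print how many CPUs the launched process is actually allowed to use and compare that with the requested thread count. The sketch below is a stand-in diagnostic, not part of the original test program; it assumes a Linux node (`os.sched_getaffinity` is Linux-only) and that the thread count is set through `OMP_NUM_THREADS`. The helper names are hypothetical.

```python
import os

def requested_threads(env=os.environ):
    """Parse OMP_NUM_THREADS; return None if it is unset or not a number."""
    raw = env.get("OMP_NUM_THREADS", "").split(",")[0].strip()
    return int(raw) if raw.isdigit() else None

def oversubscribed(n_threads, n_cpus):
    """True when more threads are requested than CPUs are available."""
    return n_threads is not None and n_threads > n_cpus

# CPUs this process is allowed to run on (Linux-only call).
cpus = sorted(os.sched_getaffinity(0))
threads = requested_threads()

print(f"allowed CPUs ({len(cpus)}): {cpus}")
print(f"OMP_NUM_THREADS: {threads}")
if oversubscribed(threads, len(cpus)):
    print("more threads than CPUs: threads will time-share the same core(s)")
```

If this reports a single allowed CPU when launched the same way as `./test`, the allocation or binding is the limit rather than the application. Slurm's `--cpus-per-task` option is one place to look; whether it is the right fix on this cluster is an assumption, not something confirmed in the thread.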

3 participants