
RF: configure num_threads==-1 as the value to use all cores #2352

Merged: 16 commits merged into dipy:master on Apr 25, 2021

Conversation

drombas (Contributor) commented Apr 6, 2021

Related to #2300 and a continuation of #2341.

Proposed Changes

  • Configure num_threads<=0 as the option to use all cores across the codebase
  • Modify the name of the argument num_processes --> num_threads in reslice.py and test_reslice.py
  • Delete the num_threads argument where it is not used

Important point

Unlike in #2341, most of the functions edited here used all cores by default (using None), so to keep this behavior I set the new default to 0. This implies that:

  1. The default number of cores used differs between functions. Does it make sense to use all cores by default in some functions and only 1 in others?
  2. This PR modifies the default value of num_threads in several functions, which was one of the main concerns discussed in NF: Add "None" options in the CLIs #2300. What do you think @jhlegarreta, @skoudoro?

codecov bot commented Apr 6, 2021

Codecov Report

Merging #2352 (1e439ce) into master (5394ac5) will decrease coverage by 6.15%.
The diff coverage is 78.29%.


@@            Coverage Diff             @@
##           master    #2352      +/-   ##
==========================================
- Coverage   91.38%   85.23%   -6.16%     
==========================================
  Files         254      126     -128     
  Lines       33851    16562   -17289     
  Branches     3569     2681     -888     
==========================================
- Hits        30936    14117   -16819     
+ Misses       2111     1759     -352     
+ Partials      804      686     -118     
Impacted Files Coverage Δ
dipy/data/__init__.py 81.18% <ø> (ø)
dipy/denoise/nlmeans.py 100.00% <ø> (ø)
dipy/denoise/non_local_means.py 100.00% <ø> (ø)
dipy/reconst/csdeconv.py 86.79% <ø> (-1.26%) ⬇️
dipy/reconst/shm.py 93.06% <ø> (-0.04%) ⬇️
dipy/workflows/base.py 76.15% <ø> (ø)
dipy/workflows/io.py 74.13% <ø> (ø)
dipy/workflows/mask.py 94.44% <ø> (ø)
dipy/workflows/stats.py 84.80% <0.00%> (-1.01%) ⬇️
dipy/workflows/tracking.py 96.51% <ø> (ø)
... and 181 more

grlee77 (Contributor) commented Apr 7, 2021

As an alternative, I would also consider using num_threads=-1 to mean the maximum number of workers (similar to array[-1] giving the last element of an array). Similarly, num_threads=-3 would be two fewer workers than the maximum. That is the approach taken for the workers argument in SciPy functions (example) and for the n_jobs argument in joblib.Parallel.

Unfortunately, there is no good consensus across scientific Python libraries on either this behavior or the name of the num_threads/workers/n_jobs argument! (see some discussion in scikit-image/scikit-image#4876)
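
For illustration, a minimal sketch of that negative-value convention (a hypothetical helper, not DIPY, SciPy, or joblib code; os.cpu_count() stands in for the backend's notion of the maximum):

import os


def resolve_workers(workers):
    """Hypothetical helper: workers=-1 -> all cores, workers=-2 -> all cores
    minus one, and so on, following the SciPy/joblib-style convention."""
    if workers == 0:
        raise ValueError("workers must be a non-zero integer")
    max_workers = os.cpu_count() or 1
    if workers < 0:
        # -1 maps to max_workers, -3 maps to max_workers - 2, ...
        return max(1, max_workers + 1 + workers)
    return workers

# On an 8-core machine: resolve_workers(-1) == 8, resolve_workers(-3) == 6.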

jhlegarreta (Contributor) commented Apr 8, 2021

Thanks for taking care of this @drombas !

Configure num_threads<=0 as the option to use all cores across the codebase

IMO using two (or more) different values to mean the same thing is misleading.

The default number of cores used differs between functions. Does it make sense to use all cores by default in some functions and only 1 in others?

At first sight this looks inconsistent/undesirable to me, but I have not investigated the reasons, if any. Maybe others can elaborate on this.

This PR modifies the default value of num_threads in several functions, which was one of the main concerns discussed in #2300. What do you think @jhlegarreta, @skoudoro?

Although not desirable, I'd definitely be for it if the new values are the ones that make sense/are sensible defaults.

What @grlee77 says in #2352 (comment) is interesting. That also broadens the scope of our issue. As for the terms used, I wouldn't know which term is more accurate/best honors its purpose, but I have seen mixed use of the terms process, thread, job, and worker elsewhere in the past.

skoudoro (Member) left a comment

Hi @drombas,

Thank you for doing this again!

After all the comments and some more thinking, see some suggestions below:

Modify the name of the argument num_processes --> num_threads in reslice.py and test_reslice.py

You should keep num_processes here. This is different from num_threads. Put simply, it means we spawn processes (run new Python interpreters to parallelize).

Delete the num_threads argument where it is not used

You will have to keep it and start a deprecation cycle to warn users.

The default number of cores used differs between functions. Does it make sense to use all cores by default in some functions and only 1 in others?

To me it makes sense. It depends a lot on the algorithm. Some of them are greedy, and we do not want to kill/freeze the user's laptop (they would blame the algorithm, which would be wrong).

This PR modifies the default value of num_threads in several functions, which was one of the main concerns discussed in #2300. What do you think @jhlegarreta, @skoudoro?

After @jhlegarreta's and @grlee77's comments, I recommend that you keep the default value at None or the value indicated, and inside each function just initialize num_threads by doing:

num_threads = num_threads or -1

Does it make sense to you @drombas?

Inline review comments on dipy/align/reslice.py (outdated, resolved)
drombas (Contributor, author) commented Apr 9, 2021

Thank you all for your comments! TBH I don't have a strong opinion on this kind of convention.

On your points @skoudoro:

You should keep num_processes here. This is different from num_threads. Put simply, it means we spawn processes (run new Python interpreters to parallelize).

Sorry about that, I thought it was just a different name for the same thing.

You will have to keep it and start a deprecation cycle to warn users.

Ok, I'll keep those.

I recommend that you keep the default value at None or the value indicated, and inside each function just initialize num_threads by doing:

num_threads = num_threads or -1

Just to clarify, you mean we keep None as the default:

def anyFunction(...,num_threads=None):

and then inside the function we initialize it to -1 or 1 (depending on the case):

if num_threads is None:
    num_threads = -1

It sounds reasonable as it doesn't change the default but incorporates -1 as the option for all cores.

As for choosing between threads, jobs, workers, ..., we could leave that for another issue/discussion and stick with num_threads for the moment.

skoudoro (Member) commented Apr 9, 2021

It sounds reasonable as it doesn't change the default but incorporates -1 as the option for all cores.

Exactly, and this num_threads = num_threads or -1 is equivalent to:

if num_threads is None:
    num_threads = -1

@drombas drombas changed the title RF: configure num_threads<=0 as the value to use all cores RF: configure num_threads==-1 as the value to use all cores Apr 13, 2021
drombas (Contributor, author) commented Apr 14, 2021

Before reviewing the code, please note that I added some extra tests to account for invalid num_threads values.

Could we restart those failing tests? (All related tests passed locally.)

jhlegarreta (Contributor) commented Apr 14, 2021

Thanks for the effort @drombas. Restarted the tests.

jhlegarreta (Contributor) left a comment

I had a quick look at the changes: the docstrings and the default values in the method signatures are contradictory, e.g.:

def _bundle_minimum_distance_matrix(double [:, ::1] static,
                                    double [:, ::1] moving,
                                    cnp.npy_intp static_size,
                                    cnp.npy_intp moving_size,
                                    cnp.npy_intp rows,
                                    double [:, ::1] D,
                                    num_threads=None):
(...)
    num_threads : int, optional
        Number of threads. If -1 (default) then all available threads will be
        used.

The default is None. I have not looked at all signatures, but I had a look at a few of them, and all of them fall into this contradiction. I'd dare say that for people who will be looking at the documentation this is quite confusing, and IMHO it is just as confusing from a developer's point of view.

drombas (Contributor, author) commented Apr 14, 2021

Thanks @jhlegarreta !

The default is None. I have not looked at all signatures, but I had a look at a few of them, and all of them fall into this contradiction. I'd dare say that for people who will be looking at the documentation this is quite confusing,

The intention was to keep None as the default but also tell the user that None in practice behaves as -1 (all cores). Any suggestion for a less confusing docstring?

We could also return to the original plan and get rid of None by setting -1 as the default in the method signature.

jhlegarreta (Contributor) commented Apr 14, 2021

The intention was to keep None as the default but also tell the user that None in practice behaves as -1 (all cores). Any suggestion for a less confusing docstring?

I still believe that using two values with the same meaning is confusing, and this means that I'd be revisiting the use of None in the CLI that triggered the issue this PR tries to fix. I think @grlee77's proposal in #2352 (comment), beyond the terminology, would not have any such contradiction.

Also, it looks like every method involved is forced to have its own if/else block or implement its own logic to potentially arrive at the same conclusion (provided that the convention is clear); I'd have a single utility function named e.g. determine_threads(num_threads), compute_threads(num_threads) or similar, defined in a single place and shared across all involved methods. It might not be straightforward, and might need a deprecation cycle if the logic changes for some methods, but from that moment on we'd only need to look at/change things in a single place (less prone to errors, etc.).

Sorry, but I feel I cannot comment further, as I am still not convinced by the solution.

Thanks for the effort @drombas.

skoudoro (Member) commented:

this means that I'd be revisiting the use of None in the CLI that triggered the issue this PR tries to fix

This PR does not try to fix that anymore, @jhlegarreta. It tries to standardize how we set up num_threads, since there are different rules in each function. It also tries to keep backward compatibility with None, so it is normal that two values will have the same meaning.

I think @grlee77's proposal in #2352 (comment), beyond the terminology, would not have any such contradiction.

That's why the first step here is -1 for all cores. Then, a new PR will be needed to deprecate None in this specific case (concerning the other case, we need to manage None at the workflow level). The proposal does not say what the behavior is when you pass 0 or None as a parameter. Should we raise an error, or interpret it as all cores, which would mean three values for the same behavior (-1, 0, None)? I need to look at the scikit-learn code, or maybe @grlee77 has an answer.

Also, it looks like every method involved is forced to have its own if/else block or implement its own logic to potentially arrive at the same conclusion (provided that the convention is clear); I'd have a single utility function named e.g. determine_threads(num_threads), compute_threads(num_threads) or similar, defined in a single place and shared across all involved methods.

I agree with this point. Could you create a function, @drombas, in dipy/utils/omp.pyx? Something like determine_num_threads(), as @jhlegarreta proposes, and then use it everywhere. It would be easier to maintain in the future. Thanks a lot!

skoudoro (Member) commented:

@jhlegarreta @drombas @grlee77: more details about their rules below or in this link. I am OK with following the same rules:

  • For n_threads = None,
    • if the OMP_NUM_THREADS environment variable is set, return
      openmp.omp_get_max_threads();
    • otherwise, return the minimum of openmp.omp_get_max_threads()
      and the number of CPUs, taking cgroups quotas into account. Cgroups
      quotas can typically be set by tools such as Docker.
      The result of omp_get_max_threads can be influenced by the environment
      variable OMP_NUM_THREADS or at runtime by omp_set_num_threads.
  • For n_threads > 0, return this as the maximal number of threads for
    parallel OpenMP calls.
  • For n_threads < 0, return the maximal number of threads minus
    |n_threads + 1|. In particular, n_threads = -1 will use as many
    threads as there are available cores on the machine.
  • Raise a ValueError for n_threads = 0.
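
A minimal pure-Python sketch of those rules (illustrative only: the eventual DIPY helper lives in Cython in dipy/utils/omp.pyx; here os.cpu_count() stands in for openmp.omp_get_max_threads() and cgroups quotas are not taken into account):

import os


def determine_num_threads(num_threads):
    # Approximation of the scikit-learn-style rules listed above.
    if num_threads == 0:
        raise ValueError("num_threads cannot be 0")
    max_threads = os.cpu_count() or 1
    if num_threads is None:
        # With real OpenMP, OMP_NUM_THREADS already bounds omp_get_max_threads().
        if "OMP_NUM_THREADS" in os.environ:
            return int(os.environ["OMP_NUM_THREADS"])
        return max_threads
    if num_threads < 0:
        # -1 -> max_threads, -2 -> max_threads - 1, ...
        return max(1, max_threads - abs(num_threads + 1))
    return num_threads

Every function accepting num_threads could then call such a helper once, instead of re-implementing its own if/else block, as suggested above.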

jhlegarreta (Contributor) commented:

We might want to check how the above (#2352 (comment), #2352 (comment)) would work when calling the CLIs, but it looks like a step forward.

drombas (Contributor, author) commented Apr 15, 2021

I agree with this point. Could you create a function, @drombas, in dipy/utils/omp.pyx? Something like determine_num_threads(), as @jhlegarreta proposes, and then use it everywhere.

Okay, let's try that. I think we already have something similar in omp.pyx that we can adapt to the scikit-learn logic. I will give it a try in the next few days.

skoudoro (Member) commented:

Hi @drombas,

Do you think you will have time to finish this PR before Friday and the new DIPY release, or should I move it to the next release cycle in June?

Thank you for your feedback

drombas (Contributor, author) commented Apr 20, 2021

Hi @skoudoro,

By tomorrow I should have finished the changes, so if it is OK we can decide then.

skoudoro (Member) commented:

Sounds like a plan 👍🏾. No problem.

skoudoro (Member) left a comment

This is very nice work @drombas. Thanks a lot for that.

Overall, it looks good. I still need to look at it more carefully. See some comments below.

Also, can you add your docstring as a note in doc/api_changes.rst (https://github.com/dipy/dipy/blob/master/doc/api_changes.rst)? You can create a section for DIPY 1.4.1.

Thanks!

Inline review comments on dipy/utils/multiproc.py, dipy/utils/omp.pyx, and dipy/workflows/reconst.py (outdated, resolved)
@skoudoro skoudoro added this to the 1.4.1 milestone Apr 21, 2021
drombas (Contributor, author) commented Apr 21, 2021

Thanks for the comments @skoudoro.

In summary, the selection of the number of cores is now centralized in two files:

  • omp.pyx: for OpenMP parallelization
  • multiproc.py: for multiprocessing parallelization

In the end I split it in two, as the logic is slightly different: for OpenMP the environment variable OMP_NUM_THREADS is considered, while for multiprocessing it is not. I also thought it could be confusing to use omp.pyx to define the logic of parallelization with the multiprocessing package. To help with the review a bit, here is an organized list of the main changed files.
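
For illustration, a rough sketch of what the multiprocessing counterpart could look like (assumptions: it is named determine_num_processes as in this PR, None keeps its old "use all cores" meaning, and OMP_NUM_THREADS is deliberately not consulted; the merged code in dipy/utils/multiproc.py may differ in detail):

from multiprocessing import cpu_count


def determine_num_processes(num_processes):
    # Sketch only; see dipy/utils/multiproc.py for the actual implementation.
    if num_processes == 0:
        raise ValueError("num_processes cannot be 0")
    if num_processes is None or num_processes < 0:
        # cpu_count() raises NotImplementedError on platforms where the
        # number of cores cannot be determined.
        max_procs = cpu_count()
        if num_processes is None:
            return max_procs
        return max(1, max_procs + 1 + num_processes)
    return num_processes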

skoudoro (Member) left a comment

It is good to go!

I will wait until Friday evening to see if there is any additional comment and then go ahead and merge it.

jhlegarreta (Contributor) left a comment

Thanks for the hard work and for having persevered @drombas.

Looks very clean.

Thanks for the spreadsheet. Very helpful. On a related note, automatically knowing which methods are multi-threaded can be helpful to developers and users at some point. But that is a separate endeavor.

I am not sure I followed the removal here:
https://github.com/dipy/dipy/pull/2352/files#diff-5fcc99cde72e6ea9a640d32a91756c178d916d26ae479e0da4c766a61c45dd4cL226

But if it's OK, then dismiss the observation.

I am missing dedicated test methods for the accepted values in the determine_num_processes and determine_num_threads methods.

A minor in-line comment.

The last two comments can be addressed at a later time if it is preferred to have this merged as soon as possible.

Inline review comment on doc/interfaces/gibbs_unringing_flow.rst (resolved)
jhlegarreta (Contributor) left a comment

💯 to the hard work @drombas.

drombas (Contributor, author) commented Apr 23, 2021

Thanks for the quick feedback!

I am not sure I followed the removal here:
https://github.com/dipy/dipy/pull/2352/files#diff-5fcc99cde72e6ea9a640d32a91756c178d916d26ae479e0da4c766a61c45dd4cL226

After the changes, that part is not reached. It is true, though, that there was a catch of the NotImplementedError exception that I did not reimplement (I am not sure how critical it is, as it is the only place I saw that check).

I added it just in case.

@skoudoro skoudoro merged commit a3d0fed into dipy:master Apr 25, 2021
skoudoro (Member) commented:

Thank you @drombas! Merging.

jhlegarreta (Contributor) commented:

I'd say that the exception added in 1e439ce should get tested. Thanks.

drombas (Contributor, author) commented Apr 26, 2021

That exception is raised when the number of cores cannot be determined, and TBH I don't know how we could test it.

jhlegarreta (Contributor) commented:

That exception is raised when the number of cores cannot be determined, and TBH I don't know how we could test it.

Can the exception be forced to be raised and the expected message or result be checked?

drombas (Contributor, author) commented Apr 27, 2021

Can the exception be forced to be raised and the expected message or result be checked?

I imagine it should be possible if we knew exactly how the number of cores is determined.
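
For what it is worth, a sketch of how the exception could be forced in a test by mocking the core-count lookup (everything below is an assumption: the patch target must match how cpu_count is imported inside dipy/utils/multiproc.py, and the expected exception type must match whatever the merged helper actually raises):

from unittest import mock

import pytest

from dipy.utils.multiproc import determine_num_processes


def test_determine_num_processes_undetermined_cores():
    # Simulate a platform where the number of cores cannot be determined.
    with mock.patch("dipy.utils.multiproc.cpu_count",
                    side_effect=NotImplementedError):
        # Replace Exception with the concrete exception the helper raises.
        with pytest.raises(Exception):
            determine_num_processes(-1)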
