Terminology: n_jobs vs num_workers vs ncpu etc. #4876

Open
emmanuelle opened this issue Jul 31, 2020 · 11 comments · May be fixed by #7302
Labels
📜 type: API Involves API change(s) 💬 Discussion

Comments

@emmanuelle
Member

Since we aim to accelerate some of our functions, one way to do this is through parallel computing. We already have one function (restoration.cycle_spin) that uses dask.delayed and several threads; there is also an attempt to parallelize segmentation.slic in #3120, which I plan to revive, and other functions will probably gain the option to use one or more jobs/workers (threads or processes) in the future. I'm therefore opening this issue so that we can decide on the best terminology for this parameter. Should it be

  • num_workers (as in cycle_spin, and as in dask)
  • n_jobs (as in scikit-learn and in joblib)
  • something else? (probably not; it's unfortunate enough that joblib and dask already use different conventions)

I think both are fine; the choice should be more about consistency with the rest of the ecosystem: do we want to be more consistent with our backend (dask), or with big brother scikit-learn?
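Whatever name wins, the parameter typically just threads through to a pool size. As a minimal sketch (using only the standard library; the function name mean_of_shifts and its num_workers spelling are illustrative, not scikit-image API), a function following the dask/cycle_spin convention might look like:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: a function exposing a num_workers parameter that is
# forwarded to a thread pool, in the spirit of restoration.cycle_spin.
def mean_of_shifts(values, shifts, num_workers=None):
    """Average the sums of `values` under several shifts, computed in parallel.

    num_workers=None lets the executor pick a default, mirroring how
    cycle_spin and dask treat an unset worker count.
    """
    def shifted_sum(shift):
        return sum(v + shift for v in values)

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        totals = list(pool.map(shifted_sum, shifts))
    return sum(totals) / (len(shifts) * len(values))

print(mean_of_shifts([1.0, 2.0, 3.0], shifts=[0, 1], num_workers=2))
```

Renaming to workers or n_jobs would change nothing but the keyword, which is why the thread treats this as purely a naming (and ecosystem-consistency) question.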
@grlee77
Contributor

grlee77 commented Jul 31, 2020

I hate to say it, but there is a third contender: workers

SciPy tends to use just workers rather than num_workers (e.g. in the scipy.fft functions, differential_evolution and quad_vec). For FFTs, it also provides a context manager that can be used to control the default number of workers, which I think is nice; see scipy.fft.set_workers. Adding workers to more functions is on their roadmap.
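The set_workers pattern is easy to reproduce with the standard library alone. Below is a hedged sketch of how such a context manager could work internally (the names set_workers and get_workers here are illustrative stand-ins, not the actual scipy.fft implementation or any scikit-image API):

```python
import contextlib
import threading

# Thread-local storage so each thread sees its own default worker count.
_local = threading.local()

def get_workers():
    """Return the current default worker count (1 if nothing is set)."""
    return getattr(_local, "workers", 1)

@contextlib.contextmanager
def set_workers(workers):
    """Temporarily override the default worker count, restoring it on exit."""
    previous = get_workers()
    _local.workers = workers
    try:
        yield
    finally:
        _local.workers = previous

with set_workers(4):
    assert get_workers() == 4  # inside the block, the override applies
assert get_workers() == 1      # outside, the previous default is restored
```

The appeal of this design is that library functions can default to get_workers() instead of a hard-coded value, so callers can tune parallelism for a whole region of code without passing a keyword through every call.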

@grlee77
Contributor

grlee77 commented Jul 31, 2020

I think I actually like just workers the best, but don't have a problem with any of the three as long as we use it consistently within scikit-image!

@emmanuelle
Member Author

Right, workers also makes sense of course. And it's more a question of workers than of jobs, so for the sake of clarity I'd prefer either workers or num_workers (I'm not proposing n_workers, which would otherwise be a really good name, because we don't want to multiply the number of names in the ecosystem).

So, who is for workers and who is for num_workers?

@sciunto
Member

sciunto commented Aug 12, 2020

Right, workers also makes sense of course. And it's more a question of workers rather than jobs,

Jobbers? :)

@sciunto
Member

sciunto commented Aug 12, 2020

To be more serious, workers looks sufficient, but num_workers is more explicit: I immediately understand that I'm supposed to pass an integer.

Regarding n_* vs num_*, we have both in the library: n_dim, n_bins, n_inliers, n_tiles on one hand, and num_trials, num_shapes, num_peaks, num_channels on the other. A preference should be recorded in #2616.

@alexdesiqueira
Member

I'd vote to maintain the consistency throughout the packages: maybe workers is the way to go?

@stefanv
Member

stefanv commented Jan 19, 2024

To be consistent with the ecosystem, this requires some careful deliberation.

@thomasjpfan wrote up a good overview of the state of the ecosystem. I see he recommends workers as well.

scikit-learn is currently on n_jobs, and scipy on workers.

@lagru
Member

lagru commented Jan 22, 2024

Thanks for the link. To quote from that page:

We use SciPy’s workers parameter because it is more consistent in controlling the number of cores used. workers denotes any form of parallelism such as: multi-threading, multiprocessing, OpenMP threads, or pthreads.

This argument works just as well in favor of num_workers if you replace "SciPy" with "dask". num_workers seems the more expressive name to me (we are not passing actual workers, just their number). But workers is fine with me as well, and SciPy is probably closer to us than dask. I really don't care that much which solution we settle on, as long as we don't stall this again for a few years. 🤞

@lagru
Member

lagru commented Feb 26, 2024

How can we move this along with regard to #7302? I'm happy to involve the wider ecosystem, but how do I do that? According to New SPEC Proposals, this might be a good fit for a SPEC. I'll make a post in https://discuss.scientific-python.org/c/specs/ideas/9 if there are no objections.

@stefanv
Member

stefanv commented Feb 28, 2024

This does seem like exactly the kind of thing we need to agree on across projects, so +1.

@lagru
Member

lagru commented Mar 4, 2024

Posted this as a SPEC idea in Terminology for parameters controlling parallel computation.

7 participants