For Comment Only: Lums/sc 25954/random scheduler #3978

lums658 · 2023-03-19T07:13:06Z

The random scheduler is intended to demonstrate the robustness of the task graph I/O - compute infrastructure, in preparation for full state management in the task graph.

This is probably the simplest possible finite thread pool scheduler, namely one that simply picks tasks at random and executes them. The nodes in the tasks are still stateful in that they have a "program counter" indicating the next step to be taken in their I/O - compute loop. However, the scheduler does not need to be aware of the node state. It simply picks a task and executes it (by invoking the resume method).

The tasks are kept in a thread-safe RandomizedQueue, which is similar to the BoundedBufferQueue except that the items are kept in an underlying vector, which is shuffled on every pop() from the queue.
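
As a rough illustration of the shuffle-on-pop idea described above, here is a minimal sketch. It is not the RandomizedQueue from this PR: the class name `RandomizedQueueSketch`, the non-blocking `try_pop` signature, and the seeding are all assumptions made for the example, and it omits any bounding or blocking behavior that the real queue may have.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <random>
#include <utility>
#include <vector>

// Hypothetical sketch of a thread-safe queue that shuffles on every pop.
template <class T>
class RandomizedQueueSketch {
 public:
  explicit RandomizedQueueSketch(unsigned seed = std::random_device{}())
      : rng_(seed) {}

  // Append an item. Insertion order is irrelevant because pop shuffles.
  void push(T item) {
    std::lock_guard<std::mutex> lock(mutex_);
    items_.push_back(std::move(item));
  }

  // Shuffle the backing vector, then move its last element into `out`.
  // Returns false (and leaves `out` untouched) if the queue is empty.
  bool try_pop(T& out) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) return false;
    std::shuffle(items_.begin(), items_.end(), rng_);
    out = std::move(items_.back());
    items_.pop_back();
    return true;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return items_.size();
  }

 private:
  mutable std::mutex mutex_;  // one lock guards both the vector and the RNG
  std::vector<T> items_;
  std::mt19937 rng_;
};

// Tiny self-check: every pushed item is popped exactly once, in some order.
inline void randomized_queue_self_check() {
  RandomizedQueueSketch<int> q(42);
  for (int i = 0; i < 4; ++i) q.push(i);
  int seen = 0, v = 0;
  while (q.try_pop(v)) seen |= 1 << v;  // set one bit per popped value
  assert(seen == 15 && q.size() == 0);
}
```

Note that every `push` and `try_pop` in this sketch takes the same mutex, which is the kind of single-queue locking overhead the description mentions.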

The random scheduler maintains all tasks in the single randomized queue. Resuming a task from any state is safe. In particular, if resume is invoked for a node that is waiting, it will simply continue to wait.
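
The resume semantics above can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions, not the scheduler from this PR: `Slot`, `Producer`, `Summer`, `run_random`, and `run_pipeline` are hypothetical names, the edge is reduced to a one-item mailbox, and the thread pool is replaced by a single loop so the example stays deterministic in its result. In the PR, several worker threads would run a loop like this against the shared queue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for an edge: a one-item mailbox between two nodes.
struct Slot {
  bool full = false;
  int value = 0;
};

// Source node: emits 0, 1, ..., n-1, one item per successful resume().
struct Producer {
  Slot& out;
  int n;
  int next = 0;
  Producer(Slot& s, int count) : out(s), n(count) {}

  // Returns true once the node has finished.
  bool resume() {
    if (next == n) return true;
    if (out.full) return false;  // waiting on the consumer: do nothing
    out.value = next++;
    out.full = true;
    return next == n;
  }
};

// Sink node with an explicit "program counter": step 0 pulls, step 1 computes.
struct Summer {
  Slot& in;
  int n;
  int pc = 0;
  int item = 0;
  int received = 0;
  long sum = 0;
  Summer(Slot& s, int count) : in(s), n(count) {}

  bool resume() {
    if (received == n) return true;
    if (pc == 0) {
      if (!in.full) return false;  // waiting for input: stay at step 0
      item = in.value;
      in.full = false;
      pc = 1;
      return false;  // yield; the compute step runs on a later resume
    }
    sum += item;
    pc = 0;
    return ++received == n;
  }
};

using Task = std::function<bool()>;  // returns true when the node is finished

// Pop a random task, resume it, and requeue it unless it has finished.
// The scheduler never inspects node state. Returns the number of resumes.
std::size_t run_random(std::vector<Task> tasks, unsigned seed) {
  std::mt19937 rng(seed);
  std::size_t resumes = 0;
  while (!tasks.empty()) {
    std::shuffle(tasks.begin(), tasks.end(), rng);  // shuffle on every pop
    Task task = std::move(tasks.back());
    tasks.pop_back();
    ++resumes;
    if (!task()) tasks.push_back(std::move(task));
  }
  return resumes;
}

// Build a two-node graph (Producer -> Summer) and run it in random order.
long run_pipeline(int n, unsigned seed) {
  assert(n >= 0);
  Slot slot;
  Producer producer(slot, n);
  Summer summer(slot, n);
  std::vector<Task> tasks;
  tasks.push_back([&producer] { return producer.resume(); });
  tasks.push_back([&summer] { return summer.resume(); });
  run_random(std::move(tasks), seed);
  return summer.sum;
}
```

Whatever order the seed produces, `run_pipeline(n, seed)` returns the sum 0 + 1 + ... + (n-1), because a node resumed while it is waiting simply returns without changing any state.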

Preliminary performance results (from the sieve benchmark) show that the random scheduler is as effective as the full Duff's device scheduler (which puts tasks into running, waiting, or runnable queues) -- as long as the number of threads per core does not become too large. Otherwise, the overhead of locking the single job queue dominates execution time. There are likely a number of optimizations that can be made to reduce these overheads.

TYPE: FEATURE
DESC: A scheduler that simply picks tasks at random to run.

…with `make_edge` but if given a mimo node, the user has to specify in `make_edge` which port is being connected from or to.

Revert "Very small update [skip ci]"
This reverts commit 828e501.

Revert "Implement API for MIMO nodes in taskgraph. Ports are still connected with `make_edge` but if given a mimo node,"
This reverts commit 8441b8b.

Revert "Revert "Very small update [skip ci]""
This reverts commit bd118d4.

Revert "Revert "Implement API for MIMO nodes in taskgraph. Ports are still connected with `make_edge` but if given a mimo node,""
This reverts commit 9389acf.

include ctad file in unit_segmented_nodes [skip ci]

Fix CTAD deduction guides for function nodes [skip ci]

shortcut-integration · 2023-03-19T07:13:09Z

This pull request has been linked to Shortcut Story #25954: Implement random scheduler.

robertbindar

This was a nice read, thanks a lot. It's not exactly clear to me how the random scheduler is supposed to prove the robustness of the system; I'd love it if you could explain this in a bit more detail. How is the graph supposed to work overall if its nodes are executed in random order? Do nodes yield execution if they have nothing on their input ports, so that no matter in which order you execute the tasks wrapping the nodes, the nodes will eventually end up running according to how data flows within the graph?

lums658 · 2023-04-10T17:27:40Z

On Apr 10, 2023, at 7:47 AM, Robert Bindar ***@***.***> wrote:

> @robertbindar approved this pull request.
>
> This was a nice read, thanks a lot. It's not exactly clear to me how the random scheduler is supposed to prove the robustness of the system, I'd love if you could explain this a bit in more details. How is graph supposed to work overall if its nodes are executed in random order? Do nodes yield execution if they have nothing on the input ports and no matter in which order you execute the tasks wrapping the nodes, the nodes will eventually end up running according to how data flows within the graph?

That is exactly right. And the point is to show that the performance of doing it that way is as good as doing all the things of putting tasks to sleep and putting them in waiting queues, waking them up with notification signals etc. With random, a node will execute if it has input and won’t execute if it doesn’t.

By robustness I mean that the nodes and edges should behave properly and flow data through the graph even if they are selected to run at random rather than in topologically sorted order, for example. I expect that after a graph (that is initially topologically sorted) is executed a few times the order of execution of nodes will become scrambled anyway.

There is a bit of contention and other overhead (due to too much shuffling) at the moment on the random task queue but I expect we can reduce those without too much effort (use double-ended queue, be smarter about when to shuffle, &c.). Right now we re-shuffle on every pop.

Cheers,
Andrew Lumsdaine
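
To make the "re-shuffle on every pop" cost concrete: shuffling is linear in the queue length, whereas removing one uniformly random element can be done in constant time by swapping it with the last element and popping. The sketch below shows that swap-and-pop technique. It is only one possible direction, not something proposed in this thread, and `pop_random` is a hypothetical helper that leaves locking to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Remove and return a uniformly random element without shuffling.
// Precondition: `items` is not empty. Element order is not preserved.
template <class T, class Rng>
T pop_random(std::vector<T>& items, Rng& rng) {
  assert(!items.empty());
  std::uniform_int_distribution<std::size_t> pick(0, items.size() - 1);
  const std::size_t i = pick(rng);
  const std::size_t last = items.size() - 1;
  if (i != last) std::swap(items[i], items[last]);  // move the pick to the back
  T item = std::move(items.back());
  items.pop_back();
  return item;
}
```

For example, repeatedly calling `pop_random(tasks, rng)` on a `std::vector` with a `std::mt19937` drains the vector in a random order while touching at most two elements per call.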

lums658 added 21 commits March 15, 2023 14:24

Implement API for MIMO nodes in taskgraph. Ports are still connected with `make_edge` but if given a mimo node, the user has to specify in `make_edge` which port is being connected from or to.

860248e

Prep for EC2 debug [skip ci]

7d23d8a

Add unit test files [skip ci]

23909e3

Implement API for MIMO nodes in taskgraph. Ports are still connected with `make_edge` but if given a mimo node, the user has to specify in `make_edge` which port is being connected from or to.

a1b4522

Prep for EC2 debug [skip ci]

ef5431c

Add unit test files [skip ci]

73937ef

Race condition fixed [skip ci]

58b9187

Add integer-passing test to unit_random [skip ci]

7c9fd2b

Implement randomized queue and unit tests [skip ci]

e034579

clang-format [skip ci]

6c136ec

unit_random seems to pass [skip ci]

cd703ea

sieve running with random scheduler [skip ci]

4eb8d73

random scheduler competitive with duffs [skip ci]

d69268a

optimize random.h [skip ci]

45e7a29

Clean up random and randomized_queue [skip ci]

452d72d

clang-format [skip ci]

44e16c4

Merge branch 'lums/tmp/mimo_expts' into lums/sc-26634/basic-mimo

257f442

Merge branch 'lums/tmp/random_expts' into lums/sc-25954/random-scheduler

84f8363

Small fix [skip ci]

3681542

lums658 requested review from ihnorton, robertbindar, eric-hughes-tiledb and NikolaosPapailiou March 19, 2023 07:13

robertbindar approved these changes Apr 10, 2023
