The
randomized version of the work stealing algorithm presented by Blumofe and Leiserson maintains several threads of execution and schedules these onto P processors. Each of the processors has a
double-ended queue (deque) of threads. Call the ends of the deque "top" and "bottom". Each processor that has a current thread to execute, executes the instructions in the thread one by one, until it encounters an instruction that causes one of four "special" behaviors: • A
spawn instruction causes a new thread to be created. The current thread is placed at the bottom of the deque, and the processor starts executing the new thread. • A
stalling instruction is one that temporarily halts execution of its thread. The processor pops a thread off the bottom of its deque and starts executing that thread. If its deque is empty, it starts work stealing, explained below. • An instruction may cause a thread to
die. The behavior in this case is the same as for an instruction that stalls. • An instruction may
enable another thread. The other thread is pushed onto the bottom of the deque, but the processor continues execution of its current thread. Initially, a computation consists of a single thread and is assigned to some processor, while the other processors start off idle. Any processor that becomes idle starts the actual process of work stealing, which means the following: • it picks another processor uniformly at random; • if the other processor's deque is non-empty, it pops the top-most thread off the deque and starts executing that; • else, repeat.
Child stealing vs. continuation stealing Note that, in the rule for
spawn, Blumofe and Leiserson suggest that the "parent" thread execute its new thread, as if performing a function call (in the C-like program , the function call to completes before the call to is performed). This is called "continuation stealing", because the
continuation of the function can be stolen while the spawned thread is executed, and is the scheduling algorithm used in
Cilk Plus. It is not the only way to implement work stealing; the alternative strategy is called "child stealing" and is easier to implement as a
library, without
compiler support. Child stealing is used by
Threading Building Blocks, Microsoft's Task Parallel Library and
OpenMP, although the latter gives the programmer control over which strategy is used. ==Efficiency==