The simple parallel
merge sort of
CLRS is a fork–join algorithm. mergesort(A, lo, hi):
if lo < hi:
// at least one element of input mid = ⌊lo + (hi - lo) / 2⌋
fork mergesort(A, lo, mid)
// process (potentially) in parallel with main task mergesort(A, mid, hi)
// main task handles second recursion join merge(A, lo, mid, hi) The first recursive call is "forked off", meaning that its execution may run in parallel (in a separate thread) with the following part of the function, up to the that causes all threads to synchronize. While the may look like a
barrier, it is different because the threads will continue to work after a barrier, while after a only one thread continues. The second recursive call is not a fork in the pseudocode above; this is intentional, as forking tasks may come at an expense. If both recursive calls were set up as subtasks, the main task would not have any additional work to perform before being blocked at the . ==Implementations==