The usual algorithms for topological sorting have running time linear in the number of nodes plus the number of edges, asymptotically, O(\left|{V}\right| + \left|{E}\right|).
Kahn's algorithm One of these algorithms, first described by , works by choosing vertices in the same order as the eventual topological sort. First, find a list of "start nodes" that have no incoming edges and insert them into a set S; at least one such node must exist in a non-empty (finite) acyclic graph. Then:
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edge
while S is not empty
do remove a node
n from
S add
n to
L for each node
m with an edge
e from
n to
m do remove edge
e from the graph
if m has no other incoming edges
then insert
m into
S if graph has edges
then return error
(graph has at least one cycle) else return L (a topologically sorted order) If the graph is a
DAG, a solution will be contained in the list L (although the solution is not necessarily unique). Otherwise, the graph must have at least one cycle and therefore a topological sort is impossible. Reflecting the non-uniqueness of the resulting sort, the structure S can be simply a set or a queue or a stack. Depending on the order that nodes n are removed from set S, a different solution is created. A variation of Kahn's algorithm that breaks ties
lexicographically forms a key component of the
Coffman–Graham algorithm for parallel scheduling and
layered graph drawing.
Depth-first search An alternative algorithm for topological sorting is based on
depth-first search. The algorithm loops through each node of the graph, in an arbitrary order, initiating a depth-first search that terminates when it hits any node that has already been visited since the beginning of the topological sort or the node has no outgoing edges (i.e., a leaf node):
L ← Empty list that will contain the sorted nodes
while exists nodes without a permanent mark
do select an unmarked node
n visit(
n)
function visit(node
n)
if n has a permanent mark
then return if n has a temporary mark
then stop (graph has at least one cycle) mark
n with a temporary mark
for each node
m with an edge from
n to
m do visit(
m) mark
n with a permanent mark add
n to head of
L Each node
n gets
prepended to the output list L only after considering all other nodes that depend on
n (all descendants of
n in the graph). Specifically, when the algorithm adds node
n, we are guaranteed that all nodes that depend on
n are already in the output list L: they were added to L either by the recursive call to visit() that ended before the call to visit
n, or by a call to visit() that started even before the call to visit
n. Since each edge and node is visited once, the algorithm runs in linear time. This depth-first-search-based algorithm is the one described by ; it seems to have been first described in print by Tarjan in 1976.
Parallel algorithms On a
parallel random-access machine, a topological ordering can be constructed in
O((log
n)2) time using a polynomial number of processors, putting the problem into the complexity class
NC2. One method for doing this is to repeatedly square the
adjacency matrix of the given graph, logarithmically many times, using
min-plus matrix multiplication with maximization in place of minimization. The resulting matrix describes the
longest path distances in the graph. Sorting the vertices by the lengths of their longest incoming paths produces a topological ordering. An algorithm for parallel topological sorting on
distributed memory machines parallelizes the algorithm of Kahn for a
DAG G = (V, E). On a high level, the algorithm of Kahn repeatedly removes the vertices of indegree 0 and adds them to the topological sorting in the order in which they were removed. Since the outgoing edges of the removed vertices are also removed, there will be a new set of vertices of indegree 0, where the procedure is repeated until no vertices are left. This algorithm performs D+1 iterations, where is the longest path in . Each iteration can be parallelized, which is the idea of the following algorithm. In the following, it is assumed that the graph partition is stored on processing elements (PE), which are labeled 0, \dots, p-1. Each PE initializes a set of local vertices Q_i^1 with
indegree 0, where the upper index represents the current iteration. Since all vertices in the local sets Q_0^1, \dots, Q_{p-1}^1 have indegree 0, i.e., they are not adjacent, they can be given in an arbitrary order for a valid topological sorting. To assign a global index to each vertex, a
prefix sum is calculated over the sizes of Q_0^1, \dots, Q_{p-1}^1. So, each step, there are \sum_{i=0}^{p-1} |Q_i| vertices added to the topological sorting. In the first step, PE assigns the indices \sum_{i=0}^{j-1} |Q_i^1|, \dots, \left(\sum_{i=0}^{j} |Q_i^1|\right) - 1 to the local vertices in Q_j^1. These vertices in Q_j^1 are removed, together with their corresponding outgoing edges. For each outgoing edge (u, v) with endpoint in another PE l, j \neq l, the message (u, v) is posted to PE . After all vertices in Q_j^1 are removed, the posted messages are sent to their corresponding PE. Each message (u, v) received updates the indegree of the local vertex . If the indegree drops to zero, is added to Q_j^2. Then the next iteration starts. In step , PE assigns the indices a_{k-1} + \sum_{i=0}^{j-1} |Q_i^k|, \dots, a_{k-1} + \left(\sum_{i=0}^{j} |Q_i^k|\right) - 1, where a_{k-1} is the total number of processed vertices after step . This procedure repeats until there are no vertices left to process, hence \sum_{i=0}^{p-1} |Q_i^{D+1}| = 0. Below is a high level,
single program, multiple data pseudo-code overview of this algorithm. Note that the
prefix sum for the local offsets a_{k-1} + \sum_{i=0}^{j-1} |Q_i^k|, \dots, a_{k-1} + \left(\sum_{i=0}^{j} |Q_i^k|\right) - 1 can be efficiently calculated in parallel.
p processing elements with IDs from 0 to
p-1
Input: G = (V, E) DAG, distributed to PEs, PE index j = 0, ..., p - 1
Output: topological sorting of G
function traverseDAGDistributed δ incoming degree of local vertices
V // All vertices with indegree 0 nrOfVerticesProcessed = 0
do global build prefix sum over size of
Q // get offsets and total number of vertices in this step offset = nrOfVerticesProcessed + sum(Qi, i = 0 to j - 1) //
j is the processor index
foreach u
in Q localOrder[u] = index++;
foreach (u,v) in E
do post message (
u, v) to PE owning vertex
v nrOfVerticesProcessed += sum(|Qi|, i = 0 to p - 1) deliver all messages to neighbors of vertices in Q receive messages for local vertices V remove all vertices in Q
foreach message (
u, v) received:
if --δ[v] = 0 add
v to
Q while global size of
Q > 0
return localOrder The communication cost depends heavily on the given graph partition. As for runtime, on a
CRCW-PRAM model that allows fetch-and-decrement in constant time, this algorithm runs in \mathcal{O} \left(\frac{m + n}{p} + D (\Delta + \log n)\right), where is again the longest path in and the maximum degree. ==Application to shortest path finding==