==DFS-based linear-time algorithms==
Several algorithms based on depth-first search compute strongly connected components in linear time.

* Kosaraju's algorithm uses two passes of depth-first search. The first, in the original graph, is used to choose the order in which the outer loop of the second depth-first search tests vertices for having been visited already and recursively explores them if not. The second depth-first search is on the transpose graph of the original graph, and each recursive exploration finds a single new strongly connected component.
* Tarjan's strongly connected components algorithm, published by Robert Tarjan in 1972, performs a single pass of depth-first search. It maintains a stack of vertices that have been explored by the search but not yet assigned to a component, and calculates "low numbers" of each vertex (an index number of the highest ancestor reachable in one step from a descendant of the vertex), which it uses to determine when a set of vertices should be popped off the stack into a new component.
* The path-based strong component algorithm uses a depth-first search, like Tarjan's algorithm, but with two stacks. One of the stacks is used to keep track of the vertices not yet assigned to components, while the other keeps track of the current path in the depth-first search tree. The first linear-time version of this algorithm was published by Edsger W. Dijkstra in 1976.

Although Kosaraju's algorithm is conceptually simple, Tarjan's algorithm and the path-based algorithm require only one depth-first search rather than two.
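The two-pass structure of Kosaraju's algorithm can be sketched as follows (a minimal Python illustration, with the graph given as an edge list for exposition; the recursive DFS assumes the graph fits within Python's recursion limit):

```python
from collections import defaultdict

def kosaraju_scc(vertices, edges):
    graph = defaultdict(list)       # original graph
    transpose = defaultdict(list)   # transpose graph (all edges reversed)
    for u, v in edges:
        graph[u].append(v)
        transpose[v].append(u)

    # Pass 1: DFS on the original graph, recording vertices in order of
    # finishing time.  This order drives the outer loop of pass 2.
    visited, order = set(), []
    def dfs1(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs1(v)
        order.append(u)
    for u in vertices:
        if u not in visited:
            dfs1(u)

    # Pass 2: DFS on the transpose, taking vertices in reverse finish order.
    # Each recursive exploration started here yields exactly one component.
    assigned, components = set(), []
    def dfs2(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in transpose[u]:
            if v not in assigned:
                dfs2(v, comp)
    for u in reversed(order):
        if u not in assigned:
            comp = []
            dfs2(u, comp)
            components.append(comp)
    return components
```

On the graph with edges 1→2, 2→1, 2→3, 3→4, 4→3 this yields the two components {1, 2} and {3, 4}.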
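Tarjan's single-pass scheme can likewise be sketched (a minimal recursive Python illustration, not Tarjan's original presentation; a production version would use an explicit stack to avoid deep recursion):

```python
from collections import defaultdict

def tarjan_scc(vertices, edges):
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)

    index = {}            # discovery index of each vertex
    low = {}              # low number: smallest index reachable from the subtree
    stack, on_stack = [], set()   # vertices not yet assigned to a component
    counter = [0]
    components = []

    def dfs(u):
        index[u] = low[u] = counter[0]
        counter[0] += 1
        stack.append(u)
        on_stack.add(u)
        for v in graph[u]:
            if v not in index:
                dfs(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:
            # u is the root of a component: pop the stack down to u.
            comp = []
            while True:
                w = stack.pop()
                on_stack.remove(w)
                comp.append(w)
                if w == u:
                    break
            components.append(comp)

    for u in vertices:
        if u not in index:
            dfs(u)
    return components
```

The `low[u] == index[u]` test is the point where the low numbers determine that a completed component sits on top of the stack.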
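The two-stack path-based scheme can be sketched in the same style (a minimal Python illustration following Gabow's formulation; the stack names `S` and `P` are conventional, not from the source):

```python
from collections import defaultdict

def path_based_scc(vertices, edges):
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)

    preorder = {}   # preorder number of each visited vertex
    comp = {}       # component id once a vertex is assigned
    S = []          # stack 1: vertices not yet assigned to a component
    P = []          # stack 2: boundaries of the current DFS path
    counter = [0]
    components = []

    def dfs(u):
        preorder[u] = counter[0]
        counter[0] += 1
        S.append(u)
        P.append(u)
        for v in graph[u]:
            if v not in preorder:
                dfs(v)
            elif v not in comp:
                # v is on the current path: a cycle was found, so contract
                # the path back to v by popping the boundary stack.
                while preorder[P[-1]] > preorder[v]:
                    P.pop()
        if P[-1] == u:
            # u closes a component: pop it off S.
            P.pop()
            c = []
            while True:
                w = S.pop()
                comp[w] = len(components)
                c.append(w)
                if w == u:
                    break
            components.append(c)

    for u in vertices:
        if u not in preorder:
            dfs(u)
    return components
```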
==Reachability-based algorithms==
The linear-time algorithms above are based on depth-first search, which is generally considered hard to parallelize. Fleischer et al. in 2000 proposed a divide-and-conquer approach based on reachability queries; such algorithms are usually called reachability-based SCC algorithms. The idea of this approach is to pick a random pivot vertex and apply forward and backward reachability queries from this vertex. The two queries partition the vertex set into 4 subsets: vertices reached by both searches, by only one of them, or by neither. One can show that every strongly connected component must be contained in one of these subsets. The subset reached by both searches forms a strongly connected component, and the algorithm then recurses on the other 3 subsets.

The expected sequential running time of this algorithm is shown to be O(n log n), a factor of O(log n) more than the classic algorithms. The parallelism comes from two sources: (1) the reachability queries can be parallelized more easily (e.g. by a breadth-first search (BFS), which can be fast if the diameter of the graph is small); and (2) the subtasks in the divide-and-conquer process are independent. This algorithm performs well on real-world graphs, but it has no theoretical guarantee on the parallelism (for example, if a graph has no edges, the algorithm requires O(n) levels of recursion). Blelloch et al. in 2016 showed that if the reachability queries are applied in a random order, the O(n log n) cost bound still holds. Furthermore, the queries can then be batched in a prefix-doubling manner (i.e. 1, 2, 4, 8 queries) and run simultaneously in one round. The overall span of this algorithm is log² n reachability queries, which is probably the optimal parallelism that can be achieved using the reachability-based approach.
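The pivot-and-partition step can be sketched as follows (a minimal sequential Python illustration using BFS for the reachability queries; it shows only the partition logic, not the parallel batching):

```python
import random
from collections import defaultdict, deque

def reachability_scc(vertices, edges):
    fwd = defaultdict(list)   # forward adjacency
    bwd = defaultdict(list)   # backward adjacency (reversed edges)
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)

    def bfs(start, adj, allowed):
        # Vertices of `allowed` reachable from `start` via edges inside it.
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in allowed and v not in seen:
                    seen.add(v)
                    queue.append(v)
        return seen

    components = []
    def solve(subset):
        if not subset:
            return
        pivot = random.choice(sorted(subset))
        f = bfs(pivot, fwd, subset)   # forward reachability query
        b = bfs(pivot, bwd, subset)   # backward reachability query
        components.append(f & b)      # intersection is the pivot's SCC
        # Recurse on the three remaining subsets; every other SCC lies
        # entirely inside exactly one of them.
        solve(f - b)
        solve(b - f)
        solve(subset - f - b)

    solve(set(vertices))
    return components
```

On an edgeless graph each call removes only the pivot itself, which is the O(n)-depth worst case mentioned above.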
==Generating random strongly connected graphs==
Peter M. Maurer describes an algorithm for generating random strongly connected graphs, based on a modification of an algorithm for strong connectivity augmentation, the problem of adding as few edges as possible to make a graph strongly connected. When used in conjunction with the Gilbert or Erdős–Rényi models with node relabelling, the algorithm is capable of generating any strongly connected graph on n nodes, without restriction on the kinds of structures that can be generated.

==Applications==