Leiden algorithm

The Leiden algorithm is a community detection algorithm developed by Traag et al. at Leiden University as a modification of the Louvain method. Like the Louvain method, it attempts to optimize modularity when extracting communities from networks; however, it addresses key issues present in the Louvain method, namely poorly connected communities and the resolution limit of modularity.

Improvement over Louvain method

Broadly, the Leiden algorithm uses the same two primary phases as the Louvain algorithm: a local node moving step (though Leiden selects the nodes to consider more efficiently) and a graph aggregation step. However, to address poorly connected communities and the merging of smaller communities into larger ones (the resolution limit of modularity), the Leiden algorithm adds an intermediate refinement phase in which communities may be split to guarantee that all communities are well connected.

Consider, for example, a graph with three communities, drawn as red, green, and blue nodes, in which a center "bridge" node belongs to the blue community and connects two sections of that community.

Now suppose a node-moving step merges the red and green communities into a single community (as the two are highly connected). Because local node moving is greedy, the bridge node also ends up in the enlarged red community. In the Louvain method, such a merge would be followed immediately by the graph aggregation phase, which locks in a blue community whose two sections are no longer connected to each other. In the Leiden algorithm, the graph is refined first: the refinement step keeps the bridge node in the blue community so that the community remains intact and connected, despite the potential improvement in modularity from adding the bridge node to the red community.
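The disconnection described above can be checked mechanically. The following is a minimal Python sketch, not part of any Leiden implementation: the function name `induces_connected_subgraph` and the toy graph are assumptions made for illustration (the toy graph is not the graph from the article's figure). It tests whether the nodes of a community are still reachable from one another using only edges inside that community.

```python
from collections import deque

def induces_connected_subgraph(nodes, adjacency):
    """Return True if `nodes` are mutually reachable using only edges among themselves."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    frontier = deque([start])
    while frontier:
        u = frontier.popleft()
        for w in adjacency[u]:
            # Only follow edges that stay inside the community.
            if w in nodes and w not in seen:
                seen.add(w)
                frontier.append(w)
    return seen == nodes

# Hypothetical toy graph: a "blue" path 0-1-2-3-4 whose middle node 2 is the
# bridge, and a "red" triangle 5-6-7 to which the bridge is strongly tied.
edges = [(0, 1), (1, 2), (2, 3), (3, 4),
         (5, 6), (6, 7), (5, 7),
         (2, 5), (2, 6), (2, 7)]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

blue_before_move = {0, 1, 2, 3, 4}   # bridge node 2 still in blue
blue_after_move = {0, 1, 3, 4}       # bridge node 2 greedily moved to red

print(induces_connected_subgraph(blue_before_move, adjacency))
print(induces_connected_subgraph(blue_after_move, adjacency))
```

In this toy case the blue community is connected before the move and falls into the two pieces {0, 1} and {3, 4} afterwards, which is the situation the refinement phase is designed to prevent.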

Graph components

Before defining the Leiden algorithm, it is helpful to define some of the components of a graph.

Vertices and edges

A graph is composed of vertices (nodes) and edges. Each edge connects two vertices, and each vertex may be connected to zero or more edges. Edges are typically drawn as straight lines, while nodes are drawn as circles or points. In set notation, let V be the set of vertices and E be the set of edges:

\begin{align} V &:= \{v_1, v_2, \dots, v_n \} \\ E &:= \{e_{ij}, e_{ik}, \dots, e_{kl} \} \end{align}

where e_{ij} is the directed edge from vertex v_i to vertex v_j. We can also write this as an ordered pair:

\begin{align} e_{ij} &:= (v_i, v_j) \end{align}

Community

A community is a set of nodes, and distinct communities do not overlap:

\begin{align} C_i &\subseteq V \\ C_i \cap C_j &= \emptyset ~ \forall ~ i \neq j \end{align}

The union of all communities must be the full set of vertices:

\begin{align} V &= \bigcup_{i=1}^{p} C_i \end{align}

where p is the number of communities.

Partition

A partition is the set of all communities:

\begin{align} \mathcal{P} &= \{C_1, C_2, \dots, C_p \} \end{align}
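The definitions above translate directly into set operations. The following Python sketch is illustrative only; the function name `is_valid_partition` and the example graph are assumptions, not part of the source. It checks the two conditions stated for communities: they are pairwise disjoint subsets of V, and their union is V.

```python
def is_valid_partition(vertices, communities):
    """Check that `communities` are pairwise disjoint subsets of `vertices` that cover it."""
    covered = set()
    for community in communities:
        if not community <= vertices:   # C_i must be a subset of V
            return False
        if covered & community:         # C_i and C_j must not overlap
            return False
        covered |= community
    return covered == vertices          # the union of all C_i must equal V

# Hypothetical example: five vertices, directed edges as ordered pairs (v_i, v_j).
V = {1, 2, 3, 4, 5}
E = {(1, 2), (2, 3), (4, 5)}
P = [{1, 2, 3}, {4, 5}]

print(is_valid_partition(V, P))
print(is_valid_partition(V, [{1, 2}, {2, 3}, {4, 5}]))  # vertex 2 appears twice
print(is_valid_partition(V, [{1, 2, 3}]))               # vertices 4 and 5 missing
```

Only the first call describes a partition in the sense used here; the other two violate disjointness and coverage respectively.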

Partition quality

How a graph is partitioned into communities is an integral part of the Leiden algorithm, and the partition it produces depends on how partition quality is measured. Many quality metrics also have parameters of their own that change the resulting communities.

Modularity

Modularity is a widely used quality metric for assessing how well a set of communities partitions a graph. For an adjacency matrix A, it is defined as:

\begin{align} Q &= \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right) \delta(c_i, c_j) \end{align}

where:
• A_{ij} represents the edge weight between nodes i and j; see Adjacency matrix;
• k_i and k_j are the sum of the weights of the edges attached to nodes i and j, respectively;
• m is the sum of all of the edge weights in the graph;
• c_i and c_j are the communities to which the nodes i and j belong; and
• \delta is the Kronecker delta function:

\begin{align} \delta(c_i, c_j) &= \begin{cases} 1 & \text{if } c_i \text{ and } c_j \text{ are the same community} \\ 0 & \text{otherwise} \end{cases} \end{align}

Reichardt Bornholdt Potts Model (RB)

One of the most widely used metrics for the Leiden algorithm is the Reichardt Bornholdt Potts Model (RB). It is available in widely used Leiden implementations under the name RBConfigurationVertexPartition. The model introduces a resolution parameter \gamma and is closely related to modularity. For an adjacency matrix A, its quality function is:

\begin{align} Q &= \sum_{ij}\left(A_{ij} - \gamma\frac{k_i k_j}{2m}\right)\delta(c_i, c_j) \end{align}

where:
• \gamma is a linear resolution parameter.

With \gamma = 1 this expression equals modularity multiplied by the constant 2m, so both are maximized by the same partition.

Constant Potts Model (CPM)

Another metric similar to RB is the Constant Potts Model (CPM). This metric also relies on a resolution parameter \gamma. It is defined as:

\begin{align} H &= -\sum_{ij}(A_{ij}w_{ij} - \gamma)\delta(c_i, c_j) \end{align}

where A_{ij} indicates whether an edge exists between nodes i and j, and w_{ij} is the weight of that edge. Because of the leading minus sign, H is a cost to be minimized; equivalently, -H is a quality to be maximized.

Understanding Potts Model resolution parameters/Resolution limit

Potts models such as RB and CPM typically include a resolution parameter. This parameter lets the user of the Leiden algorithm tune modularity-like metrics to detect substructures at a chosen granularity: in both models, larger values of \gamma penalize large communities more heavily and therefore favor more, smaller communities. Resolution is helpful because, on the same graph, plain modularity may capture only the large-scale structures, whereas a more granular quality metric could potentially detect all substructures.
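To make the three quality functions concrete, the following Python sketch evaluates them on a small graph. It is an illustration under stated assumptions, not a library implementation: the function names, the `(i, j, weight)` edge-list format, and the dictionary `membership` (node to community label) are all hypothetical; the graph is assumed undirected with no self-loops; the sums run over all ordered pairs (i, j), including i = j, exactly as the formulas are written; and `cpm_quality` returns -H so that larger is better for every function.

```python
from collections import defaultdict

def _symmetric_weights(edges):
    """Build A as a dict keyed by ordered pairs from an undirected (i, j, w) edge list."""
    A = defaultdict(float)
    for i, j, w in edges:
        A[(i, j)] += w
        A[(j, i)] += w
    return A

def rb_quality(edges, membership, gamma=1.0):
    """Sum over same-community ordered pairs of A_ij - gamma * k_i * k_j / (2m)."""
    A = _symmetric_weights(edges)
    m = sum(w for _, _, w in edges)          # total edge weight
    k = defaultdict(float)                   # weighted degree of each node
    for (i, _j), w in A.items():
        k[i] += w
    total = 0.0
    for i in membership:
        for j in membership:
            if membership[i] == membership[j]:   # Kronecker delta
                total += A.get((i, j), 0.0) - gamma * k[i] * k[j] / (2 * m)
    return total

def modularity(edges, membership):
    """Modularity is the RB quality at gamma = 1, scaled by 1 / (2m)."""
    m = sum(w for _, _, w in edges)
    return rb_quality(edges, membership, gamma=1.0) / (2 * m)

def cpm_quality(edges, membership, gamma):
    """Return -H for CPM: sum over same-community ordered pairs of A_ij * w_ij - gamma."""
    W = _symmetric_weights(edges)            # A_ij * w_ij collapses to the edge weight
    total = 0.0
    for i in membership:
        for j in membership:
            if membership[i] == membership[j]:
                total += W.get((i, j), 0.0) - gamma
    return total

# Hypothetical example: two triangles joined by a single edge, unit weights.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 1.0)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_community = {v: "a" for v in range(6)}

print(modularity(edges, two_triangles), modularity(edges, one_community))
print(cpm_quality(edges, two_triangles, 0.5), cpm_quality(edges, one_community, 0.5))
```

On this example both metrics prefer the two-triangle partition over a single community. Raising `gamma` in `rb_quality` or `cpm_quality` increases the penalty on large communities, which is the resolution effect described above.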

Algorithm

The Leiden algorithm starts with a graph of unassigned nodes (a) and partitions them into communities to maximize modularity (the difference in quality between the generated partition and a hypothetical randomized partition of communities). The method is similar to the Louvain algorithm, except that after moving a node it revisits only that node's neighbors that are not already in the community the node was placed in. This process results in the first partition (b), also referred to as \mathcal{P}.

The algorithm then refines this partition by first placing each node into its own individual community and then moving nodes from one community to another to maximize modularity. It does this iteratively until each node has been visited and each community has been refined. This creates partition (c), which is the initial partition of \mathcal{P}_{\text{refined}}.

Then an aggregate network (d) is created by turning each community into a node. \mathcal{P}_{\text{refined}} is used as the basis for the aggregate network, while \mathcal{P} is used to create its initial partition. Because the original partition \mathcal{P} is used in this step, it must be retained so that it can be used in future iterations. These steps together form the first iteration of the algorithm.

In subsequent iterations, the nodes of the aggregate network (each of which represents a community) are once again placed into their own individual communities and then moved according to modularity to form a new \mathcal{P}_{\text{refined}}, shown as (e). In the case depicted, the nodes were already placed optimally, so no change took place, resulting in partition (f). The nodes of partition (f) would then once again be aggregated using the same method as before, with the original partition \mathcal{P} still being retained.
This portion of the algorithm repeats until each aggregate node is in its own individual community; this means that no further improvements can be made.

The Leiden algorithm consists of three main steps: local moving of nodes, refinement of the partition, and aggregation of the network based on the refined partition. All of the functions in the following steps are called from the main function Leiden_community_detection, shown below. The Fast Louvain method is borrowed by the authors of Leiden from "A Simple Acceleration Method for the Louvain Algorithm".

function Leiden_community_detection(Graph G, Partition P)
    do
        /* Move nodes between communities (see Step 1). */
        P = fast_louvain_move_nodes(G, P)
        /* If the number of communities in P equals the number of nodes in G, every node is in its own community and nothing is left to aggregate, so the loop ends. */
        done = (|P| == |V(G)|)
        if not done
            /* This is a crucial part of what separates Leiden from Louvain: within each community of P, only nodes that are well connected to their community are considered for merging (see refine_partition_subset in Step 2). */
            P_refined = get_p_refined(G, P)
            /* Aggregate the refined communities into single nodes for the next iteration (see Step 3). */
            G = aggregate_graph(G, P_refined)
            /* Carry P over to the aggregate graph: each new node v (a community of P_refined) is placed in the community C of P that contains it, so the non-refined partition is retained rather than reset to singletons. */
            P = {{v | v ⊆ C, v ∈ V(G)} | C ∈ P}
        end if
    while not done
    /* Return the final partition in terms of the original nodes, each listed in exactly one community. */
    return flattened(P)
end function

Step 1: Local Moving of Nodes

First, we move the nodes from \mathcal{P} into neighboring communities to maximize modularity (the difference in quality between the generated partition and a hypothetical randomized partition of communities). In the accompanying illustration, the initial collection of unassigned nodes is the graph on the left, where each node's unique color indicates that it does not belong to a community yet. The graph on the right shows this step's result, the partition \mathcal{P}: every node has been moved into one of three communities, represented by the colors red, blue, and green.

function fast_louvain_move_nodes(Graph G, Partition P)
    /* Place all of the nodes of G into a queue to ensure that they are all visited. */
    Q = queue(V(G))
    while Q not empty
        /* Select the first node from the queue to visit. */
        v = Q.pop_front()
        /* Set C_prime to the community in P, or the empty set (a new community), that gives the maximum increase in the quality function H when node v is moved into it. */
        C_prime = argmax_{C ∈ P ∪ ∅} ΔH_P(v → C)
        /* Only move nodes when the move results in a positive change in the quality function. */
        if ΔH_P(v → C_prime) > 0
            /* Move node v to community C_prime. */
            v → C_prime
            /* Create the set N of direct neighbors of v that are not in community C_prime. */
            N = {u | (u, v) ∈ E(G), u ∉ C_prime}
            /* Add all of the nodes from N to the queue, unless they are already in Q. */
            Q.add(N − Q)
        end if
    end while
    /* Return the updated partition. */
    return P
end function

Step 2: Refinement of the Partition

Next, each node in the network is assigned to its own individual community, and nodes are then moved from one community to another to maximize modularity. This occurs iteratively until each node has been visited, and is very similar to the creation of \mathcal{P}, except that moves are restricted to within each community of \mathcal{P} and only well-connected nodes and communities are considered. The result is the initial partition for \mathcal{P}_{\text{refined}}. The communities from \mathcal{P} are also tracked; in the accompanying illustration they are shown as colored backgrounds behind the nodes.

function get_p_refined(Graph G, Partition P)
    /* Assign each node in G to a singleton community (a community by itself). */
    P_refined = get_singleton_partition(G)
    for C ∈ P
        /* Refine the partition within each of the communities in P. */
        P_refined = refine_partition_subset(G, P_refined, C)
    end for
    /* Return the newly refined partition. */
    return P_refined
end function

function refine_partition_subset(Graph G, Partition P, Subset S)
    /* R collects the nodes v of S that are well connected to the rest of S: E(v, S − v) (the edges between v and the other members of S) must reach a threshold scaled by γ. degree(v) is the degree of node v and degree(S) is the total degree of the nodes in S. Only these nodes are considered for moving. */
    R = {v | v ∈ S, E(v, S − v) ≥ γ * degree(v) * (degree(S) − degree(v))}
    for v ∈ R
        /* Only consider node v if it is still alone in its community, meaning it has not been merged yet. */
        if v in singleton_community
            /* Create the set T of candidate communities inside S for which E(C, S − C) (the edges between community C and the rest of S) is at least the threshold γ * degree(C) * (degree(S) − degree(C)). */
            T = {C | C ∈ P, C ⊆ S, E(C, S − C) ≥ γ * degree(C) * (degree(S) − degree(C))}
            /* For each community C in T, the probability of choosing it is proportional to exp(1/θ * ΔH_P(v → C)) if moving v to C does not decrease the quality function, and 0 otherwise. θ controls the degree of randomness. */
            Pr(C_prime = C) ~ { exp(1/θ * ΔH_P(v → C)) if ΔH_P(v → C) ≥ 0; 0 otherwise } for C ∈ T
            /* Move node v into a community C_prime drawn at random from those with positive probability. */
            v → C_prime
        end if
    end for
    /* Return the refined partition. */
    return P
end function

Step 3: Aggregation of the Network

We then convert each community in \mathcal{P}_{\text{refined}} into a single node. As depicted in the accompanying illustration, the communities of \mathcal{P} are used to group these aggregate nodes after their creation.

function aggregate_graph(Graph G, Partition P)
    /* Set the communities of P as the individual nodes of the new graph. */
    V = P
    /* If u is a member of community C of P, v is a member of community D of P, and u and v share an edge in E(G), then add a connection between C and D in the new graph. E is a multiset, so several edges between the same two communities are all kept. */
    E = {(C, D) | (u, v) ∈ E(G), u ∈ C ∈ P, v ∈ D ∈ P}
    /* Return the new graph's nodes and edges. */
    return Graph(V, E)
end function

function get_singleton_partition(Graph G)
    /* Assign each node in G to a singleton community (a community by itself). */
    return {{v} | v ∈ V(G)}
end function

We repeat these steps until each community contains only one node, with each of these nodes representing an aggregate of nodes from the original network that are strongly connected with each other.
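The queue-based local moving step and the aggregation step can be sketched in runnable Python as follows. This is a simplified illustration under several assumptions, not the authors' implementation: the graph is unweighted, undirected, and has no self-loops; quality is measured with a CPM-style function with unit node sizes, so placing a node into a community with n other nodes gains (edges into the community) − γ·n; community labels are integers; ties are broken deterministically in favor of the smallest label; and the refinement phase is omitted entirely. The names `move_nodes_fast` and `aggregate` are hypothetical.

```python
from collections import deque

def move_nodes_fast(adjacency, membership, gamma):
    """Queue-based local moving; mutates and returns `membership` (node -> label)."""
    sizes = {}
    for label in membership.values():
        sizes[label] = sizes.get(label, 0) + 1
    next_label = max(membership.values()) + 1    # assumes integer labels
    queue = deque(sorted(adjacency))             # visit every node at least once
    queued = set(queue)
    while queue:
        v = queue.popleft()
        queued.discard(v)
        old = membership[v]
        # Count the edges from v into each neighboring community.
        links = {}
        for u in adjacency[v]:
            links[membership[u]] = links.get(membership[u], 0) + 1
        sizes[old] -= 1                          # temporarily take v out of its community
        best, best_gain = old, links.get(old, 0) - gamma * sizes[old]
        for label in sorted(links):
            gain = links[label] - gamma * sizes[label]
            if gain > best_gain:                 # strictly better, so quality increases
                best, best_gain = label, gain
        if best_gain < 0:                        # the empty community (gain 0) is best
            best = next_label
            next_label += 1
            sizes[best] = 0
        sizes[best] += 1
        if best != old:
            membership[v] = best
            # Re-queue only neighbors outside the new community that are not queued.
            for u in adjacency[v]:
                if membership[u] != best and u not in queued:
                    queue.append(u)
                    queued.add(u)
    return membership

def aggregate(edges, membership):
    """Collapse communities into nodes; parallel edges are kept as integer counts."""
    weights = {}
    for u, v in edges:
        c, d = membership[u], membership[v]
        key = (min(c, d), max(c, d))             # (c, c) records edges inside a community
        weights[key] = weights.get(key, 0) + 1
    return weights

# Hypothetical example: two triangles joined by one edge, starting from singletons.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

membership = move_nodes_fast(adjacency, {v: v for v in adjacency}, gamma=0.5)
print(membership)
print(aggregate(edges, membership))
```

On this example the local moving step groups each triangle into one community, and aggregation yields two nodes joined by a single edge, each carrying three internal edges. A full Leiden iteration would run the refinement step between these two functions and aggregate on the refined partition instead.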

Limitations

The Leiden algorithm produces high-quality partitions that place nodes into distinct communities. However, Leiden creates a hard partition, meaning each node can belong to only one community. In many networks, such as social networks, nodes may belong to multiple communities, and in this case other methods may be preferred.

Leiden is more efficient than Louvain, but on massive graphs it may still require extended processing times. Recent advancements have boosted its speed using a "parallel multicore implementation of the Leiden algorithm".

The Leiden algorithm does much to overcome the resolution limit problem. However, small substructures can still be missed in certain cases. The selection of the gamma parameter is crucial to ensure that these structures are not missed, and a suitable value can vary significantly from one graph to the next.

Source: Wikipedia ↗
