Leiden algorithm

The Leiden algorithm is a community detection algorithm developed by Traag et al ^[1] at Leiden University. It was developed as a modification of the Louvain method to address the issues with disconnected communities.

Graph Components[edit]

Before defining the Leiden algorithm, it will be helpful to define some of the components of a graph.

Vertices and Edges[edit]

A graph is comprised of vertices (nodes) and edges. Each edge is connected to two vertices, and each vertex may be connected to zero or more edges. Edges are typically represented by straight lines, while nodes are represented by circles or points. In set notation, let $V$ be the set of vertices, and $E$ be the set of edges:

${\begin{aligned}V&:=\{v_{1},v_{2},\dots ,v_{n}\}\\E&:=\{e_{ij},e_{ik},\dots ,e_{kl}\}\end{aligned}}$

where $e_{ij}$ is the directed edge from vertex $v_{i}$ to vertex $v_{j}$ . We can also write this as an ordered pair:

${\begin{aligned}e_{ij}&:=(v_{i},v_{j})\end{aligned}}$

Community[edit]

A community is a unique set of nodes:

${\begin{aligned}C_{i}&\subseteq V\\C_{i}&\bigcap C_{j}=\emptyset ~\forall ~i\neq j\end{aligned}}$

and the union of all communities must be the total set of vertices:

${\begin{aligned}V&=\bigcup _{i=1}C_{i}\end{aligned}}$

Partition[edit]

A partition is the set of all communities:

${\begin{aligned}{\mathcal {P}}&=\{C_{1},C_{2},\dots ,C_{n}\}\end{aligned}}$

Quality[edit]

Similar to modularity, the quality function is used to assess how well the communities have been allocated. The Leiden algorithm uses the Constant Potts Model (CPM):^[2]

${\begin{aligned}{\mathcal {H}}(G,{\mathcal {P}})&=\sum _{C\in {\mathcal {P}}}|E(C,C)|-\gamma {\binom {||C||}{2}}\end{aligned}}$

Algorithm[edit]

The Leiden algorithm starts from a singleton partition **(a)**. The algorithm moves individual nodes from one community to another to find a partition **(b)**, which is then refined **(c)**. An aggregate network **(d)** is created based on the refined partition, using the non-refined partition to create an initial partition for the aggregate network. For example, the red community in **(b)** is refined into two subcommunities in **(c)**, which after aggregation become two separate nodes in **(d)**, both belonging to the same community. The algorithm then moves individual nodes in the aggregate network **(e)**. In this case, refinement does not change the partition **(f)**. These steps are repeated until no further improvements can be made.

The Leiden algorithm is similar to that of the Louvain method, with some important modifications.

Step 1: First, each node in the network is assigned to its own community.

Step 2: Next, we decide which communities to move the nodes into and update the partition ${\mathcal {P}}$ .

queue = V(G) # create a queue from the nodes

while queue != empty:
  node = queue.next() # get the next node
  delta_H = 0
  for C in communities: # compute the change in quality for each community
    if delta_H(node, C) > delta_H:
      delta_H = delta_H(node, C)
      community = C
  if delta_H > 0:
    move node to community
    outside_nodes = { node_i | (node, node_i) are edges and node_i is not in community } # find the nodes which are connected to the node but not in the community
    queue.add(outside_nodes not already in queue)

Step 3: Assign each node in the graph to its own community in a new partition called ${\mathcal {P}}_{\text{refined}}$ .

Step 4: The goal of this step is to separate poorly-connected communities:

for C in communities of P:
  # find the nodes in the community which have lots of edges within the community
  well_connected_nodes = { node | node is in C, |E(node, C\node)| >= gamma ||node||(||C|| - ||node||) }
  for node in well_connected_nodes:
    if node is singleton under P_refined:
      well_connected_communities = { C_i | C_i is in P_refined, C_i is a subset of C, |E(C_i, C\C_i)| >= gamma*||C_i||(||C|| - ||C_i||)
      for C_i in well_connected_communities:
        compute probability P(C_i) # 0 if assignment of node to C_i decreases quality of P_refined, greater weights for greater quality increases
      assign node to C_i by sampling P(C_i) distribution

Step 5: Use the refined partition ${\mathcal {P}}_{\text{refined}}$ to aggregate the graph. Each community in ${\mathcal {P}}_{\text{refined}}$ becomes a node in the new graph $G_{\text{agg}}$ .

Example: Suppose that we have:

${\begin{aligned}V&=\{v_{1},v_{2},v_{3},v_{4},v_{5},v_{6},v_{7}\}\\C_{1}&=\{v_{1},v_{2},v_{3},v_{4}\}\\C_{2}&=\{v_{5},v_{6},v_{7}\}\\{\mathcal {P}}&=\{C_{1},C_{2}\}\\{\mathcal {P}}_{\text{refined}}&=\{C_{1a},C_{1b},C_{2}\}\end{aligned}}$

Then our new set of nodes will be:

${\begin{aligned}V_{agg}&=\{C_{1a}\mapsto w_{1a},C_{1b}\mapsto w_{1b},C_{2}\mapsto w_{2}\}\end{aligned}}$

Step 6: Update the partition ${\mathcal {P}}$ using the aggregated graph. We keep the communities from partition ${\mathcal {P}}$ , but the communities can be separated into multiple nodes from the refined partition ${\mathcal {P}}_{\text{refined}}$ :

${\begin{aligned}{\mathcal {P}}&=\{\{v~|~v\subseteq C,v\in V(G_{\text{agg}})\}~|~C\in {\mathcal {P}}\}\end{aligned}}$

Example: Suppose that $C$ is a poorly-connected community from the partition ${\mathcal {P}}$ :

${\begin{aligned}C&=\{v_{1},v_{2},v_{3},v_{4},v_{5}\}\\{\mathcal {P}}&=\{C\}\end{aligned}}$

Then suppose during the refinement step, it was separated into two communities, $C_{1}$ and $C_{2}$ :

${\begin{aligned}C_{1}&=\{v_{1},v_{2},v_{3}\}\\C_{2}&=\{v_{4},v_{5}\}\\{\mathcal {P}}_{\text{refined}}&=\{C_{1},C_{2}\}\end{aligned}}$

When we aggregate the graph, the new nodes will be:

${\begin{aligned}V(G_{\text{agg}})&=\{C_{1},C_{2}\}\end{aligned}}$

but we will keep the old partition:

${\begin{aligned}{\mathcal {P}}&=\{\{C_{1},C_{2}\}\}\end{aligned}}$

Step 7: Repeat Steps 2 - 6 until each community consists of only one node.

References[edit]

^ Traag, Vincent A; Waltman, Ludo; van Eck, Nees Jan (26 March 2019). "From Louvain to Leiden: guaranteeing well-connected communities". Scientific Reports. 9 (1): 5233. arXiv:1810.08473. Bibcode:2019NatSR...9.5233T. doi:10.1038/s41598-019-41695-z. PMC 6435756. PMID 30914743.
^ Traag, Vincent A; Van Dooren, Paul; Nesterov, Yurii (29 July 2011). "Narrow scope for resolution-limit-free community detection". Physical Review E. 84 (1): 016114. arXiv:1104.3083. Bibcode:2011PhRvE..84a6114T. doi:10.1103/PhysRevE.84.016114. PMID 21867264.

[Leiden-1] Traag, Vincent A; Waltman, Ludo; van Eck, Nees Jan (26 March 2019). "From Louvain to Leiden: guaranteeing well-connected communities". Scientific Reports. 9 (1): 5233. arXiv:1810.08473. Bibcode:2019NatSR...9.5233T. doi:10.1038/s41598-019-41695-z. PMC 6435756. PMID 30914743.

[CPM-2] Traag, Vincent A; Van Dooren, Paul; Nesterov, Yurii (29 July 2011). "Narrow scope for resolution-limit-free community detection". Physical Review E. 84 (1): 016114. arXiv:1104.3083. Bibcode:2011PhRvE..84a6114T. doi:10.1103/PhysRevE.84.016114. PMID 21867264.

[1]

[2]