Leiden Algorithm: Efficient Community Detection

Updated 25 May 2026

The Leiden algorithm is a graph-based community detection method that optimizes modularity and guarantees well-connected communities.
It uses a three-phase process—local moving, partition refinement, and aggregation—to enhance partition quality and scalability on large networks.
Dynamic extensions like DF-Leiden and HIT-Leiden enable efficient updates, supporting real-time analytics in various domains.

The Leiden algorithm is a graph-based community detection method that optimizes modularity (or a related quality function) by iteratively refining a network partition. Designed to address the limitations of its predecessor, the Louvain algorithm, Leiden guarantees the output of well-connected communities, achieves subset-optimality under continued refinement, and demonstrates superior empirical scalability and partition quality across real-world and synthetic networks. The method is applicable at massive graph scales and serves as a foundation for static and dynamic community detection in a range of domains, including biological data analysis, anomaly detection, and retrieval-augmented LLMs (Traag et al., 2018, Sahu, 2023, Lin et al., 13 Jan 2026).

1. Foundations and Motivations

The modularity $Q$ of a network partition under the Newman–Girvan criterion is defined as

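$Q = \frac{1}{2m} \sum_{i,j} \left[ w_{ij} - \frac{K_i K_j}{2m} \right] \delta(C_i, C_j) = \sum_{c} \left[ \frac{\sigma_c}{2m} - \left( \frac{\Sigma_c}{2m} \right)^2 \right]$

where the first sum runs over all ordered vertex pairs, $w_{ij}$ is the weight of edge $(i, j)$ (zero if the vertices are not adjacent), $m = \frac{1}{2} \sum_{i,j} w_{ij}$ is the total edge weight, $K_i = \sum_{j} w_{ij}$ is the weighted degree of node $i$, $\delta(C_i, C_j)$ is 1 when $i$ and $j$ share a community and 0 otherwise, $\sigma_c$ is the intra-community edge weight of community $c$ (summed over ordered pairs, so each internal edge counts twice), and $\Sigma_c$ is the total degree in community $c$ (Sahu, 2023, Sahu, 2024).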
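As a concrete check of the per-community form of $Q$, the following minimal sketch evaluates modularity on a toy graph of two triangles joined by one edge. The adjacency layout (a symmetric dict of dicts), the helper names, and the example graph are illustrative assumptions, not taken from the cited papers; self-loops are assumed absent.

```python
from collections import defaultdict

def build_adj(edges):
    """Build a symmetric dict-of-dicts adjacency from (i, j, weight) triples."""
    adj = defaultdict(dict)
    for i, j, w in edges:
        adj[i][j] = w
        adj[j][i] = w
    return adj

def modularity(adj, community):
    """Q = sum_c [sigma_c / (2m) - (Sigma_c / (2m))^2]."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())  # equals 2m
    sigma = defaultdict(float)  # intra-community weight, each edge seen twice
    total = defaultdict(float)  # Sigma_c: total weighted degree per community
    for i, nbrs in adj.items():
        for j, w in nbrs.items():
            total[community[i]] += w
            if community[i] == community[j]:
                sigma[community[i]] += w
    return sum(sigma[c] / two_m - (total[c] / two_m) ** 2 for c in total)

# Toy graph (assumed for illustration): triangles {0,1,2} and {3,4,5}, bridge 2-3.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (2, 3, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0)]
adj = build_adj(edges)
by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_block = {v: "A" for v in adj}

q_triangles = modularity(adj, by_triangle)  # 2 * (6/14 - (7/14)^2) = 5/14
q_one_block = modularity(adj, one_block)    # 14/14 - (14/14)^2 = 0
```

Placing every vertex in a single community always gives $Q = 0$, which is why the triangle partition scores higher here.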

The traditional Louvain algorithm achieves high modularity via alternating local moving and aggregation phases but may yield communities that are internally disconnected or only poorly connected—a phenomenon established to affect up to 25% of communities in empirical networks (Traag et al., 2018). The Leiden algorithm was developed to eliminate this defect, guaranteeing community connectivity and refining the optimality of partitions through a principled hierarchical expansion-refinement paradigm.

2. Algorithmic Structure

Each iteration of Leiden consists of three main phases:

Local Moving Phase: Each vertex is examined (in random or asynchronous order) and considered for a move to the neighboring community that achieves the largest positive increase in $Q$. For a vertex $i$ moving from its current community $d$ to a community $c$, the gain is

$\Delta Q_{i: d \to c} = \frac{1}{m} \left( K_{i \to c} - K_{i \to d} \right) - \frac{K_i}{2m^2} \left( K_i + \Sigma_c - \Sigma_d \right)$

where $K_{i \to c}$ is the total weight of edges from $i$ to vertices in $c$, $\Sigma_d$ is taken with $i$ still in $d$, and $\Sigma_c$ is taken before $i$ joins $c$. Vertices are moved greedily until no positive gain exceeds the set tolerance $\tau$ (Sahu, 2023).
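The gain of a single move can be evaluated locally instead of recomputing $Q$ from scratch. The sketch below assumes the usual closed form for moving vertex $i$ from community $d$ to community $c$, namely $(K_{i \to c} - K_{i \to d})/m - K_i (K_i + \Sigma_c - \Sigma_d)/(2m^2)$, and checks it against a brute-force difference of $Q$. Function names and the toy graph are illustrative assumptions; self-loops are assumed absent.

```python
from collections import defaultdict

def build_adj(edges):
    """Build a symmetric dict-of-dicts adjacency from (i, j, weight) triples."""
    adj = defaultdict(dict)
    for i, j, w in edges:
        adj[i][j] = w
        adj[j][i] = w
    return adj

def modularity(adj, community):
    """Reference value of Q, recomputed from scratch."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    sigma, total = defaultdict(float), defaultdict(float)
    for i, nbrs in adj.items():
        for j, w in nbrs.items():
            total[community[i]] += w
            if community[i] == community[j]:
                sigma[community[i]] += w
    return sum(sigma[c] / two_m - (total[c] / two_m) ** 2 for c in total)

def delta_modularity(adj, community, i, c):
    """Gain in Q from moving vertex i out of its community d and into c."""
    m = sum(w for nbrs in adj.values() for w in nbrs.values()) / 2.0
    d = community[i]
    k_i = sum(adj[i].values())                                      # K_i
    k_ic = sum(w for j, w in adj[i].items() if community[j] == c)   # K_{i->c}
    k_id = sum(w for j, w in adj[i].items() if community[j] == d)   # K_{i->d}
    total_c = sum(sum(adj[v].values()) for v in adj if community[v] == c)
    total_d = sum(sum(adj[v].values()) for v in adj if community[v] == d)
    return (k_ic - k_id) / m - k_i * (k_i + total_c - total_d) / (2 * m * m)

# Toy graph (assumed): two triangles joined by the edge 2-3.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (2, 3, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0)]
adj = build_adj(edges)
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

predicted = delta_modularity(adj, community, 2, "B")
before = modularity(adj, community)
after = modularity(adj, {**community, 2: "B"})
```

Here the move of vertex 2 into the other triangle has a negative gain, so a greedy local-moving pass would leave it in place.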

Partition Refinement Phase: Inside each community from the local-move phase, vertices begin as singletons. Only vertices that are still singletons are eligible to move, and only into sub-communities that lie within the same original community. This step ensures all resulting substructures are connected, avoiding the disconnected community pathology inherent to Louvain (Traag et al., 2018).
Aggregation Phase: Each refined community is contracted to a super-vertex. The next level's graph is reconstructed with appropriate super-edge weights, and the three-phase process repeats on the coarsened graph. Iteration continues until no further improvement is possible (Traag et al., 2018, Sahu, 2023).
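The aggregation step can be sketched as a contraction of the adjacency structure. The version below is a minimal illustration under assumed conventions: a symmetric dict-of-dicts adjacency, a self-loop entry that stores the loop's full contribution to the weighted degree (so row sums are preserved), and the choice, described by Traag et al., of seeding the next level with the non-refined partition. The function name and toy labels are hypothetical.

```python
from collections import defaultdict

def aggregate(adj, refined, community):
    """Contract each refined community into one super-vertex.

    Returns (coarse_adj, initial_partition). Internal weight of a refined
    community becomes a self-loop; initial_partition maps each super-vertex
    to the non-refined community it came from.
    """
    coarse = defaultdict(lambda: defaultdict(float))
    initial = {}
    for i, nbrs in adj.items():
        initial[refined[i]] = community[i]
        for j, w in nbrs.items():
            coarse[refined[i]][refined[j]] += w
    return {c: dict(nbrs) for c, nbrs in coarse.items()}, initial

# Toy graph (assumed): two triangles joined by the edge 2-3.
adj = {
    0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0}, 4: {3: 1.0, 5: 1.0}, 5: {3: 1.0, 4: 1.0},
}
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
# Hypothetical refinement: community A split into sub-communities a1 and a2.
refined = {0: "a1", 1: "a1", 2: "a2", 3: "b1", 4: "b1", 5: "b1"}

coarse_adj, initial_partition = aggregate(adj, refined, community)
```

Because row sums are preserved, the total degree of each super-vertex equals the summed degree of its members, so modularity computed on the coarse graph matches modularity on the original graph for the corresponding partition.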

High-level pseudocode is given by Traag et al. (Traag et al., 2018), along with implementation guidance on move ordering, tolerance thresholds, and randomization in the refinement phase.

3. Theoretical Guarantees and Optimality

The Leiden algorithm provides formal guarantees unattainable by Louvain:

$\gamma$-separation and $\gamma$-connectivity: No two communities exist whose merger would yield a positive gain in the quality function, and every Leiden community admits a connected merge tree, recursively enforcing internal connectivity. Here $\gamma$ is the resolution parameter of the quality function.
Node-optimality: At convergence, no single-node move can further increase $Q$.
Subpartition $\gamma$-density: No community can be subdivided into parts without strictly decreasing the quality function.
Subset-optimality: After sufficient passes, no subset of any community can be moved (to another or as a singleton) to improve modularity (Traag et al., 2018, Lin et al., 13 Jan 2026).

These connectivity and optimality claims are established inductively from the structure of the refinement phase and formalized by Traag et al. The guarantees extend to both modularity and Constant Potts Model (CPM) objectives.
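The connectivity guarantee is easy to test on any partition. The sketch below is a plain breadth-first search over each community's induced subgraph, written as an assumed helper rather than part of any cited implementation. It reports communities that are internally disconnected, the defect that Louvain can produce when a bridging vertex leaves its community.

```python
from collections import defaultdict, deque

def disconnected_communities(adj, community):
    """Return labels of communities whose induced subgraph is not connected."""
    members = defaultdict(set)
    for v, c in community.items():
        members[c].add(v)
    bad = []
    for c, nodes in members.items():
        start = next(iter(nodes))
        seen = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                # Only walk edges that stay inside the community.
                if v in nodes and v not in seen:
                    seen.add(v)
                    queue.append(v)
        if len(seen) < len(nodes):
            bad.append(c)
    return bad

# Toy path graph 0-1-2-3-4 (assumed for illustration).
adj = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0},
       3: {2: 1.0, 4: 1.0}, 4: {3: 1.0}}
# Vertex 2 bridges community A but has been assigned to B.
broken = {0: "A", 1: "A", 2: "B", 3: "A", 4: "A"}
sound = {0: "A", 1: "A", 2: "B", 3: "B", 4: "B"}

broken_report = disconnected_communities(adj, broken)
sound_report = disconnected_communities(adj, sound)
```

A Leiden result should yield an empty report; a non-empty one indicates either a Louvain-style partition or a bug in the refinement step.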

4. High-Performance and Scalability Enhancements

Practical deployment on massive shared-memory systems has motivated the development of highly optimized Leiden variants. GVE-Leiden exemplifies state-of-the-art shared-memory parallelization, leveraging:

Preallocated CSR and ‘holey’ CSR layouts for fast, repeated graph transformations.
Per-thread linear hash tables for neighbor-community weight accumulation, avoiding contention.
Dynamic OpenMP task scheduling and asynchronous, lock-free local moving implemented with atomics.
Flag-based vertex pruning and adaptive tolerance thresholds to accelerate convergence.
Greedy refinement (deterministic, maximal modularity gain), which empirically outperforms the original randomized scheme in both runtime and solution quality, with negligible modularity loss (at most about 0.5%) versus the original algorithm (Sahu, 2023).
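Two of the ideas in this list, hash-table accumulation of neighbor-community weights and flag-based vertex pruning, can be illustrated with a sequential sketch. This is not the GVE-Leiden implementation, which is parallel and written against OpenMP; the per-move tolerance, the data layout, and all names below are simplifying assumptions. A vertex is examined only while flagged, and a move re-flags its neighbors.

```python
from collections import defaultdict

def local_move_pruned(adj, community, tol=1e-9):
    """Sequential sketch of greedy local moving with flag-based pruning."""
    m = sum(w for nbrs in adj.values() for w in nbrs.values()) / 2.0
    degree = {i: sum(nbrs.values()) for i, nbrs in adj.items()}  # K_i
    total = defaultdict(float)                                   # Sigma_c
    for i, c in community.items():
        total[c] += degree[i]
    flagged = set(adj)  # vertices still waiting to be (re)examined
    visits = 0
    while flagged:
        for i in adj:
            if i not in flagged:
                continue  # pruned: nothing changed around this vertex
            flagged.discard(i)
            visits += 1
            d = community[i]
            # Hash table: total weight from i to each neighboring community.
            k_to = defaultdict(float)
            for j, w in adj[i].items():
                k_to[community[j]] += w
            k_id = k_to.get(d, 0.0)
            best_c, best_gain = d, 0.0
            for c, k_ic in k_to.items():
                if c == d:
                    continue
                gain = ((k_ic - k_id) / m
                        - degree[i] * (degree[i] + total[c] - total[d]) / (2 * m * m))
                if gain > best_gain:
                    best_c, best_gain = c, gain
            if best_gain > tol:
                total[d] -= degree[i]
                total[best_c] += degree[i]
                community[i] = best_c
                flagged.update(adj[i])  # neighbors may now want to move
    return community, visits

# Toy graph (assumed): two triangles joined by the edge 2-3, singleton start.
adj = {
    0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0}, 4: {3: 1.0, 5: 1.0}, 5: {3: 1.0, 4: 1.0},
}
result, visits = local_move_pruned(adj, {v: v for v in adj})
```

On this toy input the pass recovers the two triangles. In a parallel setting each thread would own its own `k_to` table, which is the contention-avoidance point made in the list.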

On dual 16-core Intel Xeon Gold CPUs, GVE-Leiden achieves peak throughput of 403M edges/s on a 3.8B edge graph, with speedups of 436x (vs. original Leiden), 104x (igraph), 8.2x (NetworKit), and 3x (cuGraph on A100 GPU). Strong scaling shows a 1.6x speedup per doubling of threads up to 64-thread systems, with performance ultimately constrained by NUMA effects and intrinsically sequential phases (e.g., dendrogram traversal) (Sahu, 2023).
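To put the strong-scaling figure in perspective, the arithmetic below extrapolates a uniform 1.6x gain per doubling of threads from 1 to 64 threads. Treating the rate as uniform across every doubling is an assumption made only for illustration; the source reports it as an observed rate, not as a model.

```python
import math

per_doubling = 1.6   # speedup per doubling of threads, as reported
threads = 64

doublings = math.log2(threads)                    # 6 doublings from 1 to 64
projected_speedup = per_doubling ** doublings     # 1.6^6
parallel_efficiency = projected_speedup / threads

print(f"projected speedup at {threads} threads: {projected_speedup:.2f}x")
print(f"parallel efficiency: {parallel_efficiency:.1%}")
```

Under that assumption the projected speedup is about 16.8x, or roughly 26% parallel efficiency, which is consistent with the statement that NUMA effects and sequential phases limit scaling.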

5. Dynamic Leiden for Evolving Graphs

Real-world graphs often evolve incrementally, motivating dynamic extensions to Leiden. Three principal multicore dynamic variants have been proposed:

Naive-dynamic (ND): Reinitializes Leiden after each batch of updates using the previous partition, but processes all vertices.
Delta-screening (DS): Restricts Leiden’s local-move phase to a superset of “affected” vertices, identified by changes in edge incidence; the refinement and aggregation phases are left unchanged.
Dynamic Frontier (DF): Maintains a dynamically expanding front of affected vertices, where each move can propagate further locality updates (Sahu, 2024, Sahu, 2024).
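The frontier idea can be sketched as two small set operations. The marking rule used below (flag both endpoints of a deleted intra-community edge or of an inserted cross-community edge, then flag the neighbors of any vertex that moves) is an assumption about the heuristic, stated here for illustration only; the cited papers should be consulted for the exact rule. Function names are hypothetical, and every endpoint is assumed to already have a community label.

```python
def mark_affected(community, deletions, insertions):
    """Initial affected set for a batch update (assumed marking rule)."""
    affected = set()
    for i, j in deletions:
        # Removing an internal edge can weaken a community.
        if community[i] == community[j]:
            affected.update((i, j))
    for i, j in insertions:
        # A new cross-community edge can pull a vertex across.
        if community[i] != community[j]:
            affected.update((i, j))
    return affected

def expand_frontier(adj, affected, moved_vertex):
    """After a vertex changes community, its neighbors become affected."""
    affected.update(adj[moved_vertex])
    return affected

# Toy state (assumed): the graph after the batch has been applied.
community = {0: "A", 1: "A", 2: "B", 3: "B"}
adj = {0: {3: 1.0}, 1: {}, 2: {3: 1.0}, 3: {0: 1.0, 2: 1.0}}
deletions = [(0, 1), (1, 2)]    # (1, 2) is cross-community: ignored
insertions = [(0, 3), (2, 3)]   # (2, 3) is intra-community: ignored

initial = mark_affected(community, deletions, insertions)
after_move = expand_frontier(adj, set(initial), moved_vertex=3)
```

Only the vertices in the affected set are visited by the local-move phase, which is where the reported savings come from.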

All dynamic improvements prune only the costly local-move phase, not the refinement or aggregation. DF-Leiden achieves the highest speedups—up to 6.1x on synthetic batches and 1.38x on real temporal graphs—while preserving nearly identical modularity and full connectivity (Sahu, 2024). However, the upper bound on end-to-end speedup is set by the share of runtime pruned; in practical scenarios, this is limited, as refinement and aggregation remain dominant.
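The bound on end-to-end speedup follows the familiar Amdahl-style argument: if only the local-move phase is accelerated, the remaining phases cap the total gain. The sketch below makes that explicit. The 40% runtime share and the 10x phase speedup are hypothetical numbers chosen for illustration, not measurements from the cited work.

```python
def end_to_end_speedup(local_move_share, local_move_speedup):
    """Overall speedup when only the local-move phase gets faster.

    local_move_share: fraction of static runtime spent in local moving.
    local_move_speedup: factor by which that phase alone is accelerated.
    """
    remaining = 1.0 - local_move_share
    return 1.0 / (remaining + local_move_share / local_move_speedup)

# Hypothetical inputs: local moving takes 40% of the runtime.
with_10x = end_to_end_speedup(0.4, 10.0)            # 1 / (0.6 + 0.04)
upper_bound = end_to_end_speedup(0.4, float("inf"))  # 1 / 0.6
```

Even an infinitely fast local-move phase would give only about 1.67x overall under these assumed shares, which illustrates why refinement and aggregation become the limiting factor.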

Recent advances address the unbounded update cost of prior dynamic approaches. HIT-Leiden (Hierarchical Incremental Tree Leiden) achieves sublinear update times that scale with the size of the affected 2-hop neighborhoods under edge changes, by maintaining hierarchical supergraph structures and dynamic connected-component indices. HIT-Leiden realizes substantial empirical speedups over static recomputation, with negligible loss in both modularity and connectivity guarantees (Lin et al., 13 Jan 2026).

6. Quality Assessment, Implementation, and Applicability

Leiden consistently yields modularity close to that of the original sequential and reference implementations, and significantly exceeds the quality produced by large-scale alternatives (e.g., NetworKit in GVE-Leiden benchmarks) (Sahu, 2023). The methodology virtually eliminates internally disconnected communities (0% in GVE-Leiden, compared to around 4% in high-performance Louvain implementations). Leiden is directly applicable to massive static graphs (up to billions of edges) and is best deployed on systems with at least 32 threads and large DRAM capacity (at least 256 GB) (Sahu, 2023).

Relevant software implementations are available in Java, Python, C++, and highly optimized multicore environments. Typical usage requires only a small number of passes (5–20 on most networks) to reach near subset-optimality (Traag et al., 2018).

7. Limitations and Extensions

There are several well-defined limitations and directions for further development:

Above the shared-memory regime, scaling requires explicit distributed-memory or GPU-targeted implementations (e.g., Dask, UPC++, ParLeiden-D).
Some bottlenecks (serial data structures, dendrogram renumbering) limit perfect strong scaling.
The core method is built around modularity maximization but is extensible to CPM, significance, or stability objectives by formula replacement and parameter retuning; CPM-based variants show promise for evading the resolution limit of modularity (Traag et al., 2018, Sahu, 2023).
Fully dynamic approaches may require further innovation in refinement/aggregation for greater incremental speedup, as well as streaming updates with strongly bounded update time (Sahu, 2024, Lin et al., 13 Jan 2026).
Overlapping or multi-resolution community detection remains an open field for hierarchical and dynamic Leiden extensions (Lin et al., 13 Jan 2026).
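For the CPM extension mentioned in the list above, the quality function swaps the degree-based null model for a size-based penalty: each community contributes its internal edge weight minus $\gamma$ times the number of vertex pairs it contains. The sketch below is a minimal illustration of that formula with assumed names and a toy graph; the value $\gamma = 0.5$ is arbitrary.

```python
from collections import defaultdict

def cpm_quality(adj, community, gamma):
    """CPM quality: sum_c [e_c - gamma * n_c * (n_c - 1) / 2].

    e_c is the internal edge weight of community c and n_c its vertex count.
    """
    internal = defaultdict(float)
    size = defaultdict(int)
    for i, nbrs in adj.items():
        size[community[i]] += 1
        for j, w in nbrs.items():
            if community[j] == community[i]:
                internal[community[i]] += w / 2.0  # each edge is seen twice
    return sum(internal[c] - gamma * size[c] * (size[c] - 1) / 2.0 for c in size)

# Toy graph (assumed): two triangles joined by the edge 2-3.
adj = {
    0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0}, 4: {3: 1.0, 5: 1.0}, 5: {3: 1.0, 4: 1.0},
}
by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_block = {v: "A" for v in adj}

cpm_triangles = cpm_quality(adj, by_triangle, gamma=0.5)  # 2 * (3 - 0.5 * 3)
cpm_one_block = cpm_quality(adj, one_block, gamma=0.5)    # 7 - 0.5 * 15
```

Because the penalty depends only on a community's own size and not on the total edge weight $m$, the preferred partition does not shift as unrelated parts of the graph grow, which is the property behind CPM's resolution-limit behavior.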

Future enhancements are expected to include dynamic resolution tuning, streaming update regimes, and the extension to more general quality objectives and community structures. These developments are critical for emerging use cases in online graph analytics, real-time retrieval-augmented models, graph-based anomaly detection, and biosocial network mining (Sahu, 2024, Lin et al., 13 Jan 2026).


References (5)

From Louvain to Leiden: guaranteeing well-connected communities (2018)

GVE-Leiden: Fast Leiden Algorithm for Community Detection in Shared Memory Setting (2023)

Efficient Maintenance of Leiden Communities in Large Dynamic Graphs (2026)

Heuristic-based Dynamic Leiden Algorithm for Efficient Tracking of Communities on Evolving Graphs (2024)

A Starting Point for Dynamic Community Detection with Leiden Algorithm (2024)
