
Cluster-Based Hierarchies

Updated 6 December 2025
  • Cluster-based hierarchies are recursive structures that organize data into nested clusters, enabling analysis across multiple scales.
  • They employ both agglomerative and divisive algorithms, leveraging objective functions like Dasgupta’s cost to measure clustering quality.
  • Recent advances incorporate scalable methods, hyperbolic embedding techniques, and domain-specific constraints to enhance clustering accuracy and efficiency.

Cluster-based hierarchies, also known as hierarchical clustering structures or dendrograms, provide a recursive partitioning of data into nested clusters at multiple levels of granularity. Unlike flat clustering, which yields a single partition, cluster-based hierarchies reveal data organization at all scales, supporting exploratory analysis, taxonomy induction, and a variety of downstream tasks where multi-resolution structure is critical. This article reviews the fundamental principles, algorithmic frameworks, objective functions, and recent advances in the construction, analysis, and application of cluster-based hierarchies.

1. Formal Foundations and Objective Functions

A cluster-based hierarchy typically encodes a sequence of nested partitions (clusterings), representable as a rooted tree (dendrogram) whose leaves correspond to atomic data items (e.g., points, documents) and whose internal nodes represent clusters at increasing levels of abstraction. The fundamental combinatorial structure is a hierarchy $T$ over a ground set $V = \{1, \ldots, n\}$, where each internal node corresponds to the set of leaves under it.

Traditionally, hierarchical clustering is agglomerative (bottom-up, merging clusters) or divisive (top-down, splitting clusters), but both target the construction of a laminar family: a collection of subsets in which every pair of clusters is either disjoint, nested, or equal. This guarantees the tree property of the hierarchy; each node thus defines a cluster encompassing all of its descendant leaves.
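
For intuition, the laminarity (tree) property can be checked directly on a candidate family of clusters: every pair of sets must be disjoint or nested. A minimal Python sketch (the function `is_laminar` and the example sets are illustrative, not drawn from any cited work):

```python
def is_laminar(family):
    """Check that every pair of clusters is either disjoint or nested."""
    sets = [frozenset(c) for c in family]
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            a, b = sets[i], sets[j]
            inter = a & b
            # Laminar: the intersection is empty, or one set contains the other.
            if inter and not (a <= b or b <= a):
                return False
    return True

# A valid hierarchy over V = {1, ..., 6} (illustrative).
hierarchy = [{1, 2}, {3, 4}, {1, 2, 3, 4}, {5, 6}, {1, 2, 3, 4, 5, 6}]
print(is_laminar(hierarchy))              # True
print(is_laminar(hierarchy + [{2, 3}]))   # False: {2, 3} crosses {1, 2} and {3, 4}
```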

Prominent objective functions include:

  • Dasgupta’s Cost (for similarity data):

$$\mathcal{F}(T) = \sum_{i < j} s_{ij} \cdot |\mathrm{leaves}(\mathrm{lca}_T(i, j))|$$

where $s_{ij}$ is the pairwise similarity and $\mathrm{lca}_T(i, j)$ is the least common ancestor of $i$ and $j$ in $T$ (Chami et al., 2020, Chatziafratis et al., 2018).

  • Revenue/Value (for general similarity/dissimilarity):

$$\mathrm{Revenue}(T) = \sum_{i < j} w_{ij} \cdot |\mathrm{nonleaves}(T[\mathrm{lca}_T(i, j)])|$$

$$\mathrm{Value}(T) = \sum_{i < j} w_{ij} \cdot |\mathrm{leaves}(T[\mathrm{lca}_T(i, j)])|$$

with specific recovery and approximation guarantees (Hajiaghayi et al., 2021).

These objectives encode the principle that similar points should be grouped together at lower levels of the hierarchy (small clusters), penalizing early splits of strongly related data.
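
As a concrete illustration, Dasgupta's cost can be evaluated against a dendrogram produced by SciPy's linkage routine. The helper below is a straightforward, non-optimized sketch; the random data and the similarity transform are purely illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

def dasgupta_cost(Z, S):
    """Dasgupta's cost of a SciPy linkage tree Z under similarity matrix S."""
    n = S.shape[0]
    members = {i: [i] for i in range(n)}      # cluster id -> leaf indices
    cost = 0.0
    for m, (a, b, _, size) in enumerate(Z):
        A, B = members.pop(int(a)), members.pop(int(b))
        # Pairs whose LCA is this merge node are exactly the A x B cross pairs;
        # each is weighted by the number of leaves under the LCA (= size).
        cost += size * S[np.ix_(A, B)].sum()
        members[n + m] = A + B
    return cost

# Illustrative data (not from any cited paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
d = pdist(X)                                  # condensed pairwise distances
S = 1.0 / (1.0 + squareform(d))               # a simple similarity transform
Z = linkage(d, method="average")
print(dasgupta_cost(Z, S))
```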

2. Classical and Advanced Algorithmic Frameworks

Classical methods for constructing cluster-based hierarchies include:

  • Agglomerative Linkage Algorithms: Single-link, complete-link, average-link, and Ward's linkage, where at each step the closest pair of clusters (under a chosen linkage criterion) is merged, yielding a binary tree (Schubert et al., 2023); a minimal usage sketch follows this list.
  • Divisive and Top-Down Methods: Recursive partitioning using graph cuts (e.g., sparsest-cut or balanced-cut) to split data, allowing incorporation of structural constraints and prior knowledge (Chatziafratis et al., 2018).
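
A minimal usage sketch of agglomerative linkage with SciPy; the synthetic data and the choice of Ward's linkage are illustrative only:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Two loose synthetic groups of points (illustrative only).
X = np.vstack([rng.normal(0, 1, (15, 2)), rng.normal(5, 1, (15, 2))])

Z = linkage(X, method="ward")                      # also: "single", "complete", "average"
labels = fcluster(Z, t=2, criterion="maxclust")    # flat two-cluster cut of the dendrogram
print(labels)
# from scipy.cluster.hierarchy import dendrogram; dendrogram(Z)  # full merge tree (needs matplotlib)
```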

Key innovations and improvements include:

  • Matching Affinity Clustering: Provides polylogarithmic round, scalable solutions in the MPC model, ensures balanced merges (i.e., similar-size clusters at each level), and offers proven constant-factor approximations for Dasgupta-style quality functions (Hajiaghayi et al., 2021).
  • Ultrametric and LCA-Tree Generalization: Any relaxed ultrametric can be represented by a lowest-common-ancestor tree, and any center-based clustering objective (e.g., $k$-means, $k$-median, $k$-center) can be solved optimally and hierarchically over such a tree in $O(\mathrm{Sort}(n))$ time (Draganov et al., 19 Feb 2025); a partition-extraction sketch follows this list.
  • Continuous Relaxation in Hyperbolic Space (HypHC): Maps discrete hierarchical structures to continuous hyperbolic embeddings, enabling gradient-based optimization of hierarchical cost functions and an efficient decoding scheme to binary trees, achieving $(1+\epsilon)$-approximations to the optimal dendrogram (Chami et al., 2020).
  • Order-Preserving and Constraint-Based Methods: Partial dendrograms and ultrametric fitting under partial order constraints provide support for hierarchically clustering ordered data types and DAGs, maximizing both fit to the data and preservation of intrinsic data orders (Bakkelund, 2020).
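
To illustrate the hierarchy-to-partitions idea behind the LCA-tree result above, the sketch below reads off a flat partition for several values of $k$ from a single SciPy dendrogram via `cut_tree`. This is only a generic illustration, not the optimal center-based extraction algorithm of Draganov et al.:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))          # illustrative data
Z = linkage(X, method="average")      # one hierarchy, built once

# Read off a flat partition for any number of clusters k from the same tree.
for k in (2, 3, 5, 8):
    labels = cut_tree(Z, n_clusters=k).ravel()
    print(k, np.bincount(labels))     # cluster sizes at this level of the hierarchy
```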

3. Extensions for Scalability and Resource Constraints

Standard hierarchical clustering has $O(n^3)$ runtime and $O(n^2)$ memory complexity, limiting its practicality on very large datasets. Several approaches address these limitations:

  • Data Aggregation (BETULA, BIRCH): Input points are aggregated into "cluster features" storing sufficient statistics. The cluster hierarchy is then built over the compressed representations, dramatically reducing time and space requirements with negligible quality loss (under $5\%$ RMSD in centroid positions for orders-of-magnitude speedups) (Schubert et al., 2023); a usage sketch follows this list.
  • Efficient Hierarchies for Density-Based Clustering: For HDBSCAN*, the use of nested relative-neighborhood graphs enables construction of clustering hierarchies across a range of density parameters at virtually no additional computational cost ($R$ hierarchies for $\sim 2\times$ the time of a single run) (Neto et al., 2017).
  • Dynamic, Balanced, Isotropic Trees: Spherical B-trees maintain balanced, online-updatable hierarchies of clusters based on isotropic balls, with $O(1)$ incremental statistics updates and support for both brute-force and greedy splitting, scaling to millions of points in $\mathbb{R}^d$ for large $d$ (Sadikov et al., 2016).
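
The data-aggregation strategy can be illustrated with scikit-learn's `Birch`, which builds a CF-tree of cluster features and lets a subsequent hierarchical step run on the compressed summaries only. The threshold and branching factor below are arbitrary illustrative values (BETULA itself ships with ELKI, not scikit-learn):

```python
import numpy as np
from sklearn.cluster import Birch, AgglomerativeClustering

rng = np.random.default_rng(3)
X = rng.normal(size=(50_000, 2))      # large synthetic dataset (illustrative)

# Stage 1: aggregate raw points into cluster features (CF-tree leaves).
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=None).fit(X)
summaries = birch.subcluster_centers_
print("compressed", X.shape[0], "points into", len(summaries), "cluster features")

# Stage 2: build the hierarchy over the compressed summaries, not the raw points.
agg = AgglomerativeClustering(n_clusters=10, linkage="ward").fit(summaries)
print(np.bincount(agg.labels_))       # sizes of the top-level clusters of summaries
```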

4. Incorporating Constraints, Prior Knowledge, and Domain Structure

Many applications require the integration of side-information:

  • Structural Constraints: Triplet constraints ("$a, b$ must be grouped together before $c$") and subtree constraints are incorporated via supergraph construction and recursive-cut algorithms. Regularization can be introduced to balance constraint satisfaction with data fit, yielding an $O(k\,\alpha_n)$-approximation to the best tree respecting $k$ constraints when the cut approximations are within $\alpha_n$ (Chatziafratis et al., 2018).
  • Prior Knowledge Integration through Ultrametric Regularization: External ontologies (e.g., product taxonomies) can be encoded as ultrametric distances and combined with empirical ones ($d' = (1-\alpha)\,d_{\mathrm{data}} + \alpha\,d_{\mathrm{prior}}$), with linkage clustering (especially single-link) guaranteed to recover both prior-driven and data-driven trees and their convex combinations (Ma et al., 2018); a blending sketch follows this list.
  • Path-Based Hierarchies in Knowledge Graphs: Hierarchy is induced first on tags by co-occurrence-based generality/similarity scoring, and subjects are then assigned to the most coherent path in the tag tree (Pietrasik et al., 2021).
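
The convex combination of data-driven and prior ultrametric distances can be sketched directly; the two-group toy taxonomy and $\alpha = 0.3$ below are arbitrary illustrative choices:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 3))
d_data = squareform(pdist(X))                 # empirical pairwise distances

# Toy prior: an ontology placing items 0-4 and 5-9 in two top-level branches
# (small within-branch distance, large across-branch distance => ultrametric).
group = np.array([0] * 5 + [1] * 5)
d_prior = np.where(group[:, None] == group[None, :], 1.0, 4.0)
np.fill_diagonal(d_prior, 0.0)

alpha = 0.3                                   # weight on the prior (illustrative)
d_combined = (1 - alpha) * d_data + alpha * d_prior

# Single linkage on the blended distances, as in the regularization scheme above.
Z = linkage(squareform(d_combined, checks=False), method="single")
print(Z[:3])                                  # first few merges of the resulting hierarchy
```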

5. Non-Binary, Validity-Enforced, and Model-Based Hierarchies

Recent theoretical advances highlight and address limitations in classical binary, always-hierarchical constructions:

  • Validity-Pruned Linkage Clustering: A "valid cluster" is defined as a set whose members are strictly more similar to each other than to any point outside the set. A prune-after-linkage approach removes internal nodes violating validity, resulting in non-binary, maximal, and, when appropriate, star-structured hierarchies. Single, complete, and average linkage satisfy the necessary and sufficient conditions for this procedure; Ward's linkage can fail to do so (Dreveton et al., 22 Nov 2025). A validity-check sketch follows this list.
  • Model-Based (Bayesian) Hierarchical Clustering: Bayesian marginal likelihood provides an explicit regularized criterion for merging clusters and sharing parameters across tree nodes, enabling dynamic feature-partitioning (useful vs. globally shared/noise) at each node. The structure—including depth and feature assignment—is determined automatically for optimal model complexity, as demonstrated in document clustering (Vaithyanathan et al., 2013).
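
One plausible reading of the validity condition above can be expressed as a direct check on a similarity matrix. This is a simplified illustration, not the pruning procedure of Dreveton et al.:

```python
import numpy as np

def is_valid_cluster(S, cluster):
    """Check that members of `cluster` are strictly more similar to each other
    than any member is to any point outside the cluster (one reading of validity)."""
    cluster = np.asarray(sorted(cluster))
    outside = np.setdiff1d(np.arange(S.shape[0]), cluster)
    if len(cluster) < 2 or len(outside) == 0:
        return True
    within = S[np.ix_(cluster, cluster)]
    min_within = within[~np.eye(len(cluster), dtype=bool)].min()
    max_across = S[np.ix_(cluster, outside)].max()
    return min_within > max_across

# Toy similarity matrix with a clearly separated pair {0, 1} (illustrative).
S = np.array([[1.0, 0.9, 0.2, 0.1],
              [0.9, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.8],
              [0.1, 0.2, 0.8, 1.0]])
print(is_valid_cluster(S, [0, 1]))   # True: 0.9 within vs. at most 0.3 outside
print(is_valid_cluster(S, [0, 2]))   # False: within similarity 0.2 < cross similarity 0.9
```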

6. Specialized and Application-Specific Hierarchy Frameworks

Cluster-based hierarchies have been extended in various applied and domain-specific contexts:

  • Hierarchies in Directed or Ordered Data: Exact order-preserving HAC produces partial dendrograms obeying all original strict partial orders, with optimization formulated as $L^p$ ultrametric fitting under order constraints (Bakkelund, 2020).
  • Community Hierarchies in Graphs: Hierarchies of predominantly connected communities (e.g., source communities (SC), web communities (WC), extreme sets (ES)) are constructed using parametric cut methods with full completeness (no missed levels) and can be specialized locally for given communities with linear time complexity post-preprocessing (Hamann et al., 2013).
  • Synthetic Hierarchy Generation and Benchmarking: The tree-structured stick-breaking (TSSB) process generates synthetic, ground-truth-labeled hierarchical data sets, permitting comprehensive benchmarking of clustering methods under controlled variation of depth, width, shrinkage, and overlap (Olech et al., 2016).

7. Future Directions and Theoretical Innovations

Recent work suggests multiple lines for further development:

  • Unified Optimization and Evaluation: Continuous relaxations of hierarchical clustering objectives, notably via hyperbolic geometric embeddings, offer both performance and flexibility for end-to-end learning (Chami et al., 2020).
  • Optimality and Partition Extraction: Given any ultrametric hierarchy, all center-based clustering objectives can be solved exactly and simultaneously at all $k$, and the solutions themselves form new hierarchies, greatly expanding the interpretability and analytical toolkit for cluster analysis (Draganov et al., 19 Feb 2025).
  • Human-in-the-loop and Annotation: Interactive frameworks (e.g., CHAMP) for annotation and consolidation of cluster-based hierarchies in NLP and knowledge representation guarantee transitivity and enable rapid adjudication in multi-annotator settings (Cattan et al., 2023).
  • Handling Noise, Stability, and Diversity: Pruning, regularization, and multi-scale analysis of hierarchies are actively studied to increase robustness to noise, improve true structure recovery, and provide confidence quantification over possible trees (Dreveton et al., 22 Nov 2025, Chatziafratis et al., 2018).

These advances collectively widen the scope of cluster-based hierarchies, making them central both as an analytical lens and as an interface supporting principled exploration, modeling, and interaction with complex data at multiple resolutions.
