Hierarchical Segmentation and Clustering
- Hierarchical segmentation and clustering are techniques that organize data into nested structures using dendrograms and ultrametrics to capture multi-level granularity.
- They employ both agglomerative and divisive strategies with spatial connectivity constraints to create meaningful partitions in image, 3D, and biological data.
- Recent deep learning integrations embed these methods in neural architectures, yielding scalable, interpretable segmentation across diverse application domains.
Hierarchical segmentation and clustering refer to a family of techniques that organize data points or spatial regions into nested groupings reflecting multiple levels of granularity. Unlike flat clustering, which produces a single partition, hierarchical methods generate tree-structured representations (dendrograms or segment hierarchies), facilitating multi-scale analysis, interpretable structure discovery, and principled selection of segmentation resolutions. These approaches are central to modern unsupervised learning, image analysis, 3D scene understanding, bioinformatics, and many domains where nested relationships matter.
1. Formal Foundations: Dendrograms, Ultrametrics, and Connectivity
At the core of hierarchical clustering is the dendrogram, a rooted tree where leaves correspond to atomic elements and internal nodes to merged clusters at various scales. Each level of the tree defines a partition, with finer levels containing smaller, more detailed clusters, and higher levels aggregating them into coarser groups. There exists a bijection between dendrograms and ultrametric distances: for any pair of elements, the ultrametric is the minimal threshold at which they share a cluster (Soille et al., 2012, Fluck, 2022).
Traditional agglomerative strategies (single, average, complete, Ward linkage) build the dendrogram bottom-up by iteratively merging the most similar pair of clusters, while divisive methods (e.g., spectral or graph-based splits) build it top-down by recursively splitting clusters along maximal dissimilarities. For spatial data, connectivity and adjacency constraints are enforced to guarantee meaningful segments; in images, this ensures that every segment is a connected region.
Key mathematical structures:
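- Ultrametric: $d(x,y) = \min\{h : x \text{ and } y \text{ lie in the same cluster at height } h\}$, which satisfies the strong triangle inequality $d(x,z) \le \max\{d(x,y),\, d(y,z)\}$ for all $x, y, z$.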
- Connectivity function: For graphs or metric data, submodular or maximum-submodular set functions characterize connectivity and govern cluster formation (Fluck, 2022).
- Tangles: Collections of consistently “connected” regions defined via such functions, with tight duality results linking their existence to the ability to decompose the data via branch-width-limited trees (Fluck, 2022).
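The dendrogram/ultrametric correspondence can be checked numerically. The sketch below is a naive single-linkage procedure written for clarity rather than speed (the function names are illustrative and not taken from the cited works). It records, for each pair of points, the height at which they first share a cluster, checks the strong triangle inequality, and recovers the partition at a threshold t as the classes of the relation d(x, y) ≤ t, which is an equivalence relation precisely because d is an ultrametric.

```python
import itertools
import math


def single_linkage_ultrametric(points):
    """Naive single-linkage clustering (O(n^3), for illustration only).

    Returns the cophenetic matrix u, where u[i][j] is the merge height at
    which points i and j first belong to the same cluster.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    u = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        # Closest pair under single linkage: minimum pairwise member distance.
        h, a, b = min(
            (min(dist[i][j] for i in clusters[a] for j in clusters[b]), a, b)
            for a, b in itertools.combinations(clusters, 2)
        )
        # Every pair split across a and b first shares a cluster at height h.
        for i in clusters[a]:
            for j in clusters[b]:
                u[i][j] = u[j][i] = h
        clusters[a] = clusters[a] + clusters.pop(b)
    return u


def is_ultrametric(u, tol=1e-12):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for every triple."""
    n = len(u)
    return all(u[x][z] <= max(u[x][y], u[y][z]) + tol
               for x in range(n) for y in range(n) for z in range(n))


def cut(u, t):
    """Partition at threshold t: x and y share a label iff u[x][y] <= t."""
    labels, next_label = [-1] * len(u), 0
    for i in range(len(u)):
        if labels[i] < 0:
            for j in range(len(u)):
                if u[i][j] <= t:
                    labels[j] = next_label
            next_label += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 0), (5, 1.5), (20, 0)]
    u = single_linkage_ultrametric(pts)
    print(is_ultrametric(u), cut(u, 2), cut(u, 10))
    # → True [0, 0, 1, 1, 2] [0, 0, 0, 0, 1]
```

Raising the threshold only ever merges classes, so the cuts are nested, which is the dendrogram read back from its ultrametric.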
2. Algorithmic Methodologies in Hierarchical Segmentation and Clustering
Agglomerative Clustering and Linkage Schemes
Agglomerative (bottom-up) methods begin with each data point as a singleton cluster and iteratively merge the closest pair. Common merge criteria are single linkage (minimum pairwise distance), complete linkage (maximum pairwise distance), average linkage (mean pairwise distance), and Ward's minimum-variance criterion. The resulting linkage matrix records the entire merge sequence and defines the dendrogram (Puentes, 28 Mar 2025, Kharinov, 2014).
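The linkage schemes differ only in how the distance from a newly merged cluster to every other cluster is updated. The sketch below is a naive O(n^3) implementation, an assumption made for readability, that uses Lance–Williams-style updates (min, max, size-weighted mean, and the Ward recurrence on Euclidean distances) and emits rows of the form [cluster_a, cluster_b, height, size], the same row layout as SciPy's `scipy.cluster.hierarchy.linkage`, which is the practical choice for real data.

```python
import itertools
import math


def agglomerate(points, method="average"):
    """Naive agglomerative clustering with Lance-Williams-style updates.

    Returns a linkage matrix whose rows are [cluster_a, cluster_b, height, size];
    new clusters receive ids n, n + 1, ... in merge order.
    """
    n = len(points)
    size = {i: 1 for i in range(n)}  # active cluster id -> number of members
    d = {frozenset(p): math.dist(points[p[0]], points[p[1]])
         for p in itertools.combinations(range(n), 2)}
    linkage, next_id = [], n
    while len(size) > 1:
        pair = min(d, key=d.get)  # closest pair of active clusters
        a, b = sorted(pair)
        h = d.pop(pair)
        na, nb = size.pop(a), size.pop(b)
        for k, nk in size.items():  # distance from the new cluster to each other one
            dak = d.pop(frozenset((a, k)))
            dbk = d.pop(frozenset((b, k)))
            if method == "single":
                new = min(dak, dbk)
            elif method == "complete":
                new = max(dak, dbk)
            elif method == "average":
                new = (na * dak + nb * dbk) / (na + nb)
            elif method == "ward":
                new = math.sqrt(((na + nk) * dak ** 2 + (nb + nk) * dbk ** 2
                                 - nk * h ** 2) / (na + nb + nk))
            else:
                raise ValueError(f"unknown method: {method}")
            d[frozenset((next_id, k))] = new
        size[next_id] = na + nb
        linkage.append([a, b, h, na + nb])
        next_id += 1
    return linkage


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (4.0,), (10.0,)]
    for m in ("single", "complete", "average", "ward"):
        print(m, agglomerate(pts, m))
```

On the four 1D points above, all four criteria merge in the same order, but the final height grows from 6 (single) through 25/3 (average) to 10 (complete), which is exactly the difference a dendrogram cut is sensitive to.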
In spatial and image data, adjacency graphs restrict merges to neighboring regions. Many algorithms are designed to optimize both geometric closeness (e.g., Euclidean in 3D) and similarity in appearance or features (e.g., color, texture, normal vectors) (Xu et al., 2017, Chen et al., 2021).
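A connectivity constraint can be illustrated on a small grayscale grid. In the minimal sketch below (a hypothetical greedy procedure, not the algorithm of any cited work), every pixel starts as its own region, and only regions that share a 4-connected edge are allowed to merge, with the absolute difference of mean intensities as the merge cost. Two regions with identical intensity therefore stay apart when they are not spatially adjacent.

```python
def constrained_merge(img, n_regions):
    """Greedy region merging on a 4-connected grid of grayscale values.

    Only regions that share a pixel edge may merge; the merge cost is the
    absolute difference of the two regions' mean intensities.
    Returns an image-shaped array of region labels.
    """
    h, w = len(img), len(img[0])
    label = [[r * w + c for c in range(w)] for r in range(h)]
    total = {r * w + c: float(img[r][c]) for r in range(h) for c in range(w)}
    count = {k: 1 for k in total}
    adj = {k: set() for k in total}  # region adjacency graph
    for r in range(h):
        for c in range(w):
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < h and cc < w:
                    a, b = r * w + c, rr * w + cc
                    adj[a].add(b)
                    adj[b].add(a)

    def mean(k):
        return total[k] / count[k]

    while len(total) > n_regions:
        # Cheapest merge among ADJACENT pairs only (the connectivity constraint).
        a, b = min(((a, b) for a in adj for b in adj[a] if a < b),
                   key=lambda p: abs(mean(p[0]) - mean(p[1])))
        total[a] += total.pop(b)
        count[a] += count.pop(b)
        for nb in adj.pop(b):  # region a inherits b's neighbours
            adj[nb].discard(b)
            if nb != a:
                adj[nb].add(a)
                adj[a].add(nb)
        label = [[a if k == b else k for k in row] for row in label]
    return label


if __name__ == "__main__":
    img = [[0, 9, 0],
           [0, 9, 0]]
    print(constrained_merge(img, 3))  # left and right columns stay separate
```

Recording the merge order and costs instead of stopping at `n_regions` would yield the full segment hierarchy.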
Hierarchical Pixel and Region Segmentation
Hierarchical segmentation extends clustering to spatial domains by encoding both pixel/voxel intensity and spatial connectivity. Dynamic trees or priority queues efficiently track merge costs and enforce nestedness of regions (Kharinov, 2014, Chen et al., 2021). For images, piecewise-constant approximations with explicit formulas for the increment in error (e.g., squared error) caused by a merge, a split, or a pixel reassignment are used to keep the hierarchy's approximation error only slightly above the optimal error for every number of clusters.
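For squared error, the merge increment has a closed form: replacing two constant regions of sizes $n_1, n_2$ and means $\mu_1, \mu_2$ by one region raises the total error by $\Delta E = \frac{n_1 n_2}{n_1 + n_2}(\mu_1 - \mu_2)^2$ (for multichannel pixels, the square becomes a squared Euclidean norm). The sketch below, assuming scalar grayscale values, checks the formula against a direct computation; it illustrates the kind of increment formula described above rather than reproducing Kharinov's code.

```python
def sse(values):
    """Squared error of approximating `values` by their mean (one constant region)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def merge_increment(n1, mean1, n2, mean2):
    """Increase in total squared error when two constant regions are merged."""
    return n1 * n2 / (n1 + n2) * (mean1 - mean2) ** 2


if __name__ == "__main__":
    a, b = [1, 2, 3], [10, 12]
    direct = sse(a + b) - sse(a) - sse(b)
    closed = merge_increment(len(a), sum(a) / len(a), len(b), sum(b) / len(b))
    print(direct, closed)  # both approximately 97.2
```

Because the increment depends only on region sizes and means, a priority queue keyed on it can be updated in constant time per merge, without revisiting pixels.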
Morphological approaches, such as the ultrametric watershed, use edge-weighted graphs and topological watersheds to construct a single edge map encoding all levels of the hierarchy. Components at a given threshold correspond to the segmentation at that scale, with the entire hierarchy retrievable from the ultrametric saliency map (Soille et al., 2012).
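Reading one level out of a saliency map amounts to a connected-components computation. The minimal sketch below assumes the edge saliencies have already been computed (it does not build the watershed itself): edges whose saliency is at most t are treated as "dissolved", and a union-find pass labels the resulting components. Increasing t yields coarser, nested partitions.

```python
def components_at(n_nodes, edges, t):
    """Segmentation at level t from an edge saliency map.

    edges: (u, v, saliency) triples. Edges with saliency <= t no longer
    separate their endpoints. Returns one canonical label per node.
    """
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, s in edges:
        if s <= t:
            parent[find(u)] = find(v)
    roots, labels = {}, []
    for x in range(n_nodes):
        labels.append(roots.setdefault(find(x), len(roots)))
    return labels


if __name__ == "__main__":
    # A 5-pixel chain whose contours have saliencies 1, 3, 1, 2.
    chain = [(0, 1, 1), (1, 2, 3), (2, 3, 1), (3, 4, 2)]
    for t in (0, 1, 2, 3):
        print(t, components_at(5, chain, t))
```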
3. Hierarchical Clustering and Segmentation in Modern Deep Architectures
Recent architectures integrate hierarchical clustering concepts directly into neural networks for segmentation and representation learning.
- HCFormer: Integrates attention-based, local hierarchical clustering at each downsampling layer, avoiding classic pixel decoders and enabling visualization of clusterings at multiple resolutions. The final mask head refines top-level cluster prototypes for semantic, instance, and panoptic tasks, providing interpretability (e.g., undersegmentation error measured at each scale) (Suzuki, 2022).
- COCA-Net: Employs stacked Compact Clustering Attention layers using bottom-up grouping by compactness. Each COCA layer produces masks and pooled features representing coherent object or part clusters, recursively grouped to the object level, supporting variable numbers of objects and effective background segmentation (Küçüksözen et al., 4 May 2025).
- Hierarchical Segment Grouping (HSG): For unsupervised semantic segmentation, combines multiview co-segmentation, spatial contrastive learning, and clustering transformer modules across levels. At each layer, a transformer merges fine clusters into coarser ones, enforcing that pixel and cluster embeddings remain group-consistent across the hierarchy (Ke et al., 2022).
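A common thread in these architectures is that each level assigns the units of the previous level to a smaller set of clusters, so pixel-level masks at any level follow from composing the per-level assignments. The sketch below is a generic illustration under that assumption, not the implementation of HCFormer, COCA-Net, or HSG. Soft assignments are represented as row-stochastic matrices, composed by matrix multiplication, and hardened with an argmax to visualize each level.

```python
def matmul(a, b):
    """Plain-Python matrix product of lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def compose(assignments):
    """assignments[k]: row-stochastic matrix mapping level-k units (rows) to
    level-(k+1) clusters (columns); level 0 is pixels.
    Returns pixel-to-cluster soft assignments for every level."""
    out, cur = [], None
    for a in assignments:
        cur = a if cur is None else matmul(cur, a)
        out.append(cur)
    return out


def hard_labels(p):
    """Argmax cluster per pixel, e.g. for visualizing one level of the hierarchy."""
    return [max(range(len(row)), key=row.__getitem__) for row in p]


if __name__ == "__main__":
    pixels_to_fine = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0, 1]]
    fine_to_coarse = [[1, 0], [0.2, 0.8], [0, 1]]
    for level, p in enumerate(compose([pixels_to_fine, fine_to_coarse]), 1):
        print(level, hard_labels(p))
```

Because products of row-stochastic matrices are row-stochastic, every pixel keeps a valid distribution over clusters at every level, which is one way to keep pixel and cluster assignments consistent across the hierarchy.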
3D Point Cloud and Spatiotemporal Data
- HAIS and Part2Object: For 3D instance segmentation, deploy hierarchical clustering in multiple stages (e.g., point to set, set to instance), often guided by objectness priors from multimodal cues (e.g., projected 2D features). Clusters are iteratively merged only if strong affinity and spatial/semantic continuity are present, and aggregation is coupled with refinement via a dedicated mask prediction network (Chen et al., 2021, Shi et al., 2024).
- Hierarchical Vector Quantization (HVQ, HiST-VQ): For temporal segmentation (e.g., unsupervised action discovery in videos or skeleton sequences), a sequence passes through multiple levels of vector quantization: fine subunits (subactions) are pooled into coarser actions, codebooks are adapted jointly via exponential-moving-average (EMA) updates, and commitment and reconstruction losses stabilize the hierarchical assignments. This two-level structure is reported to outperform flat approaches, particularly in modeling segment-length distributions (Spurio et al., 2024, Ahmed et al., 16 Apr 2026).
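The two-level quantization and the EMA codebook rule can be sketched in a few lines. The code below is a simplified illustration, not the HVQ or HiST-VQ implementation. Frame embeddings are assigned to the nearest subaction codeword, each chosen codeword is in turn assigned to the nearest action codeword, and codewords are updated as ratios of exponentially averaged sums and counts (the VQ-VAE-style EMA rule). The encoder, the losses, and the `decay` value are assumptions left out or chosen for illustration.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)))


def hierarchical_quantize(frames, sub_codebook, act_codebook):
    """Level 1: frame -> subaction codeword; level 2: subaction codeword -> action."""
    sub = [nearest(sub_codebook, x) for x in frames]
    act = [nearest(act_codebook, sub_codebook[s]) for s in sub]
    return sub, act


def ema_update(codebook, counts, sums, xs, assign, decay=0.99):
    """EMA codebook update: running counts N_k and sums m_k of the vectors
    assigned to each codeword; the codeword becomes m_k / N_k (in place)."""
    dim = len(codebook[0])
    for k in range(len(codebook)):
        members = [x for x, a in zip(xs, assign) if a == k]
        counts[k] = decay * counts[k] + (1 - decay) * len(members)
        for d in range(dim):
            sums[k][d] = decay * sums[k][d] + (1 - decay) * sum(x[d] for x in members)
        if counts[k] > 0:
            codebook[k] = tuple(s / counts[k] for s in sums[k])
    return codebook


if __name__ == "__main__":
    frames = [(0.0,), (0.1,), (1.0,), (1.1,), (5.0,), (5.2,)]
    sub, act = hierarchical_quantize(frames, [(0.0,), (1.0,), (5.0,)], [(0.5,), (5.0,)])
    print(sub, act)  # frame-level subaction ids and action ids
```

Contiguous runs of equal action ids then form the predicted segments at the coarse level, while runs of subaction ids give the finer level.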
4. Statistical and Data-Driven Foundations
Dissimilarity and Statistical Distance
Clustering at each level depends critically on the choice of dissimilarity or affinity. Continuous variables are typically compared with Euclidean distances, either on the raw values or in a learned feature space. For qualitative (categorical) variables, the Maximum Mean Discrepancy (MMD) between the conditional distributions of the associated quantitative features can serve as the dissimilarity, enabling context-sensitive clustering of categories (Seca et al., 2020). This preserves interpretability for discrete data.
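The MMD-based dissimilarity between categories can be illustrated as follows. The sketch assumes a Gaussian kernel with a fixed bandwidth `sigma` and uses the biased (V-statistic) estimate of squared MMD; both choices are illustrative assumptions, not necessarily those of Seca et al. Each category value is represented by the sample of quantitative feature vectors observed with it, and the resulting pairwise matrix could be fed to any linkage scheme.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def category_dissimilarity(rows, sigma=1.0):
    """rows: (category, feature_vector) pairs. Returns {(c1, c2): MMD^2} between
    the conditional feature distributions of each pair of categories."""
    groups = {}
    for cat, x in rows:
        groups.setdefault(cat, []).append(x)
    cats = sorted(groups)
    return {(a, b): mmd2(groups[a], groups[b], sigma)
            for i, a in enumerate(cats) for b in cats[i + 1:]}


if __name__ == "__main__":
    rows = [("A", (0.0,)), ("A", (1.0,)), ("B", (0.0,)), ("B", (1.0,)),
            ("C", (5.0,)), ("C", (6.0,))]
    print(category_dissimilarity(rows))  # A and B are close, C is far from both
```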
Model Selection, Evaluation, and Interpretability
Methods such as the cophenetic correlation coefficient (quantifying faithfulness to original distances), inconsistency metrics for dendrogram cuts, and cross-validated predictive losses for tree selection (CVL) are standard (Puentes, 28 Mar 2025, Cabezas et al., 2021). Phylogenetic analogies enable feature-importance scoring and visualization of “trait evolution” over the dendrogram, emphasizing the use of the full tree rather than a single partition (Cabezas et al., 2021).
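The cophenetic correlation coefficient is the Pearson correlation between the original pairwise distances and the cophenetic (dendrogram) distances over all pairs i < j. A minimal sketch is shown below; SciPy's `scipy.cluster.hierarchy.cophenet` computes the same quantity directly from a linkage matrix. The example matrices correspond to single linkage on the 1D points 0, 1, 4, 10.

```python
import math


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances (i < j)."""
    n = len(dist)
    xs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    ys = [coph[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    dist = [[0, 1, 4, 10], [1, 0, 3, 9], [4, 3, 0, 6], [10, 9, 6, 0]]
    # Single-linkage merge heights: {0,1} at 1, {0,1,2} at 3, everything at 6.
    coph = [[0, 1, 3, 6], [1, 0, 3, 6], [3, 3, 0, 6], [6, 6, 6, 0]]
    print(round(cophenetic_correlation(dist, coph), 4))
```

Values near 1 indicate that the tree distorts the original distances little; comparing the coefficient across linkage schemes is one simple model-selection signal.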
Specialized metrics, such as happiness and immersiveness in biological segmentation, capture both local cluster coherence and global coverage. Phase transitions in these quality metrics as the number of clusters varies can guide a principled choice of segmentation level (Tjörnhammar, 2024).
5. Scaling, Distributed Computation, and Objectivity Guarantees
Hierarchical clustering/segmentation methods have been adapted for extreme scale:
- Distributed Agglomeration: The computational bottleneck posed by large graphs is overcome by recursively processing the data in independently handled chunks and deferring merges at chunk boundaries until all necessary information is available. Lu et al. prove that if the linkage criterion is reducible (monotonic, e.g., the max, mean, or median of affinities), the final result is invariant to the chunking (Lu et al., 2021).
- Optimal Hierarchical Clustering (OHC) for Point Clouds: Solving an assignment problem (a minimum-cost perfect matching via the Kuhn–Munkres algorithm) at each level of the clustering hierarchy makes the merge decisions at that level jointly optimal, avoiding the local traps of greedy, one-merge-at-a-time algorithms (Xu et al., 2017).
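The chunk-invariance idea can be seen in its simplest special case: thresholding affinities and taking connected components, which corresponds to a max-affinity linkage cut at one level. The sketch below is far simpler than the distributed algorithm of Lu et al. and is only meant to show why deferring boundary edges is safe. Edges inside each chunk are processed independently (the chunks touch disjoint nodes, so this work could run in parallel), edges that cross chunk boundaries are deferred to a final pass, and the partition matches a single global pass because union-find connectivity does not depend on edge order.

```python
def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[ra] = rb


def canonical(parent, nodes):
    """Relabel components by order of first appearance."""
    roots, out = {}, []
    for x in nodes:
        out.append(roots.setdefault(find(parent, x), len(roots)))
    return out


def global_segment(n, edges, t):
    """Edges (u, v, affinity) with affinity >= t join their endpoints."""
    parent = list(range(n))
    for u, v, w in edges:
        if w >= t:
            union(parent, u, v)
    return canonical(parent, range(n))


def chunked_segment(n, edges, t, chunk_of):
    """chunk_of[i]: chunk id of node i. Intra-chunk edges are processed per
    chunk; boundary edges are deferred until all chunks are done."""
    parent = list(range(n))
    chunks, deferred = {}, []
    for e in edges:
        if chunk_of[e[0]] == chunk_of[e[1]]:
            chunks.setdefault(chunk_of[e[0]], []).append(e)
        else:
            deferred.append(e)
    for chunk_edges in chunks.values():  # independent work units
        for u, v, w in chunk_edges:
            if w >= t:
                union(parent, u, v)
    for u, v, w in deferred:  # boundary pass
        if w >= t:
            union(parent, u, v)
    return canonical(parent, range(n))


if __name__ == "__main__":
    edges = [(0, 1, 0.9), (1, 2, 0.2), (2, 3, 0.8), (3, 4, 0.95), (4, 5, 0.1), (0, 5, 0.7)]
    chunk_of = [0, 0, 0, 1, 1, 1]
    print(global_segment(6, edges, 0.5), chunked_segment(6, edges, 0.5, chunk_of))
```

Full agglomeration additionally needs merge heights and a reducible linkage so that merges decided inside a chunk are never invalidated later, which is the property the cited invariance result relies on.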
Morphological frameworks guarantee ultrametricity and connectedness, and allow area, range, or context-based constraints to be imposed for more robust or meaningful segmentations (Soille et al., 2012).
6. Application Domains and Implications
Hierarchical segmentation and clustering underpin numerous applications:
- Bioinformatics: Revealing gene expression programs and cell-type hierarchies; optimized cluster “cuts” correspond tightly to biological categories, leveraging bespoke metrics for evaluation (Tjörnhammar, 2024).
- Image and 3D Scene Segmentation: Enabling interpretable, multi-scale segmentation for natural images, point clouds, and volumetric biomedical data, with state-of-the-art results reported in both supervised and unsupervised regimes (Suzuki, 2022, Shi et al., 2024).
- Unsupervised Action Segmentation: Temporal clustering of multivariate sequential data (frames, skeletons) at multiple resolutions, robustly handling intra-class variability and length bias (Spurio et al., 2024, Ahmed et al., 16 Apr 2026).
- Mixed Data and Feature Clustering: Hierarchies applied to categorical data, with cluster similarity defined by MMD over joint covariates, support discovery in diverse settings such as music artist groupings or financial sectors (Seca et al., 2020).
A plausible implication is that, as distributed and deep hierarchical clustering approaches mature, domain-specific constraints (morphological, semantic, or statistical) can be more tightly integrated into the tree-building process, enabling principled, interpretable, and scalable segmentation across modalities.
7. Theoretical Developments and Unifying Frameworks
The duality between tangles and hierarchical clusterings generalizes classical results and establishes that for any (maximum-)submodular connectivity function, tangles precisely recover the clusters at all dendrogram levels, and the absence of high-order tangles is certified by branch decompositions (Fluck, 2022). This underpins the equivalence of dendrograms, ultrametrics, and (maximum-)submodular clustering, providing guarantees of consistency, optimality, and interpretability.
Morphological methods (ultrametric watersheds, constrained connectivity) unify these concepts within an algorithmically efficient and theoretically sound framework, supporting large-scale image and spatial data analysis with explicit control over connectedness, region size, and attribute-based constraints (Soille et al., 2012).
References
- (Soille et al., 2012) On morphological hierarchical representations for image processing and spatial data clustering
- (Kharinov, 2014) Hierarchical pixel clustering for image segmentation
- (Xu et al., 2017) An optimal hierarchical clustering approach to segmentation of mobile LiDAR point clouds
- (Seca et al., 2020) Hierarchical Qualitative Clustering: clustering mixed datasets with critical qualitative information
- (Lu et al., 2021) Large-scale image segmentation based on distributed clustering algorithms
- (Chen et al., 2021) Hierarchical Aggregation for 3D Instance Segmentation
- (Cabezas et al., 2021) Hierarchical clustering: visualization, feature importance and model selection
- (Fluck, 2022) Tangles and Hierarchical Clustering
- (Ke et al., 2022) Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers
- (Suzuki, 2022) HCFormer: Unified Image Segmentation with Hierarchical Clustering
- (Tjörnhammar, 2024) Happy and Immersive Clustering Segmentations of Biological Co-Expression Patterns
- (Shi et al., 2024) Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
- (Spurio et al., 2024) Hierarchical Vector Quantization for Unsupervised Action Segmentation
- (Puentes, 28 Mar 2025) Comparison between neural network clustering, hierarchical clustering and k-means clustering
- (Küçüksözen et al., 4 May 2025) Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning
- (Ahmed et al., 16 Apr 2026) Unsupervised Skeleton-Based Action Segmentation via Hierarchical Spatiotemporal Vector Quantization