
Iterative Clustering & Bootstrapping

Updated 3 March 2026
  • Iterative clustering and bootstrapping is an unsupervised learning approach that refines cluster assignments through repeated resampling for enhanced stability.
  • It employs bootstrap sampling to reduce variance and sensitivity to outliers, resulting in more robust and reliable clustering results.
  • By alternating between model re-estimation and consensus formation, the method improves uncertainty quantification and convergence in high-dimensional settings.

Iterative clustering and bootstrapping refer to a class of methodologies in unsupervised learning where cluster assignments and underlying parameters are repeatedly refined using both clustering and resampling strategies. These methods are motivated by the need to improve robustness, stability, and accuracy compared to single-pass clustering algorithms, particularly in high-dimensional, noisy, or weakly-supervised contexts. The bootstrapping component, realized through repeated sampling (with or without weighting), serves to quantify uncertainty, enhance outlier robustness, or generate stronger pseudo-labels. The iterative aspect typically involves repeated refinement—either of cluster memberships, model parameters, or learned representations—until convergence or sufficient stabilization is attained.

1. Fundamental Principles and Motivations

Iterative clustering techniques exploit the fact that a single run of clustering (e.g., k-means, hierarchical, spectral, or GMM-based) can be highly sensitive to initialization, noise, or the presence of outliers. By alternating between cluster assignment and model re-estimation, these methods aim to reach more stable local optima. Bootstrapping, which originated in statistics as a tool for robust estimation and uncertainty quantification, underpins ensemble-based cluster stabilization and the estimation of the variability of clustering solutions.

Several distinct motivations underlie the deployment of iterative clustering and bootstrapping:

  • Variance Reduction: Averaging over multiple bootstrap replicates produces more stable, less variable clustering results than single-run solutions.
  • Robustness to Outliers: Bootstrap sampling in tandem with robust statistics (e.g., median-of-means) raises the breakdown point relative to traditional clustering (2002.03899).
  • Uncertainty Quantification: Resampling-based consensus (e.g., via co-assignment matrices or entropy measures) provides direct assessments of cluster assignment confidence.
  • Adaptivity and Refinement: Iterative bootstrapping enables dynamic refinement of pseudo-labels or mixture weights, especially for weakly supervised and representation learning scenarios (Hou et al., 2023).

2. Core Algorithmic Frameworks

A diverse array of iterative clustering and bootstrapping paradigms has emerged, often aligned with domain and problem constraints:

a. Bootstrap-Driven Consensus and Stability

Algorithms such as Spectral-BootEM and BootSpectral combine spectral embedding with repeated sampling of the latent feature space, running an EM fit for each bootstrap and averaging the resulting mixture parameters. This approach mitigates sensitivity to initialization and reduces overfitting (Welsh et al., 2022). The generic scheme involves:

  1. Spectral embedding: SVD or affinity-based projection to a low-dimensional space;
  2. Bootstrap sampling: Multiple resamplings of data or feature rows;
  3. Iterative EM: Cluster membership and parameter updates per bootstrap;
  4. Consensus: Averaging parameters or cluster assignments for smoothed, stable outputs.
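
A minimal sketch of this scheme in Python, using scikit-learn's TruncatedSVD and GaussianMixture as stand-ins for the embedding and EM steps. The function name and all parameter defaults are illustrative, and the sort-based label alignment is a crude placeholder for the matching step a real implementation would need:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.mixture import GaussianMixture

def spectral_boot_em(X, n_clusters=3, n_boot=50, n_components=5, seed=0):
    """Bootstrap-averaged GMM on a spectral (SVD) embedding (sketch)."""
    rng = np.random.default_rng(seed)
    Z = TruncatedSVD(n_components=n_components).fit_transform(X)  # 1. embedding
    means, weights = [], []
    for _ in range(n_boot):                                       # 2. resampling
        idx = rng.integers(0, len(Z), size=len(Z))
        gmm = GaussianMixture(n_components=n_clusters, n_init=1,
                              random_state=int(rng.integers(1 << 31)))
        gmm.fit(Z[idx])                                           # 3. EM per replicate
        order = np.argsort(gmm.means_[:, 0])  # crude label alignment across replicates
        means.append(gmm.means_[order])
        weights.append(gmm.weights_[order])
    # 4. consensus: average the aligned mixture parameters
    return np.mean(means, axis=0), np.mean(weights, axis=0), Z
```

Points are then assigned to the nearest consensus mean; the main practical subtlety is label switching across replicates, which the sort above only approximates.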

b. Ensemble Hierarchical Clustering via Boosting

Hierarchical clustering ensemble frameworks build multiple dendrograms over bootstrapped or weighted samples and fuse them via consensus of cophenetic distance matrices (Rashedi et al., 2018). Boosting elements enter via dynamic reweighting—data points poorly fit by the current ensemble are up-weighted in subsequent rounds, analogous to AdaBoost:

  • Weighted resampling;
  • Hierarchical dendrogram construction;
  • Boosted value assessment and weight updating;
  • Aggregation (simple average, weighted sum, or entropy-based combination);
  • Final consensus dendrogram recovery.
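
A compact sketch of this loop, built on SciPy's linkage and cophenet; the exponential weight update is a simplified stand-in for the boosted value assessment, and all parameters are illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist, squareform

def boosted_hclust_consensus(X, n_rounds=10, frac=0.8, seed=0):
    """Weighted-resampling dendrogram ensemble with cophenetic consensus (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.full(n, 1.0 / n)                       # resampling weights
    D_true = squareform(pdist(X))
    consensus, counts = np.zeros((n, n)), np.zeros((n, n))
    for _ in range(n_rounds):
        idx = rng.choice(n, size=int(frac * n), replace=False, p=w)
        Z = linkage(X[idx], method="average")     # dendrogram on the subsample
        C = squareform(cophenet(Z))               # cophenetic distance matrix
        consensus[np.ix_(idx, idx)] += C
        counts[np.ix_(idx, idx)] += 1
        # up-weight points whose cophenetic distances fit the data poorly
        err = np.abs(C - D_true[np.ix_(idx, idx)]).mean(axis=1)
        w[idx] *= np.exp(err / (err.mean() + 1e-12))
        w /= w.sum()
    counts[counts == 0] = 1
    return consensus / counts                     # consensus dissimilarity D*
```

The final consensus dendrogram is recovered by running linkage once more on the returned consensus dissimilarity.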

c. Robust Lloyd-Type Algorithms (Bootstrap Median-of-Means)

The K-bMOM algorithm replaces the standard centroid (mean) update with bootstrap median-of-means estimators at each Lloyd iteration, dramatically increasing the breakdown point. At each iteration:

  • Multiple bootstrap samples ("blocks") are drawn from the data;
  • Cluster assignments are made for each block;
  • Centroids are recomputed via (coordinatewise/vector) median-of-means across blocks (2002.03899);
  • The median block by within-cluster distortion is selected for centroid update.
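
A sketch of one such update, following the median-block variant in the last step above; the block count and size are illustrative, not tuned:

```python
import numpy as np

def kbmom_update(X, centroids, n_blocks=20, block_size=100, rng=None):
    """One bootstrap median-of-means centroid update (sketch of the K-bMOM idea)."""
    rng = rng or np.random.default_rng()
    K = len(centroids)
    block_centroids, distortions = [], []
    for _ in range(n_blocks):
        idx = rng.integers(0, len(X), size=block_size)   # draw one bootstrap block
        B = X[idx]
        d = np.linalg.norm(B[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)                        # assignment within the block
        cents = np.array([B[labels == k].mean(axis=0) if np.any(labels == k)
                          else centroids[k] for k in range(K)])
        block_centroids.append(cents)
        distortions.append((d.min(axis=1) ** 2).mean())  # within-block distortion
    med = np.argsort(distortions)[n_blocks // 2]         # block with median distortion
    return block_centroids[med]
```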

d. Self-training, Contrastive, and Classification-Enhanced Loops

Deep clustering advances such as CEIL and contrastive bootstrapping schemes operate in a fully iterative loop:

  • Feature extraction;
  • Clustering and pseudo-label generation (possibly via contrastive and cluster-disentangling objectives);
  • Bootstrapped re-labelling: pseudo-labels generated by high-confidence assignments on strongly-supported clusters (Hou et al., 2023, Zhao et al., 2023);
  • Supervised updating of the feature extractor / encoder with new pseudo-labels (e.g., by prompt-based masked-language modeling or prototype learning);
  • Iteration until performance plateaus.
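
The loop can be caricatured with shallow components. In the sketch below, KMeans supplies pseudo-labels, the margin between the two nearest centroids plays the role of assignment confidence, and a linear classifier stands in for the deep encoder that CEIL-style methods actually update; all names and thresholds are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def bootstrap_pseudo_labels(Z, n_clusters=5, n_iters=5, drop_frac=0.2, seed=0):
    """Iterative cluster -> filter -> classify loop (toy stand-in for a deep loop)."""
    for it in range(n_iters):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(Z)
        d = km.transform(Z)                         # distances to every centroid
        part = np.partition(d, 1, axis=1)
        conf = part[:, 1] - part[:, 0]              # margin between top-2 centroids
        keep = conf > np.quantile(conf, drop_frac)  # retain high-confidence points only
        clf = LogisticRegression(max_iter=1000).fit(Z[keep], km.labels_[keep])
        # in the deep version this supervision would update the encoder; here we
        # just track classifier/cluster agreement as a plateau check
        agree = (clf.predict(Z) == km.labels_).mean()
        print(f"iter {it}: kept {keep.mean():.0%}, agreement {agree:.3f}")
    return km.labels_, clf
```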

e. Application-Guided Iterative Clustering

Seeded Iterative Clustering (SIC) leverages sparse user-annotated seed points to initialize cluster centroids, using iterative restriction to subclusters for progressive segmentation, with iterative self-training driven by cluster confidence on seeds (Chelebian et al., 2022).
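
A minimal seed-pinned k-means sketch in this spirit; SIC's actual procedure, with its iterative restriction to subclusters, is more elaborate, and the helper below is hypothetical:

```python
import numpy as np

def seeded_kmeans(Z, seed_idx, seed_labels, n_iters=20):
    """k-means initialized from user-annotated seeds, which stay pinned (sketch)."""
    classes = np.unique(seed_labels)
    centroids = np.array([Z[seed_idx][seed_labels == c].mean(axis=0) for c in classes])
    for _ in range(n_iters):
        d = np.linalg.norm(Z[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        labels[seed_idx] = np.searchsorted(classes, seed_labels)  # pin the seeds
        new = np.array([Z[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(len(classes))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```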

f. Bayesian and Variational Bootstrap Techniques

Bootstrapping is also integrated into Bayesian nonparametric clustering via the linear bootstrap: a first-order sensitivity analysis of the variational posterior approximates the impact of resampling on cluster assignments, capturing local stability at a fraction of the computational cost (Giordano et al., 2017).
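
A toy numerical illustration of the linear response idea (the sensitivity formula is restated in Section 3); the Hessians below are random positive-definite placeholders for quantities that would come from a fitted variational model:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 10                        # dims of variational params eta and weights W
A = rng.normal(size=(p, p))
H_ee = A @ A.T + p * np.eye(p)      # placeholder for the KL Hessian in eta
H_ew = rng.normal(size=(p, n))      # placeholder for the cross Hessian in (eta, W)
S = -np.linalg.solve(H_ee, H_ew)    # sensitivity matrix S = -H_ee^{-1} H_ew
# one bootstrap reweighting: multinomial counts minus the uniform weights
delta_W = rng.multinomial(n, np.full(n, 1 / n)) - 1
print("predicted first-order parameter shift:", S @ delta_W)
```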

3. Mathematical Formulation and Consensus Construction

Iterative clustering and bootstrapping frameworks generally rely on one or more of the following mathematical pillars:

  • Bootstrap sampling: $M$ samples drawn with replacement; weights from the multinomial or, in Bayesian variants, the Dirichlet.
  • Consensus assignment: Co-assignment matrices, averaged parameters, or entropy-based summaries:
    • Consensus dissimilarity: $D^*_{ij} = \frac{1}{T} \sum_{t=1}^{T} D^{(t)}_{ij}$ (Rashedi et al., 2018);
    • Block-median selection: $\hat{\mu}_{\mathrm{bMOM}} = \mathrm{median}\{\bar{x}_{(1)}, \dots, \bar{x}_{(B)}\}$ (2002.03899);
    • Shannon entropy for uncertainty: $H(C) = -\sum_j p_j \log p_j$ for cluster $C$ (Quetti et al., 2024).
  • Pseudo-label selection: Confidence- or distinction-gap-based filtering; only high-scoring assignments are retained for supervision on future rounds (Hou et al., 2023, Zhao et al., 2023).
  • Weight updating schemes: For boosting, weights are increased for points with low cophenetic fit or clustering agreement (Rashedi et al., 2018).
  • Linear bootstrap sensitivity: $S = -[\nabla^2_{\eta\eta} \mathrm{KL}(\eta, W)]^{-1} [\nabla^2_{\eta W} \mathrm{KL}(\eta, W)]$ (Giordano et al., 2017).
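
The consensus and entropy pillars combine naturally. The sketch below builds a co-assignment probability matrix from $T$ bootstrap labelings and reports a per-point entropy (a pointwise variant of the cluster-level $H(C)$ above); the $-1$ convention for absent points is an assumption:

```python
import numpy as np

def coassignment_consensus(label_matrix):
    """Co-assignment consensus and per-point entropy from bootstrap labelings.

    label_matrix: (T, n) array; row t holds cluster labels from bootstrap t,
    with -1 marking points absent from that resample.
    """
    T, n = label_matrix.shape
    co, cnt = np.zeros((n, n)), np.zeros((n, n))
    for t in range(T):
        lab = label_matrix[t]
        present = lab >= 0
        both = present[:, None] & present[None, :]
        co += (lab[:, None] == lab[None, :]) & both
        cnt += both
    P = np.divide(co, cnt, out=np.zeros_like(co), where=cnt > 0)  # co-assignment probs
    sums = P.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0
    q = P / sums
    H = -np.sum(np.where(q > 0, q * np.log(q), 0.0), axis=1)      # per-point entropy
    return P, H
```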

Consensus approaches serve not only to stabilize output, but also to quantify the intrinsic ambiguity or instability present in the partitioning induced by the clustering mechanism.

4. Robustness, Uncertainty, and Stability Properties

Robustness to noise, outliers, and data perturbations is a central justification for iterative clustering and bootstrapping. Empirical and theoretical results include:

  • Breakdown point improvement: Bootstrap median-of-means achieves $\lim_{B \to \infty} \Delta_{\mathrm{bMOM}} = 1 - 2^{-1/n_B}$, exceeding that of classical MOM and approaching $1/2$ as the block size $n_B$ shrinks (2002.03899).
  • Model stability: Bootstrap ensembles and linear bootstrap techniques provide direct quantification of how cluster assignments change under infinitesimal or finite reweightings, supporting principled assessment of cluster reliability (Giordano et al., 2017).
  • Outlier resilience: K-bMOM retains superior clustering accuracy (ARI of 0.8–0.98) under adversarial contamination of up to 3–5% of data points, outperforming trimmed k-means, k-medoids, and related approaches (2002.03899).
  • Uncertainty estimation: Entropy-based cluster assignment metrics and co-assignment probability maps allow for explicit uncertainty reporting, aiding both post-hoc analysis and cluster selection (Quetti et al., 2024, Giordano et al., 2017).
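
The breakdown-point intuition is easy to check numerically. In the toy run below the blocks are kept small so that the median block is outlier-free with probability above one half; the contamination level and block size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, size=1000)
x[:40] = 1e6                                   # 4% adversarial contamination
B, n_b = 200, 10                               # many small bootstrap blocks
block_means = [rng.choice(x, size=n_b).mean() for _ in range(B)]
print("plain mean:", x.mean())                 # ruined by the outliers (~4e4)
print("bMOM      :", np.median(block_means))   # stays near the true location 5
```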

5. Practical Implementations and Computational Analysis

Efficient implementation of iterative clustering with bootstrapping is dependent on communication-efficient consensus construction and, where feasible, dimension reduction:

Method | Key Computational Steps | Dominant Complexity Term
--- | --- | ---
Spectral-BootEM | SVD, bootstrap EM, parameter averaging | $O(np^2 + M n G^3)$
HBoosting (hierarchical) | Subsample, dendrogram, aggregation | $O(T m^2 + T N^2 + N^2)$
K-bMOM | $B$ blocks/iteration, block assignment | $O(B n_B d K)$ per iteration
CEIL/Contrastive | Cluster, filter, classifier update | Iterative deep model training
Linear Bootstrap | Single MFVB solve, matrix multiplies | Essentially negligible ($<0.001$ s/replicate)
  • Batch parallelization is standard for bootstrap replicates or blockwise computation, leveraging available CPU/GPU hardware.
  • Stopping criteria often rely on the stabilization of consensus assignments or convergence in objective value (e.g., normalized parameter change $R^{(k)} < \epsilon_B$).
  • Parameter tuning for block sizes $n_B$, number of bootstraps $B$, and resampling fractions is typically done via sensitivity plots ("elbow" heuristic) or determined empirically.
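
A sketch of batch-parallel bootstrap replicates with a normalized-change stopping rule in the spirit of $R^{(k)} < \epsilon_B$; joblib provides the parallelism, and the batch schedule and sort-based centroid alignment are simplifications:

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.cluster import KMeans

def one_replicate(X, k, seed):
    """Fit k-means on one bootstrap resample and return its centroids."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(X), size=len(X))
    return KMeans(n_clusters=k, n_init=1, random_state=seed).fit(X[idx]).cluster_centers_

def parallel_bootstrap(X, k=3, batch=20, eps=1e-3, max_batches=10):
    """Add replicates in parallel batches until the consensus centroids stabilize."""
    all_c, prev = [], None
    for b in range(max_batches):
        reps = Parallel(n_jobs=-1)(
            delayed(one_replicate)(X, k, b * batch + s) for s in range(batch))
        all_c.extend(reps)
        cur = np.mean([np.sort(c, axis=0) for c in all_c], axis=0)  # crude alignment
        if prev is not None:
            R = np.linalg.norm(cur - prev) / (np.linalg.norm(prev) + 1e-12)
            if R < eps:          # normalized parameter change below tolerance
                break
        prev = cur
    return cur
```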

Empirical results across text, imaging, and generic tabular data demonstrate convergence of cluster accuracy and stability in under 5–10 iterations, with running times reduced by two orders of magnitude in the linear bootstrap relative to repeated variational inference (Giordano et al., 2017, Chelebian et al., 2022).

6. Applications and Domain-Specific Adaptations

Iterative clustering and bootstrapping support a spectrum of domains:

  • Text clustering and representation learning: CEIL leverages iterative classification-enhanced representation updating, using filtered cluster-based pseudo-labels for prompt-tuning of LLMs (Zhao et al., 2023).
  • Label refinement in hierarchical classification: Contrastive bootstrapping refines text passage labels in settings with only coarse annotations, employing prototype-based clustering iteratively updated by high-confidence pseudo-labels (Hou et al., 2023).
  • Segmentation and region identification: Seeded iterative clustering for histology relies on patch-level embeddings and user-placed seed points, iteratively refining cluster assignments without any retraining of the underlying network (Chelebian et al., 2022).
  • Mixture optimization in large-scale pretraining: The CLIMB framework clusters corpora, learns data-mixture weights via iterative predictor-guided bootstrapping, and achieves superior transfer performance on reasoning tasks (Diao et al., 17 Apr 2025).
  • Robust color quantization, biological data clustering: K-bMOM and spectral-bootstrap methods achieve state-of-the-art MSE/ARI in image quantization and cluster assignment in single-cell or gene expression data (2002.03899, Welsh et al., 2022).

7. Limitations, Open Challenges, and Outlook

Despite their demonstrated strengths, iterative clustering and bootstrapping face several challenges:

  • Computational scaling: Naive bootstrapping (full or cold-start) can be prohibitively expensive for large $n, p$; approximations such as the linear bootstrap or spectral dimensionality reduction offer substantial savings but require careful scrutiny regarding approximation quality (Giordano et al., 2017, Welsh et al., 2022).
  • Hyperparameter selection: Assigning block sizes, the number of bootstraps, or aggregation thresholds often resorts to empirical heuristics; automated selection remains a topic of ongoing research (2002.03899, Zhao et al., 2023).
  • Global vs local minima: While bootstrapping helps explore basins of attraction in the clustering landscape, strong dependence on the initial embedding or feature extractor can persist.
  • Interpretability: Ensemble and probabilistic consensus outputs yield more nuanced assignment matrices, but their interpretability in downstream tasks (e.g. assigning definitive group membership for intervention) may require additional post-processing.

Future avenues include improved theoretical analysis of convergence rates and stability bounds under high-dimensional and non-exchangeable noise regimes, integration of multi-modal data, and the development of domain-agnostic consensus frameworks that combine the strengths of bootstrapping and meta-learning for unsupervised partitioning.
