Bandit-Based Cluster Sampling
- Bandit-based cluster sampling is a framework that employs multi-armed bandit feedback to adaptively explore and recover latent clustering structures in sequential observation environments.
- It leverages sequential probing with GLR tests and LUCB-style confidence bounds, guided by information-theoretic lower bounds dependent on cluster separations and sizes.
- Applications span online clustering, federated best-arm identification, graph signal recovery, and robotic motion planning, achieving notable sample savings.
Bandit-based cluster sampling encompasses a class of algorithms and theoretical tools for efficient, adaptive identification of group structure in sequential observation environments, where data sources can only be probed one noisy observation at a time, as in multi-armed bandit (MAB) feedback. The paradigm appears across online clustering, sublinear graph signal recovery, pure-exploration federated learning, reward-dependent planning, and active unsupervised learning; the core methodological challenge is to minimize the sample or communication budget required to discover latent partitions while satisfying high-probability correctness constraints.
1. Formal Frameworks: Bandit-Based Clustering Paradigms
Bandit-based cluster sampling formulations are typically characterized by:
- Unknown Latent Partition: A set of arms/items/sources, each associated with a (possibly high-dimensional) mean parameter, are grouped into disjoint or overlapping clusters, determined either by equality of means or proximity under a distance metric (Thuot et al., 2024, Yang et al., 2022, Chandran et al., 20 Jan 2025).
- Sequential, Adaptive Probing: At each round, the learner selects a source (and possibly sub-source or feature), obtains a noisy observation, and adaptively updates sampling strategies.
- Fixed-Confidence Objectives: The learner must, with probability at least $1-\delta$ (“$\delta$-PAC”), output the true clustering (up to label permutation) or cluster-specific structure (e.g., best arm per cluster) using the fewest samples/arm-pulls (Yang et al., 2022, Yash et al., 15 May 2025, Thuot et al., 2024).
- Information-Theoretic Lower Bounds: The minimal achievable sample complexity is governed by intrinsic cluster separations, dimensionality, and minimal cluster size parameters.
Three canonical models are prominent:
- Bandit Feedback Clustering: Arms indexed by $i \in [n]$, with means $\mu_i \in \mathbb{R}^d$ and an unknown partition into $K$ clusters; pulling arm $i$ yields a noisy observation $X_t = \mu_i + \eta_t$ (Yang et al., 2022, Chandran et al., 20 Jan 2025, Thuot et al., 2024).
- Federated Clustered Bandits: $M$ agents assigned to clusters, each cluster facing its own i.i.d. bandit problem; the assignment is unknown to the learner, and the goal is best-arm identification for every agent (Yash et al., 15 May 2025).
- Feature-Selective Bandit Clustering: Items with feature vectors in $\mathbb{R}^d$, partitioned by prototype equality; at each step the learner chooses an item and a feature to probe, and the aim is to identify the partition with minimal queries (Graf et al., 14 Mar 2025).
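The bandit-feedback clustering model above can be sketched as a small environment (a minimal illustration; the Gaussian noise model and all names here are assumptions, not a specific paper's API):

```python
import numpy as np

class BanditClusterEnv:
    """n arms with d-dimensional means; pulling an arm returns mean + Gaussian noise.

    The latent partition (labels) is hidden from the learner, which only
    observes noisy pulls, one arm at a time.
    """
    def __init__(self, means, labels, sigma=1.0, seed=0):
        self.means = np.asarray(means, dtype=float)   # shape (n, d)
        self.labels = np.asarray(labels)              # latent partition
        self.sigma = sigma
        self.rng = np.random.default_rng(seed)
        self.pulls = 0

    def pull(self, arm):
        self.pulls += 1
        return self.means[arm] + self.sigma * self.rng.standard_normal(self.means.shape[1])

# two clusters of arms separated by Delta = 2 in the first coordinate
means = np.array([[0.0, 0.0], [0.0, 0.1], [2.0, 0.0], [2.0, -0.1]])
env = BanditClusterEnv(means, labels=[0, 0, 1, 1], sigma=0.5, seed=1)
obs = env.pull(2)
```

A learner interacts only through `pull`, which makes the sample count (`env.pulls`) the natural complexity measure.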
2. Information-Theoretic Lower Bounds on Sample Complexity
Distinct lower bounds govern the sample budget for bandit-based cluster sampling, typically derived via change-of-measure arguments and minimax rates over covering alternatives:
- Separation- and Cluster-Size-Dependent Bounds: For $n$ arms, $K$ clusters, minimal inter-cluster center separation $\Delta$, and smallest cluster fraction $\gamma$, the expected sample complexity for discovering the partition with error probability at most $\delta$ scales (up to logarithmic factors in $1/\delta$, with additional dependence on $1/\gamma$) as (Thuot et al., 2024, Yang et al., 2022):
$$\mathbb{E}[\tau_\delta] \;=\; \tilde{\Theta}\!\left(\frac{n}{\Delta^2} \;+\; \frac{n\sqrt{d}}{\Delta^4}\right).$$
The first term corresponds to the one-dimensional best-arm budget; the second accounts for high-dimensional discrimination.
- Alternative-Based Characterization: For a general mean matrix $\mu$, sampling proportions $w$ in the simplex $\Sigma_n$, and alternative configurations $\lambda \in \mathrm{Alt}(\mu)$ violating the true clustering, the lower bound is (Yang et al., 2022, Chandran et al., 20 Jan 2025):
$$\mathbb{E}[\tau_\delta] \;\ge\; T^*(\mu)\,\log\frac{1}{2.4\,\delta}, \qquad \big(T^*(\mu)\big)^{-1} \;=\; \sup_{w \in \Sigma_n}\; \inf_{\lambda \in \mathrm{Alt}(\mu)}\; \sum_{i=1}^{n} w_i\,\frac{\|\mu_i - \lambda_i\|^2}{2\sigma^2}.$$
- Federated Clustered Bandit Lower Bounds: Pure-exploration best-arm identification with unknown agent-to-bandit assignments incurs a lower bound of order
$$\Omega\!\left(\frac{M}{\Delta^2}\,\log\frac{1}{\delta}\right),$$
where $M$ is the number of agents and $\Delta$ is a typical arm-gap parameter (Yash et al., 15 May 2025).
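For intuition about the alternative-based characterization, the inner infimum admits a closed form when the alternative merges a single pair of arms $i$ and $j$ from different clusters, a standard Gaussian transportation-cost computation:
$$\inf_{\lambda:\,\lambda_i=\lambda_j}\;\left[\, w_i\,\frac{\|\mu_i-\lambda_i\|^2}{2\sigma^2} + w_j\,\frac{\|\mu_j-\lambda_j\|^2}{2\sigma^2} \,\right] \;=\; \frac{w_i\,w_j}{w_i+w_j}\,\frac{\|\mu_i-\mu_j\|^2}{2\sigma^2},$$
attained at $\lambda_i=\lambda_j=(w_i\mu_i+w_j\mu_j)/(w_i+w_j)$. The most confusable pair of arms therefore governs the optimal sampling proportions.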
3. Principal Algorithmic Approaches
A diversity of algorithmic blueprints has emerged, exploiting both fixed cluster structures and adaptive, feedback-driven strategies.
3.1. Adaptive Cluster-Splitting via GLR/Test Statistics
- BOC/ATBOC: These algorithms maintain plug-in or confidence-based cluster estimates, repeatedly select most-informative arms using convex minimax optimization, and stop with a generalized likelihood ratio (GLR) criterion when discriminatory evidence surpasses an explicit threshold (Yang et al., 2022, Chandran et al., 20 Jan 2025).
- Average-Tracking and D-Tracking: Adaptive sampling ensures the per-arm sample allocation converges to the information-theoretic optimum for distinguishing the true partition from worst-case alternatives (Yang et al., 2022, Chandran et al., 20 Jan 2025).
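The GLR stopping criterion can be sketched for a single pair of Gaussian arms (a minimal illustration assuming known noise level $\sigma$; the threshold schedule `beta` is a common $\log((1+\log t)/\delta)$-style choice and is not the calibrated threshold of the cited papers):

```python
import numpy as np

def glr_separated(sum_i, n_i, sum_j, n_j, sigma, threshold):
    """GLR statistic for H0: mu_i == mu_j under Gaussian noise with known sigma.

    sum_i / sum_j are running sums of observations, n_i / n_j pull counts.
    Returns (statistic, decision); decision True => evidence the arms differ.
    """
    mu_i, mu_j = sum_i / n_i, sum_j / n_j
    stat = (n_i * n_j) / (n_i + n_j) * np.sum((mu_i - mu_j) ** 2) / (2 * sigma ** 2)
    return stat, stat > threshold

def beta(t, delta):
    # illustrative threshold schedule; real analyses use tighter calibrations
    return np.log((1 + np.log(max(t, 2))) / delta)

rng = np.random.default_rng(0)
sigma, n = 0.5, 200
a = rng.normal(0.0, sigma, size=(n, 2))          # arm with mean (0, 0)
b = rng.normal([2.0, 0.0], sigma, size=(n, 2))   # arm with mean (2, 0)
stat, differ = glr_separated(a.sum(0), n, b.sum(0), n, sigma, beta(2 * n, 0.05))
```

In a full algorithm this check runs over all pairs, and sampling is tracked toward the allocation that maximizes the smallest such statistic.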
3.2. Confidence Bound and LUCB-Style Methods
- LUCBBOC: Exploits LUCB-style lower and upper confidence bounds for pairwise mean differences, guiding sampling toward near-ambiguous arm pairs and critical inter/intra-cluster edges. This approach avoids costly global optimization at each step while maintaining -PAC recovery (Chandran et al., 20 Jan 2025).
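The pairwise confidence-bound test that drives LUCB-style sampling can be sketched for scalar Gaussian arms (a hedged illustration; the union-bound bookkeeping over pairs and rounds that a full analysis requires is omitted):

```python
import numpy as np

def pairwise_interval(mean_i, n_i, mean_j, n_j, sigma, delta):
    """Confidence interval on the scalar gap mu_i - mu_j (Gaussian tail bound).

    Width combines both arms' uncertainties; in a full algorithm delta is
    shared across all pairs and rounds via a union bound.
    """
    width = sigma * np.sqrt(2 * np.log(2 / delta)) * np.sqrt(1 / n_i + 1 / n_j)
    gap = mean_i - mean_j
    return gap - width, gap + width

lo, hi = pairwise_interval(2.0, 100, 0.1, 100, sigma=0.5, delta=0.01)
separated = lo > 0 or hi < 0   # interval excludes zero -> different clusters
```

Sampling is then directed at the pair whose interval is closest to straddling zero, i.e., the most ambiguous inter/intra-cluster edge.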
3.3. Federated and Multi-Agent Clustering Protocols
- Cl-BAI / BAI-Cl: Combine cluster-discovery and best-arm identification via successive elimination. Cl-BAI clusters agents by similarity in empirical means, then identifies cluster-specific best arms; BAI-Cl reverses this sequence for improved communication efficiency when the number of clusters is small (Yash et al., 15 May 2025).
- Instance-Optimal BAI-Cl++: Achieves minimax-optimality (up to polylogarithmic factors) in sample complexity in the regime where the number of agents grows large while the number of clusters remains constant.
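Both Cl-BAI and BAI-Cl build on successive elimination as their core subroutine. A minimal single-agent sketch (not the federated protocol itself; the confidence-radius schedule is an illustrative standard choice):

```python
import numpy as np

def successive_elimination(pull, k, sigma, delta, max_rounds=10_000):
    """Return the index of the best arm with probability >= 1 - delta.

    pull(arm) yields one noisy reward; an arm is eliminated once its UCB
    falls below the current leader's LCB.
    """
    active = list(range(k))
    sums = np.zeros(k)
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for a in active:
            sums[a] += pull(a)
        means = sums[active] / t
        radius = sigma * np.sqrt(2 * np.log(4 * k * t * t / delta) / t)
        best = means.max()
        active = [a for a, m in zip(active, means) if m + radius >= best - radius]
    return active[0]

rng = np.random.default_rng(3)
mu = np.array([0.2, 0.5, 0.9])
best = successive_elimination(lambda a: rng.normal(mu[a], 0.3), 3, 0.3, 0.05)
```

In the federated variants the same elimination logic runs per cluster, with empirical means exchanged between agents to amortize exploration.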
3.4. Feature-Selective and Sparsity-Exploiting Methods
- BanditClustering: Leverages sequential halving to adaptively discover discriminative features and outlier representatives, resulting in algorithms with worst-case optimality up to polylogarithmic terms in settings with sparse cluster-inducing features (Graf et al., 14 Mar 2025).
- ACB: Employs sequential search for cluster representatives using unbiased squared-distance tests, then assigns remaining items by adapted nearest-center rules, closing the batch-vs-active computation gap (Thuot et al., 2024).
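The representative-search step rests on an unbiased squared-distance test: splitting each item's pulls into two independent halves makes the cross product of the two difference vectors an unbiased estimate of $\|\mu_i-\mu_j\|^2$, cancelling the noise-variance bias of the naive plug-in $\|\bar{x}_i-\bar{x}_j\|^2$. A minimal sketch (the sample sizes and the half-split are illustrative):

```python
import numpy as np

def sq_dist_estimate(xs_i, xs_j):
    """Unbiased estimate of ||mu_i - mu_j||^2 from paired independent pulls.

    Splits each item's samples into two halves; because the two difference
    vectors are independent, their dot product has expectation exactly
    ||mu_i - mu_j||^2, with no additive noise-variance term.
    """
    half = len(xs_i) // 2
    d1 = xs_i[:half].mean(axis=0) - xs_j[:half].mean(axis=0)
    d2 = xs_i[half:2 * half].mean(axis=0) - xs_j[half:2 * half].mean(axis=0)
    return float(d1 @ d2)

rng = np.random.default_rng(0)
mu_i, mu_j, sigma = np.zeros(20), np.full(20, 0.5), 1.0
xs_i = rng.normal(mu_i, sigma, size=(400, 20))
xs_j = rng.normal(mu_j, sigma, size=(400, 20))
est = sq_dist_estimate(xs_i, xs_j)   # true value: 20 * 0.25 = 5.0
```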
3.5. Clustered Bandit Regret Minimization
- Clus-UCB: Integrates cluster structure into KL-UCB indices via intra-cluster pooling, matching information-theoretic regret lower bounds under known cluster assignments and widths (Gore et al., 4 Aug 2025).
- Multi-Level (Hierarchical) Thompson Sampling: Implements a layered TS architecture, sampling at cluster and arm levels, yielding regret bounds that shrink with cluster quality and strong dominance (Carlsson et al., 2021).
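The layered architecture can be sketched as a two-level posterior-sampling step (Gaussian posteriors with a unit-variance prior are an illustrative choice here, not the cited construction):

```python
import numpy as np

def two_level_ts_step(rng, cluster_stats, arm_stats, clusters, sigma=1.0):
    """One round of hierarchical Thompson sampling.

    cluster_stats / arm_stats map ids to (sum_of_rewards, pull_count);
    posteriors are Gaussian, shrunk toward a zero-mean unit-variance prior.
    """
    def posterior_draw(sum_r, n):
        mean = sum_r / (n + 1)              # prior acts as one pseudo-pull at 0
        var = sigma ** 2 / (n + 1)
        return rng.normal(mean, np.sqrt(var))

    # level 1: pick the cluster with the highest posterior sample
    c = max(clusters, key=lambda c: posterior_draw(*cluster_stats[c]))
    # level 2: pick an arm inside that cluster the same way
    a = max(clusters[c], key=lambda a: posterior_draw(*arm_stats[a]))
    return c, a

rng = np.random.default_rng(0)
clusters = {0: [0, 1], 1: [2, 3]}
cluster_stats = {0: (5.0, 10), 1: (30.0, 10)}   # cluster 1 looks much better
arm_stats = {0: (2.0, 5), 1: (3.0, 5), 2: (12.0, 5), 3: (18.0, 5)}
c, a = two_level_ts_step(rng, cluster_stats, arm_stats, clusters)
```

When clusters are well separated, the level-1 draw concentrates quickly, so exploration cost is paid mostly inside the best cluster.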
A summary table of major algorithmic families:
| Algorithm | Model/Setting | Asymptotic Guarantee |
|---|---|---|
| BOC/ATBOC | Online clustering (Gaussian arms) | Tight sample bound |
| LUCBBOC | Same as above (LUCB-style) | Sample complexity within 2x of lower bound |
| Cl-BAI/BAI-Cl | Federated (multi-agent) clustering | Minimax-optimal up to polylog factors |
| BanditClustering | Feature-selective item clustering | Optimal up to log-factors |
| ACB | Active $K$-means with bandit feedback | Matches lower bound |
| Clus-UCB, TSC | Bandits with known clusters | Asymptotic regret optimality |
4. Impact of Structure, Dimensionality, and Clustering Objectives
- Intra-Cluster Variation: Algorithms such as ATBOC and LUCBBOC generalize beyond the strictly homogeneous cluster model (i.e., arm means need not be identical within each cluster) and provide fixed-confidence recovery in general multidimensional contexts (Chandran et al., 20 Jan 2025).
- Dimension-Dependent Regimes: A key transition occurs when the dimension grows large relative to the separation (roughly when the $\sqrt{d}/\Delta^4$ term overtakes the $1/\Delta^2$ term), shifting the dominant term in the minimax lower bound from one determined by sample-based thresholding to one governed by high-dimensional geometry (Thuot et al., 2024).
- Sparsity and “Right Feature” Considerations: In high-dimensional clustering with structured differences, e.g., only a small subset of features separating clusters, adaptive feature selection critically improves efficiency (Graf et al., 14 Mar 2025).
5. Applications and Empirical Results
Bandit-based cluster sampling underpins several practical domains:
- Adaptive Sampling for Graph Signals: MAB-based sampling policies, trained via gradient ascent, robustly outperform uniform or random-walk strategies in recovering piecewise-constant signals on graphs (Abramenko et al., 2018).
- Online Market Segmentation and Virus Variant Discovery: Adaptive querying uncovers latent user segments or viral strains using orders-of-magnitude fewer observations than uniform sampling (Yang et al., 2022, Chandran et al., 20 Jan 2025).
- Robotic Motion Planning: Sampling over clustered transition spaces in RRT planners, with rewards estimated via region clustering, produces faster path-cost minimization and higher execution success under uncertainty (Faroni et al., 2023).
- Federated Learning: Clustering agents by empirical reward profiles enables efficient, high-confidence best-arm identification while balancing sample and communication complexity (Yash et al., 15 May 2025).
- MovieLens and Yelp Benchmarks: On real-world recommendation datasets, federated clustered bandit methods yield up to 72% sample savings relative to naive baselines (Yash et al., 15 May 2025, Chandran et al., 20 Jan 2025).
6. Theoretical and Practical Considerations
Several key theoretical and implementation points emerge:
- Active vs. Passive (Batch) Sampling: Active, adaptive querying closes the computation-information gap found in batch clustering; polynomial-time algorithms achieve the information-theoretic minimum under mild assumptions (Thuot et al., 2024).
- Order-Optimality and Instance Adaptivity: ATBOC is order-optimal (factor-2) relative to the minimax lower bound; LUCBBOC matches empirical performance at greatly reduced computational cost (Chandran et al., 20 Jan 2025).
- Scalability: Deployment is efficient—the main bottlenecks are K-means-type optimization for cluster center updates ($O(nKd)$ cost per Lloyd iteration) and small convex subproblems for sample allocation or confidence-bound computation (Yang et al., 2022, Chandran et al., 20 Jan 2025).
- Hyperparameter Selection: Sample allocation, thresholds, and confidence parameters are typically scheduled so as to logarithmically control the failure probability; doubling and sequential halving subroutines provide practical adaptivity to unknown instance parameters (Graf et al., 14 Mar 2025, Thuot et al., 2024).
- Generalizations and Open Problems: Further directions include extensions to an unknown number of clusters $K$ (model selection), robustness to heavy-tailed/heteroscedastic noise, hierarchical and overlapping cluster structures, and integration with contextual or non-stationary reward models (Gore et al., 4 Aug 2025, Carlsson et al., 2021, Thuot et al., 2024).
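The doubling/halving adaptivity mentioned above can be sketched as follows: guess the unknown separation, spend a budget tuned to the guess, and halve the guess on failure; the total cost telescopes geometrically, so it stays within a constant factor of the budget that the true separation would have required. (The $1/\mathrm{gap}^2$ cost model and the cutoffs below are illustrative assumptions.)

```python
def with_doubling(test_at_gap, gap_max=1.0, gap_min=1e-3):
    """Run test_at_gap(gap_guess) with geometrically shrinking guesses.

    test_at_gap returns (success, cost). Because costs grow geometrically
    down the schedule, the total is dominated by the final (successful) call.
    """
    gap, total = gap_max, 0
    while gap >= gap_min:
        success, cost = test_at_gap(gap)
        total += cost
        if success:
            return gap, total
        gap /= 2
    return None, total

# toy instance: the test succeeds once the guess drops below the true
# separation 0.1, at a cost scaling like 1/gap^2
true_gap = 0.1
gap, total = with_doubling(lambda g: (g <= true_gap, int(1 / g ** 2)))
```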