Bandit-Based Cluster Sampling

Updated 5 January 2026
  • Bandit-based cluster sampling is a framework that employs multi-armed bandit feedback to adaptively explore and recover latent clustering structures in sequential observation environments.
  • It leverages sequential probing with GLR tests and LUCB-style confidence bounds, guided by information-theoretic lower bounds dependent on cluster separations and sizes.
  • Applications span online clustering, federated best-arm identification, graph signal recovery, and robotic motion planning, achieving notable sample savings.

Bandit-based cluster sampling encompasses a class of algorithms and theoretical tools for efficient, adaptive identification of group structure in sequential observation environments, where each probe of a data source returns noisy, multi-armed bandit (MAB)-style feedback at a per-query cost. This paradigm appears across online clustering, sublinear graph signal recovery, pure-exploration federated learning, reward-dependent planning, and active unsupervised learning; the core methodological challenge is to minimize the sample or communication budget required to discover latent partitions while satisfying high-probability correctness constraints.

1. Formal Frameworks: Bandit-Based Clustering Paradigms

Bandit-based cluster sampling formulations are typically characterized by:

  • Unknown Latent Partition: Arms/items/sources, each associated with a (possibly high-dimensional) mean parameter, are grouped into disjoint or overlapping clusters, determined either by equality of means or by proximity under a distance metric (Thuot et al., 2024, Yang et al., 2022, Chandran et al., 20 Jan 2025).
  • Sequential, Adaptive Probing: At each round, the learner selects a source (and possibly sub-source or feature), obtains a noisy observation, and adaptively updates sampling strategies.
  • Fixed-Confidence Objectives: The learner must, with probability at least $1-\delta$ (“$\delta$-PAC”), output the true clustering (up to label permutation) or cluster-specific structure (e.g., the best arm per cluster) using the fewest samples/arm pulls (Yang et al., 2022, Yash et al., 15 May 2025, Thuot et al., 2024).
  • Information-Theoretic Lower Bounds: The minimal achievable sample complexity is governed by intrinsic cluster separations, dimensionality, and minimal cluster size parameters.

Three canonical models are prominent:

  • Bandit Feedback Clustering: Arms indexed by $m=1,\dots,M$, with means $\mu_m\in \mathbb{R}^d$ and an unknown partition into $K$ clusters; pulling arm $m$ yields $X\sim \mathcal{N}(\mu_m, I_d)$ (Yang et al., 2022, Chandran et al., 20 Jan 2025, Thuot et al., 2024). A minimal simulation sketch of this model follows the list below.
  • Federated Clustered Bandits: $N$ agents assigned to $M$ clusters, each cluster with its own i.i.d. bandit problem; the assignment is unknown to the learner, and the goal is best-arm identification per agent (Yash et al., 15 May 2025).
  • Feature-Selective Bandit Clustering: Items with feature vectors in $\mathbb{R}^d$ partitioned by prototype equality, where at each step the learner chooses an item and a feature to probe; the aim is to identify the partition with as few queries as possible (Graf et al., 14 Mar 2025).
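
To make the first model concrete, the following minimal sketch simulates bandit feedback from a latent clustered arm set; the class name, dimensions, and mean values are illustrative assumptions rather than constructs from the cited papers.

```python
import numpy as np

# Minimal sketch (illustrative, not from the cited papers): M arms whose d-dimensional
# means coincide within latent clusters; pulling arm m returns X ~ N(mu_m, I_d).
class ClusteredBanditEnv:
    def __init__(self, means, seed=0):
        self.means = np.asarray(means)              # shape (M, d), one mean per arm
        self.rng = np.random.default_rng(seed)

    def pull(self, m):
        """Return one noisy d-dimensional observation of arm m."""
        return self.means[m] + self.rng.standard_normal(self.means.shape[1])

# Example: M = 6 arms in d = 3 dimensions, grouped into K = 2 latent clusters.
centers = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
assignment = [0, 0, 0, 1, 1, 1]                     # latent partition, hidden from the learner
env = ClusteredBanditEnv(centers[assignment])
x = env.pull(2)                                     # one observation for an adaptive learner
```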

2. Information-Theoretic Lower Bounds on Sample Complexity

Distinct lower bounds govern the sample budget for bandit-based cluster sampling, typically derived via change-of-measure arguments and minimax rates over covering alternatives:

  • Separation- and Cluster-Size-Dependent Bounds: For $N$ arms, $K$ clusters, minimal inter-cluster center separation $\Delta_*$, and smallest cluster fraction $\theta_*$, the tight expected sample complexity for recovering the partition with error probability at most $\delta$ is (Thuot et al., 2024, Yang et al., 2022):

$$T^* \gtrsim N + \frac{\sigma^2}{\Delta_*^2} \left[ N \ln\frac{N}{\delta} + \sqrt{dKN \ln \frac{N}{\delta}} \right]$$

The first term corresponds to the one-dimensional best-arm identification budget; the second accounts for high-dimensional discrimination.
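
As a rough numerical illustration of the two regimes (a sketch that ignores the constants hidden by $\gtrsim$; all parameter values are made up), the scaling can be evaluated directly:

```python
import numpy as np

def lower_bound_scaling(N, d, K, delta, Delta, sigma=1.0):
    """Scaling of T* >~ N + (sigma^2/Delta^2)[N ln(N/delta) + sqrt(d K N ln(N/delta))].
    Constants hidden by the >~ notation are ignored; values are illustrative only."""
    log_term = np.log(N / delta)
    best_arm_term = N * log_term                    # one-dimensional best-arm budget
    high_dim_term = np.sqrt(d * K * N * log_term)   # high-dimensional discrimination
    return N + (sigma ** 2 / Delta ** 2) * (best_arm_term + high_dim_term)

# The high-dimensional term starts to dominate roughly once d exceeds N ln(N/delta) / K.
print(lower_bound_scaling(N=100, d=20, K=4, delta=0.05, Delta=0.5))
print(lower_bound_scaling(N=100, d=5000, K=4, delta=0.05, Delta=0.5))
```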

  • Alternative-Based Characterization: For a general mean matrix $\mu$, sampling proportions $w\in\Delta_M$, and alternative configurations $\lambda$ violating the true clustering $c$, the lower bound is (Yang et al., 2022, Chandran et al., 20 Jan 2025):

$$T^*(\mu)^{-1} = \sup_{w\in\Delta_M}\; \inf_{\lambda\in \mathrm{Alt}(\mu)}\; \frac{1}{2} \sum_{m=1}^M w_m \|\lambda_m - \mu_m\|^2$$
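
For intuition, consider the simplest family of alternatives, in which the true clustering is violated only by forcing two arms $i$ and $j$ from different clusters onto a common mean $\lambda$ while all other arms are left unchanged (an illustrative special case, not the general characterization). The inner infimum then has a closed form:

$$\inf_{\lambda}\; \frac{1}{2}\left( w_i \|\lambda - \mu_i\|^2 + w_j \|\lambda - \mu_j\|^2 \right) \;=\; \frac{1}{2}\,\frac{w_i w_j}{w_i + w_j}\,\|\mu_i - \mu_j\|^2, \qquad \lambda^\star = \frac{w_i \mu_i + w_j \mu_j}{w_i + w_j}.$$

Heuristically, the outer supremum therefore pushes sampling effort toward arms involved in the closest inter-cluster pairs, which are the cheapest to "merge" in an alternative configuration.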

  • Federated Clustered Bandit Lower Bounds: Pure-exploration best-arm identification with unknown agent-to-bandit assignments incurs a lower bound:

$$\mathbb{E}[T] \gtrsim \max\{ M(K-M),\; N \} \cdot \frac{\log(1/\delta)}{\Delta^2}$$

where $\Delta$ is a typical arm-gap parameter (Yash et al., 15 May 2025).

3. Principal Algorithmic Approaches

A diversity of algorithmic blueprints has emerged, exploiting both fixed cluster structures and adaptive, feedback-driven strategies.

3.1. Adaptive Cluster-Splitting via GLR/Test Statistics

  • BOC/ATBOC: These algorithms maintain plug-in or confidence-based cluster estimates, repeatedly select the most informative arms via convex minimax optimization, and stop according to a generalized likelihood ratio (GLR) criterion once the discriminatory evidence surpasses an explicit threshold (Yang et al., 2022, Chandran et al., 20 Jan 2025); a sketch of a pairwise GLR stopping statistic follows this list.
  • Average-Tracking and D-Tracking: Adaptive sampling ensures the per-arm sample allocation converges to the information-theoretic optimum for distinguishing the true partition from worst-case alternatives (Yang et al., 2022, Chandran et al., 20 Jan 2025).
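
As a hedged illustration of the GLR stopping logic for unit-variance Gaussian arms, the statistic below tests whether two arms share a mean; the threshold shown is a placeholder form, not the exact calibration proved $\delta$-PAC in the cited papers.

```python
import numpy as np

def pairwise_glr(sum_i, n_i, sum_j, n_j):
    """GLR statistic for H0: arms i and j share a mean (unit-variance Gaussian arms).
    sum_* are running sums of d-dimensional observations, n_* are pull counts."""
    mu_i, mu_j = sum_i / n_i, sum_j / n_j
    return (n_i * n_j) / (2.0 * (n_i + n_j)) * np.sum((mu_i - mu_j) ** 2)

def stopping_threshold(t, delta, d):
    """Illustrative threshold beta(t, delta); the exact form used by BOC/ATBOC differs."""
    return np.log(1.0 / delta) + d * np.log(1.0 + np.log(max(t, 2)))

# Declare arms i and j to lie in different clusters once
# pairwise_glr(...) exceeds stopping_threshold(t, delta, d).
```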

3.2. Confidence Bound and LUCB-Style Methods

  • LUCBBOC: Exploits LUCB-style lower and upper confidence bounds on pairwise mean differences, guiding sampling toward near-ambiguous arm pairs and critical inter-/intra-cluster edges. This approach avoids costly global optimization at each step while maintaining $\delta$-PAC recovery (Chandran et al., 20 Jan 2025); a simplified sketch of the pairwise ambiguity rule is given below.
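
The sketch below captures the pairwise-ambiguity idea in a simplified form; the function names, the confidence radius, and the slack rule are assumptions for illustration, not the exact LUCBBOC index.

```python
import numpy as np

def radius(n, d, delta, t):
    """Illustrative confidence radius for a d-dimensional empirical mean after n pulls."""
    return np.sqrt(2.0 * (d * np.log(1.0 + np.log(max(t, 2))) + np.log(1.0 / delta)) / n)

def most_ambiguous_pair(mu_hat, counts, d, delta, t):
    """Return the arm pair whose relation (same mean vs. different means) is least resolved."""
    M = len(counts)
    best_pair, best_slack = None, -np.inf
    for i in range(M):
        for j in range(i + 1, M):
            gap = np.linalg.norm(mu_hat[i] - mu_hat[j])
            slack = radius(counts[i], d, delta, t) + radius(counts[j], d, delta, t) - gap
            # slack > 0: the confidence regions still allow "same cluster" for this pair
            if slack > best_slack:
                best_pair, best_slack = (i, j), slack
    return best_pair, best_slack

# LUCB-style rule: pull the less-sampled arm of the most ambiguous pair,
# and stop once no pair remains ambiguous (best_slack <= 0).
```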

3.3. Federated and Multi-Agent Clustering Protocols

  • Cl-BAI / BAI-Cl: Combine cluster discovery and best-arm identification via successive elimination (a sketch of the elimination subroutine follows this list). Cl-BAI clusters agents by similarity of their empirical means, then identifies cluster-specific best arms; BAI-Cl reverses this sequence for improved communication efficiency when $M$ is small (Yash et al., 15 May 2025).
  • Instance-Optimal BAI-Cl++: Achieves minimax optimality (up to polylogarithmic factors) in sample complexity as $N\rightarrow\infty$ with $M$ held constant.
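
A hedged sketch of the successive-elimination subroutine underlying per-cluster best-arm identification is given below; the elimination radius is a standard Hoeffding-style choice and `pull(k)` is an assumed interface, not the exact Cl-BAI / BAI-Cl implementation.

```python
import numpy as np

def successive_elimination(pull, K, delta):
    """Pull every surviving arm once per round; eliminate arms whose upper confidence
    bound falls below the current leader's lower confidence bound."""
    active = list(range(K))
    sums, counts = np.zeros(K), np.zeros(K)
    t = 0
    while len(active) > 1:
        t += 1
        for k in active:
            sums[k] += pull(k)
            counts[k] += 1
        means = sums[active] / counts[active]
        rad = np.sqrt(np.log(4.0 * K * t ** 2 / delta) / (2.0 * t))
        leader = means.max()
        active = [k for k, m in zip(active, means) if m + rad >= leader - rad]
    return active[0]

# Example usage with simulated Gaussian rewards (illustrative only):
# best = successive_elimination(lambda k: np.random.normal(mu[k], 1.0), K=len(mu), delta=0.05)
```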

3.4. Feature-Selective and Sparsity-Exploiting Methods

  • BanditClustering: Leverages sequential halving to adaptively discover discriminative features and outlier representatives, resulting in algorithms with worst-case optimality up to polylogarithmic terms in settings with sparse cluster-inducing features (Graf et al., 14 Mar 2025).
  • ACB: Employs sequential search for cluster representatives using unbiased squared-distance tests, then assigns remaining items by adapted nearest-center rules, closing the batch-vs-active computation gap (Thuot et al., 2024).
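
The following sketch illustrates the kind of unbiased squared-distance test that ACB-style methods rely on; the query interface and estimator form are assumptions chosen so that the product of two independent differences is unbiased for the squared mean gap, not the exact estimator of the cited paper.

```python
def unbiased_sq_distance(query, i, j, coords):
    """Estimate sum_f (mu_i[f] - mu_j[f])^2 from noisy feature queries.
    `query(item, feature)` returns one independent noisy observation; using two
    independent queries per (item, feature) makes each product term unbiased,
    because the two difference factors are independent with the same mean."""
    est = 0.0
    for f in coords:
        d1 = query(i, f) - query(j, f)   # first independent difference
        d2 = query(i, f) - query(j, f)   # second independent difference
        est += d1 * d2
    return est
```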

3.5. Clustered Bandit Regret Minimization

  • Clus-UCB: Integrates cluster structure into KL-UCB indices via intra-cluster pooling, matching information-theoretic regret lower bounds under known cluster assignments and widths (Gore et al., 4 Aug 2025).
  • Multi-Level (Hierarchical) Thompson Sampling: Implements a layered TS architecture, sampling at cluster and arm levels, yielding regret bounds that shrink with cluster quality and strong dominance (Carlsson et al., 2021).
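
A minimal sketch of the layered sample-then-sample structure is shown below, using Gaussian posteriors with flat priors and unit noise; the statistics layout and posterior forms are illustrative assumptions, not the exact architecture of the cited multi-level Thompson sampling method.

```python
import numpy as np

def two_level_thompson_step(stats, rng):
    """One decision step: sample a cluster from cluster-level posteriors, then sample
    an arm within that cluster. stats[c][a] = (reward_sum, pull_count)."""
    # Level 1: posterior sample for each cluster's pooled mean reward.
    cluster_samples = []
    for arms in stats:
        s = sum(r for r, _ in arms.values())
        n = sum(c for _, c in arms.values())
        cluster_samples.append(rng.normal(s / max(n, 1), 1.0 / np.sqrt(max(n, 1))))
    chosen = int(np.argmax(cluster_samples))
    # Level 2: posterior sample for each arm inside the chosen cluster.
    arm_samples = {a: rng.normal(r / max(n, 1), 1.0 / np.sqrt(max(n, 1)))
                   for a, (r, n) in stats[chosen].items()}
    return chosen, max(arm_samples, key=arm_samples.get)
```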

A summary table of major algorithmic families:

| Algorithm | Model/Setting | Asymptotic Guarantee |
| --- | --- | --- |
| BOC/ATBOC | Online clustering (Gaussian arms) | Tight $\log(1/\delta)$ sample bound |
| LUCBBOC | Online clustering (LUCB-style) | Sample complexity within a factor of 2 of the lower bound |
| Cl-BAI / BAI-Cl | Federated (multi-agent) clustering | Minimax-optimal under $M \ll N$ |
| BanditClustering | Feature-selective item clustering | Optimal up to log factors |
| ACB | Active $K$-means with bandit feedback | Matches the lower bound |
| Clus-UCB, TSC | Bandits with known clusters | Asymptotic regret optimality |

4. Impact of Structure, Dimensionality, and Clustering Objectives

  • Intra-Cluster Variation: Algorithms such as ATBOC and LUCBBOC generalize beyond the strictly homogeneous cluster model (i.e., $\mu_i$ need not be identical within each cluster) and provide fixed-confidence recovery even for $K>2$ and multidimensional contexts (Chandran et al., 20 Jan 2025).
  • Dimension-Dependent Regimes: A key transition occurs when $d \sim N \ln(N/\delta)/K$, shifting the dominant term in the minimax lower bound from one determined by sample-based thresholding to one governed by high-dimensional geometry (Thuot et al., 2024).
  • Sparsity and “Right Feature” Considerations: In high-dimensional clustering with structured differences, e.g., only a small subset of features separating clusters, adaptive feature selection critically improves efficiency (Graf et al., 14 Mar 2025).

5. Applications and Empirical Results

Bandit-based cluster sampling underpins several practical domains:

  • Adaptive Sampling for Graph Signals: MAB-based sampling policies, trained via gradient ascent, robustly outperform uniform or random-walk strategies in recovering piecewise-constant signals on graphs (Abramenko et al., 2018).
  • Online Market Segmentation and Virus Variant Discovery: Adaptive querying uncovers latent user segments or viral strains using orders-of-magnitude fewer observations than uniform sampling (Yang et al., 2022, Chandran et al., 20 Jan 2025).
  • Robotic Motion Planning: Sampling over clustered transition spaces in RRT planners, with rewards estimated via region clustering, produces faster path-cost minimization and higher execution success under uncertainty (Faroni et al., 2023).
  • Federated Learning: Clustering agents by empirical reward profiles enables efficient, high-confidence best-arm identification while balancing sample and communication complexity (Yash et al., 15 May 2025).
  • MovieLens and Yelp Benchmarks: On real-world recommendation datasets, federated clustered bandit methods yield up to 72% sample savings relative to naive baselines (Yash et al., 15 May 2025, Chandran et al., 20 Jan 2025).

6. Theoretical and Practical Considerations

Several key theoretical and implementation points emerge:

  • Active vs. Passive (Batch) Sampling: Active, adaptive querying closes the computation-information gap found in batch clustering; polynomial-time algorithms achieve the information-theoretic minimum under mild assumptions (Thuot et al., 2024).
  • Order-Optimality and Instance Adaptivity: ATBOC is order-optimal (within a factor of 2) relative to the minimax lower bound; LUCBBOC attains comparable empirical performance at greatly reduced computational cost (Chandran et al., 20 Jan 2025).
  • Scalability: Deployment is efficient; the main bottlenecks are K-means-type optimization for cluster center updates (cost $O(MKd)$ per iteration) and small convex subproblems for sample allocation or confidence-bound computation (Yang et al., 2022, Chandran et al., 20 Jan 2025).
  • Hyperparameter Selection: Sample allocations, thresholds, and confidence parameters are typically scheduled so that the failure probability is controlled at only logarithmic cost; doubling and sequential halving subroutines provide practical adaptivity to unknown instance parameters (Graf et al., 14 Mar 2025, Thuot et al., 2024); a sequential-halving sketch follows this list.
  • Generalizations and Open Problems: Further directions include extensions to unknown KK (model selection), robustness to heavy-tailed/heteroscedastic noise, hierarchical and overlapping cluster structures, and integration with contextual or non-stationary reward models (Gore et al., 4 Aug 2025, Carlsson et al., 2021, Thuot et al., 2024).
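
A minimal sequential-halving sketch (a fixed-budget subroutine commonly used for such adaptivity; the budget split and interface are illustrative assumptions) is given below.

```python
import math
import numpy as np

def sequential_halving(pull, K, budget):
    """Halve the candidate arm set over ~log2(K) rounds, splitting the budget evenly
    across rounds and surviving arms; `pull(k)` returns one noisy reward for arm k."""
    active = list(range(K))
    rounds = max(1, math.ceil(math.log2(K)))
    for _ in range(rounds):
        if len(active) == 1:
            break
        per_arm = max(1, budget // (rounds * len(active)))
        means = [np.mean([pull(k) for _ in range(per_arm)]) for k in active]
        order = np.argsort(means)[::-1]                       # best empirical arms first
        active = [active[idx] for idx in order[: math.ceil(len(active) / 2)]]
    return active[0]
```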
