
Fixed-Confidence Best Arm Identification

Updated 13 October 2025
  • The paper introduces the GLUCB algorithm that minimizes sample complexity in fixed-confidence BAI by efficiently identifying the best arm in linear bandits.
  • It leverages a geometric overlap strategy and confidence ellipsoids to dynamically focus sampling on arms that best differentiate the optimal arm from its competitors.
  • Empirical and theoretical analyses demonstrate that GLUCB outperforms traditional methods, achieving near-optimal sample bounds in both two- and three-arm cases.

Fixed-confidence best arm identification (BAI) is the problem of adaptively sampling arms in a (structured) bandit model in order to identify the arm with maximal mean, such that the probability of error is at most δ. The central challenge is to minimize the expected number of samples required for this fixed-confidence guarantee. Fixed-confidence BAI has been extended from the classical multi-armed (unstructured) model to a broad variety of classes, with recent work focusing on linearly parameterized bandits. In this setting, each arm is encoded as a known feature vector, the mean reward is a linear function of an unknown parameter, and the best arm is the one maximizing this linear expectation.

1. Problem Formulation: Linear Bandit BAI

In the linear bandit BAI setting, the learner is presented with $K$ arms $x_1, \ldots, x_K \in \mathbb{R}^d$. The unknown parameter $\theta^* \in \mathbb{R}^d$ determines the mean reward for arm $a$ as $x_a^\top \theta^*$. At each round $t$, the learner chooses an arm $x_{a_t}$, observes reward $y_t = x_{a_t}^\top \theta^* + \xi_t$ (with $\xi_t$ sub-Gaussian), and, after as few rounds as possible, outputs an arm $a^\dagger$ such that

$$\mathbb{P}_{\theta^*}\left( x_{a^\dagger}^\top \theta^* < \max_{b} x_b^\top \theta^* \right) \leq \delta,$$

while minimizing $\mathbb{E}[\tau]$, the expected stopping time.

The linear structure implies strong correlations between arms: pulling one arm informs the means of others. In contrast to classical settings, the challenge is to design adaptive strategies that “probe” the parameter space efficiently.
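Concretely, the interaction protocol can be sketched as a small simulator. This is illustrative only: the dimensions, feature vectors, and noise scale below are arbitrary choices, not taken from the paper.

```python
import numpy as np

# Minimal simulator for the linear bandit BAI protocol (illustrative values).
rng = np.random.default_rng(0)

d, K = 5, 20
arms = rng.normal(size=(K, d))       # feature vectors x_1, ..., x_K
theta_star = rng.normal(size=d)      # unknown parameter, hidden from the learner
R = 1.0                              # sub-Gaussian noise scale

def pull(a):
    """Observe y_t = x_a^T theta* + xi_t, with Gaussian (hence sub-Gaussian) noise."""
    return arms[a] @ theta_star + rng.normal(scale=R)

best_arm = int(np.argmax(arms @ theta_star))  # the identification target
```

A learner only sees the features `arms` and the rewards returned by `pull`; its goal is to output `best_arm` with probability at least $1-\delta$ using as few pulls as possible.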

2. GLUCB Algorithm Architecture and Geometric Overlap

The GLUCB (“Generalized Lower Upper Confidence Bounds”) algorithm generalizes the classic LUCB of Kalyanakrishnan et al. to the linear structure. Its sequential procedure comprises:

  • Empirical Estimation: After $t$ pulls, compute the regularized least-squares estimate

$$\theta_t = V_t^{-1} b_t, \qquad V_t = \lambda I + \sum_{s=1}^t x_{a_s} x_{a_s}^\top, \qquad b_t = \sum_{s=1}^t x_{a_s} y_s$$

  • Confidence Ellipsoid: Construct a set

$$\mathcal{C}_t = \left\{ \theta \in \mathbb{R}^d : \|\theta - \theta_t\|_{V_t} \leq \beta_t \right\}$$

with $\beta_t = R \sqrt{d \ln(t/\delta)}$ for sub-Gaussian noise of scale $R$.

  • Best Arm and Advantage: Predict the best arm

$$h_t = \arg\max_{a} \theta_t^\top x_a$$

and, for every $a \neq h_t$, compute the “advantage”

$$\mathrm{Adv}(a) = \max_{\theta \in \mathcal{C}_t} \left[ \theta^\top x_a - \theta^\top x_{h_t} \right]$$

  • Stopping Rule: Terminate when $\mathrm{Adv}(a) \leq 0$ for all $a \neq h_t$. This occurs precisely when $\mathcal{C}_t$ is fully contained in the cone $R(x_{h_t})$ on which $x_{h_t}$ is optimal.
  • Sampling Rule (Geometric Overlap): Unlike LUCB, which samples the arm with the second-highest upper confidence bound, GLUCB chooses the arm $a_{t+1}$ that most reduces the “geometric overlap” of $\mathcal{C}_t$ with the non-optimality region $R(x_{h_t})^c$. This is formalized as

$$a_{t+1} \in \arg\max_{a} \frac{|x_a^\top V_t^{-1} (x_{h_t} - x_{l_t})|}{\sqrt{1 + x_a^\top V_t^{-1} x_a}}$$

where $x_{l_t}$ is the arm (other than $h_t$) of maximal advantage.

The geometric overlap encodes the intersection of the ellipsoid $\mathcal{C}_t$ with the complement of the cone corresponding to the current best arm; minimizing this overlap focuses sampling on arms whose feature directions best distinguish $h_t$ from its main competitors.
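One round of this procedure can be sketched in NumPy. This is an illustrative reimplementation, not the authors' code: the function name and interface are invented, and the advantage is evaluated via the standard closed form for maximizing a linear function over an ellipsoid, $\max_{\theta \in \mathcal{C}_t} \theta^\top y = \theta_t^\top y + \beta_t \|y\|_{V_t^{-1}}$.

```python
import numpy as np

def glucb_round(arms, pulls, rewards, lam=1.0, R=1.0, delta=0.05):
    """One GLUCB decision step (sketch): returns (best_arm, None) on stopping,
    else (None, next_arm_to_pull).

    arms: (K, d) feature matrix; pulls: list of pulled arm indices; rewards: list of y_s.
    """
    K, d = arms.shape
    t = len(pulls)
    X = arms[pulls]                                # pulled features x_{a_1}, ..., x_{a_t}
    V = lam * np.eye(d) + X.T @ X                  # V_t
    b = X.T @ np.asarray(rewards)                  # b_t
    V_inv = np.linalg.inv(V)
    theta = V_inv @ b                              # regularized least squares theta_t
    beta = R * np.sqrt(d * np.log(t / delta))      # confidence radius beta_t

    h = int(np.argmax(arms @ theta))               # empirical best arm h_t
    # Adv(a) = theta_t^T (x_a - x_h) + beta_t * ||x_a - x_h||_{V_t^{-1}}
    diffs = arms - arms[h]
    adv = diffs @ theta + beta * np.sqrt(np.einsum('ij,jk,ik->i', diffs, V_inv, diffs))
    adv[h] = -np.inf

    if adv.max() <= 0:                             # stopping rule
        return h, None
    l = int(np.argmax(adv))                        # most-competitive arm x_{l_t}
    gap = arms[h] - arms[l]
    # Geometric-overlap sampling score from the display above.
    scores = np.abs(arms @ (V_inv @ gap)) / np.sqrt(1 + np.einsum('ij,jk,ik->i', arms, V_inv, arms))
    return None, int(np.argmax(scores))
```

A driver would pull each arm once to initialize `pulls` and `rewards`, then loop: stop when the first return value is an arm index, otherwise pull the arm given by the second return value.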

3. Adaptive and Computationally Efficient Design

GLUCB’s adaptivity arises from updating $\theta_t$ and the confidence set $\mathcal{C}_t$ at each round, dynamically allocating samples where they most reduce the worst-case possibility of misidentification. Each decision step leverages:

  • Efficient updates of $V_t$ and $V_t^{-1}$ using rank-one formulas;
  • Simple, closed-form calculations for the advantage and geometric overlap criteria;
  • Selection of sampling actions determined only by current empirical sufficient statistics.

Compared to algorithms that solve complex instance-dependent optimizations (e.g., LinGapE, X–ElimTil–p), GLUCB’s per-round computational cost is limited to basic matrix–vector calculations, making it scalable for large $K$ and $d$.
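The rank-one update referred to above is the standard Sherman–Morrison identity; a minimal sketch (function name illustrative):

```python
import numpy as np

def sherman_morrison_update(V_inv, x):
    """Return (V + x x^T)^{-1} given V^{-1}, in O(d^2) instead of an O(d^3) re-inversion."""
    Vx = V_inv @ x
    return V_inv - np.outer(Vx, Vx) / (1.0 + x @ Vx)
```

After pulling an arm with feature $x_{a_t}$, $V_t^{-1}$ is obtained from $V_{t-1}^{-1}$ in quadratic time, which is what keeps GLUCB’s per-round cost to matrix–vector work.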

4. Sample Complexity in Two- and Three-Arm Cases

The paper presents explicit theoretical guarantees for $K=2$ and $K=3$, illuminating the efficacy of the geometric approach:

Two-arm case

For $x_1, x_2 \in \mathbb{R}^d$,

  • Up to rounding, GLUCB alternates between arms, ensuring near-perfect balance:

$$\left\lfloor \frac{t}{2} \right\rfloor \leq n_k(t) \leq \left\lfloor \frac{t}{2} \right\rfloor + 1$$

  • The “potential function” $\Phi(t) = (x_1 - x_2)^\top V_t^{-1} (x_1 - x_2)$ quantifies uncertainty in the direction of interest and is shown to decrease at least as quickly as under any alternative policy.
  • The expected sample complexity (omitting logarithmic factors) is

$$\mathbb{E}[\tau] \leq \beta_t^2 H_G + 1$$

with $H_G$ arising as an instance-dependent term from a sampling-frequency optimization, matching the lower bound up to dimension-dependent constants.
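The decay of the potential function is easy to check numerically. The sketch below (illustrative, not the paper’s code) tracks $\Phi(t)$ under the alternating allocation that GLUCB maintains with two arms; since $V_t$ only grows in the positive-semidefinite order, $\Phi(t)$ is non-increasing.

```python
import numpy as np

def potential_under_alternation(x1, x2, T, lam=1.0):
    """Return [Phi(1), ..., Phi(T)] when pulls alternate x1, x2, x1, ..."""
    d = len(x1)
    V = lam * np.eye(d)                # V_0 = lambda * I
    diff = x1 - x2
    phis = []
    for t in range(T):
        x = x1 if t % 2 == 0 else x2   # alternating sampling rule
        V += np.outer(x, x)            # rank-one growth of V_t
        phis.append(float(diff @ np.linalg.solve(V, diff)))  # Phi(t)
    return phis
```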

Three-arm case

When, for instance, $x_3$ is “geometrically” dominated (a linear combination of $x_1, x_2$ with a small angle), GLUCB rapidly eliminates dominated arms and focuses sampling only where it is effective for identification. The upper bound becomes

$$O\left( \frac{\beta_t^2}{\Delta_{\min}^2} \sin^2(\omega) + \frac{\beta_t^2}{\Delta_{\min} \sin(\omega)} \right)$$

where $\omega$ is the angular separation and $\Delta_{\min}$ the smallest gap, validating order-optimality up to logarithmic and dimension-dependent slack.

5. Empirical Validation and Advantage over State of the Art

Extensive experiments confirm theoretical findings:

  • For synthetic arms (random or with ambiguous/dominated arms), GLUCB requires fewer samples, often by an order of magnitude, than classical LUCB, LinGapE, XY–static, and X–ElimTil–p.
  • In large diagonal-structured instances ($K \sim 10^4$), GLUCB scales efficiently while maintaining its sample-complexity advantages.
  • On the Yahoo! Webscope dataset (structured real-world arms), GLUCB achieves the fixed confidence guarantee with observed error probability zero across all trials, and consistently matches or outperforms the next-best methods in terms of sample efficiency.

The central qualitative finding is that by exploiting the geometry of the linear model, GLUCB “knows” to cease sampling arms whose differences have already been determined with high statistical power, thus avoiding the waste incurred by unstructured strategies.

6. Mathematical Expressions and Structural Insights

Core formulations underlying GLUCB’s design include:

  • Confidence ellipsoid:

$$\mathcal{C}_t = \left\{ \theta \in \mathbb{R}^d : \|\theta - \theta_t\|_{V_t} \leq \beta_t \right\}$$

  • Advantage of arm $a$:

$$\mathrm{Adv}(a) = \max_{\theta \in \mathcal{C}_t} \left[ \theta^\top x_a - \theta^\top x_{h_t} \right]$$

  • Geometric overlap-driven sampling:

$$a_{t+1} \in \arg\max_a \frac{|x_a^\top V_t^{-1} (x_{h_t} - x_{l_t})|}{\sqrt{1 + x_a^\top V_t^{-1} x_a}}$$

  • Potential function for the two-arm case:

$$\Phi(t) = (x_1 - x_2)^\top V_t^{-1} (x_1 - x_2)$$

  • Sample complexity bound for the three-arm case:

$$O\left( \frac{\beta_t^2}{\Delta_{\min}^2} \sin^2(\omega) + \frac{\beta_t^2}{\Delta_{\min} \sin(\omega)} \right)$$

The use of ellipsoids, cones, and Mahalanobis norms as the fundamental devices for uncertainty quantification and sampling allocation marks a distinct departure from naïve UCB approaches.

7. Significance and Implications

GLUCB provides a principled, fully adaptive, and computationally simple solution for fixed-confidence BAI in linear bandits, with near-optimal sample complexity and practical performance gains over previous art. The geometric overlap framework exposes the key directions to probe in parameter space, leading to immediate and effective reduction in ambiguity about which arm is best, and automatically adapts sampling to the most informative arms. This approach provides a general template for extending “structure-aware” BAI strategies in linearly parameterized or more broadly structured bandit models.
