Fixed-Confidence Best Arm Identification
- The paper introduces GLUCB, an algorithm for fixed-confidence best arm identification (BAI) in linear bandits designed to minimize the expected number of samples.
- It leverages a geometric overlap strategy and confidence ellipsoids to dynamically focus sampling on arms that best differentiate the optimal arm from its competitors.
- Empirical and theoretical analyses demonstrate that GLUCB outperforms traditional methods, achieving near-optimal sample bounds in both two- and three-arm cases.
Fixed-confidence best arm identification (BAI) is the problem of adaptively sampling arms in a (structured) bandit model in order to identify the arm with maximal mean, such that the probability of error is at most δ. The central challenge is to minimize the expected number of samples required for this fixed-confidence guarantee. Fixed-confidence BAI has been extended from the classical multi-armed (unstructured) model to a broad variety of classes, with recent work focusing on linearly parameterized bandits. In this setting, each arm is encoded as a known feature vector, the mean reward is a linear function of an unknown parameter, and the best arm is the one maximizing this linear expectation.
1. Problem Formulation: Linear Bandit BAI
In the linear bandit BAI setting, the learner is presented with arms $x_1, \dots, x_K \in \mathbb{R}^d$. The unknown parameter $\theta^* \in \mathbb{R}^d$ determines the mean reward for arm $x_i$ as $x_i^\top \theta^*$. At each round $t$, the learner chooses an arm $x_{I_t}$, observes reward $y_t = x_{I_t}^\top \theta^* + \epsilon_t$ (with $\epsilon_t$ sub-Gaussian), and, after as few rounds as possible, outputs an arm $\hat{x}$ such that
$$\mathbb{P}\big(\hat{x} \neq x^*\big) \leq \delta, \qquad x^* = \arg\max_{i} x_i^\top \theta^*,$$
while minimizing $\mathbb{E}[\tau]$, the expected stopping time.
The linear structure implies strong correlations between arms: pulling one arm informs the means of others. In contrast to classical settings, the challenge is to design adaptive strategies that “probe” the parameter space efficiently.
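To make the protocol concrete, here is a minimal simulation sketch in NumPy; the arm set `X`, parameter `theta_star`, and noise scale `sigma` are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative instance: K = 3 arms in d = 2 dimensions (not from the paper).
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.9, 0.5]])          # arm feature vectors x_1, ..., x_K
theta_star = np.array([1.0, 0.8])   # unknown parameter theta*
sigma = 0.5                         # sub-Gaussian noise scale

def pull(i):
    """Sample a noisy reward y_t = x_i^T theta* + eps_t for arm i."""
    return X[i] @ theta_star + sigma * rng.standard_normal()

best_arm = int(np.argmax(X @ theta_star))  # the target of identification
```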
2. GLUCB Algorithm Architecture and Geometric Overlap
The GLUCB (“Generalized Lower Upper Confidence Bounds”) algorithm generalizes the classic LUCB of Kalyanakrishnan et al. to the linear structure. Its sequential procedure comprises:
- Empirical Estimation: After $t$ pulls, compute the regularized least-squares estimate
$$\hat{\theta}_t = V_t^{-1} \sum_{s=1}^{t} x_{I_s} y_s, \qquad V_t = \lambda I + \sum_{s=1}^{t} x_{I_s} x_{I_s}^\top.$$
- Confidence Ellipsoid: Construct a set
$$\mathcal{C}_t = \left\{ \theta \in \mathbb{R}^d : \|\theta - \hat{\theta}_t\|_{V_t} \leq \beta_t(\delta) \right\},$$
with $\beta_t(\delta) = O\big(\sigma \sqrt{d \log(t/\delta)}\big)$ for sub-Gaussian noise of size $\sigma$.
- Best Arm and Advantage: Predict the best arm
$$\hat{\imath}_t = \arg\max_{i \in [K]} x_i^\top \hat{\theta}_t$$
and, for every $j \neq \hat{\imath}_t$, compute the "advantage"
$$A_t(j) = \max_{\theta \in \mathcal{C}_t} (x_j - x_{\hat{\imath}_t})^\top \theta = (x_j - x_{\hat{\imath}_t})^\top \hat{\theta}_t + \beta_t(\delta)\, \|x_j - x_{\hat{\imath}_t}\|_{V_t^{-1}}.$$
- Stopping Rule: Terminate when $\max_{j \neq \hat{\imath}_t} A_t(j) \leq 0$. This occurs precisely when $\mathcal{C}_t$ is fully contained in the cone where $x_{\hat{\imath}_t}$ is optimal.
- Sampling Rule (Geometric Overlap): Unlike LUCB, which samples the arm with the second-highest upper confidence bound, GLUCB chooses the arm that most reduces the "geometric overlap" of $\mathcal{C}_t$ with the region where $x_{\hat{\imath}_t}$ is non-optimal. This is formalized as
$$I_{t+1} = \arg\min_{i \in [K]} \|x_{\hat{\imath}_t} - x_{j_t}\|_{(V_t + x_i x_i^\top)^{-1}},$$
where $j_t = \arg\max_{j \neq \hat{\imath}_t} A_t(j)$ is the arm (other than $\hat{\imath}_t$) of maximal advantage.
The geometric overlap encodes the intersection of the ellipsoid $\mathcal{C}_t$ with the complement of the cone corresponding to the current best arm; minimizing this overlap focuses sampling on arms whose feature directions best distinguish $x_{\hat{\imath}_t}$ from its main competitors (see the sketch below).
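Putting these steps together, a single round might look like the following sketch, written against the reconstructed formulas above; `glucb_round` and its interface are illustrative, not the paper's reference code, and the confidence radius `beta` is taken as given.

```python
import numpy as np

def glucb_round(X, V, b, beta):
    """One schematic round: estimate, stop or pick the next arm.

    X: (K, d) arm features; V: (d, d) regularized design matrix;
    b: (d,) accumulated x_{I_s} * y_s; beta: confidence radius beta_t(delta).
    Returns (stop, next_arm, best_arm).
    """
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b                        # regularized least squares
    i_hat = int(np.argmax(X @ theta_hat))        # empirical best arm

    # Advantage A_t(j): worst-case amount by which x_j could still beat
    # x_{i_hat} over the confidence ellipsoid.
    gaps = X - X[i_hat]                          # rows are x_j - x_{i_hat}
    widths = np.sqrt(np.einsum('jd,de,je->j', gaps, V_inv, gaps))
    adv = gaps @ theta_hat + beta * widths
    adv[i_hat] = -np.inf
    j_t = int(np.argmax(adv))                    # most threatening challenger

    if adv[j_t] <= 0:                            # ellipsoid inside optimality cone
        return True, None, i_hat

    # Greedy overlap reduction: pull the arm whose rank-one update most
    # shrinks uncertainty in the direction x_{i_hat} - x_{j_t}.
    y = X[i_hat] - X[j_t]
    scores = [y @ np.linalg.inv(V + np.outer(x, x)) @ y for x in X]
    return False, int(np.argmin(scores)), i_hat
```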
3. Adaptive and Computationally Efficient Design
GLUCB’s adaptivity arises from updating $\hat{\theta}_t$ and the confidence set $\mathcal{C}_t$ at each round and dynamically allocating samples where they maximally reduce the worst-case possibility of misidentification. Each decision step leverages:
- Efficient updates of $V_t^{-1}$ and $\hat{\theta}_t$ using rank-one formulas (see the sketch after this list);
- Simple, closed-form calculations for the advantage and geometric overlap criteria;
- Selection of sampling actions determined only by current empirical sufficient statistics.
Compared to algorithms that solve complex instance-dependent optimizations (e.g., LinGapE, X–ElimTil–p), GLUCB’s per-round computational cost is limited to basic matrix–vector calculations, making it scalable for large $d$ and $K$.
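As an illustration of the first point, the inverse design matrix can be maintained in $O(d^2)$ per round with the standard Sherman–Morrison identity (a generic sketch; the function name and interface are illustrative):

```python
import numpy as np

def rank_one_update(V_inv, b, x, y):
    """Sherman-Morrison: update (V + x x^T)^{-1} and b + y * x in O(d^2)."""
    Vx = V_inv @ x
    return V_inv - np.outer(Vx, Vx) / (1.0 + x @ Vx), b + y * x
```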
4. Sample Complexity in Two- and Three-Arm Cases
The paper presents explicit theoretical guarantees for $K = 2$ and $K = 3$, illuminating the efficacy of the geometric approach:
Two-arm case
For $K = 2$:
- Up to rounding, GLUCB alternates between the two arms, ensuring near-perfect balance: $|T_1(t) - T_2(t)| \leq 1$, where $T_i(t)$ counts the pulls of arm $i$.
- The "potential function" $\|x_1 - x_2\|_{V_t^{-1}}$ quantifies uncertainty in the direction of interest $x_1 - x_2$ and is shown to decrease at least as quickly as under any alternative policy.
- The expected sample complexity (omitting logarithmic factors) is
$$\mathbb{E}[\tau] = O\!\left( \sigma^2 \min_{w \in [0,1]} \frac{\|x_1 - x_2\|^2_{A(w)^{-1}}}{\Delta^2} \, \log\frac{1}{\delta} \right), \qquad A(w) = w\, x_1 x_1^\top + (1-w)\, x_2 x_2^\top,$$
with the minimizing frequency $w^*$ arising as an instance-dependent term from a sampling-frequency optimization and $\Delta = (x_1 - x_2)^\top \theta^*$ the gap, matching the lower bound up to dimension-dependent constants.
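As a quick numerical illustration of this sampling-frequency optimization, the instance-dependent constant can be evaluated by a one-dimensional scan over $w$; the two arms and parameter below are made-up values, and the small ridge term is added only to keep $A(w)$ invertible.

```python
import numpy as np

# Illustrative two-arm instance (not from the paper).
x1, x2 = np.array([1.0, 0.2]), np.array([0.8, 0.6])
theta_star = np.array([1.0, 0.3])
gap = (x1 - x2) @ theta_star
reg = 1e-6                                   # tiny ridge to keep A(w) invertible

def H2(w):
    """||x1 - x2||^2_{A(w)^{-1}} / gap^2 for sampling frequencies (w, 1-w)."""
    A = w * np.outer(x1, x1) + (1 - w) * np.outer(x2, x2) + reg * np.eye(2)
    d = x1 - x2
    return (d @ np.linalg.solve(A, d)) / gap**2

ws = np.linspace(0.01, 0.99, 99)
w_star = ws[np.argmin([H2(w) for w in ws])]
print(f"optimal frequency w* ~ {w_star:.2f}, constant H2 ~ {H2(w_star):.2f}")
```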
Three-arm case
When, for instance, $x_3$ is "geometrically" dominated (close to a linear combination of $x_1$ and $x_2$, with a small angle between them), GLUCB rapidly eliminates dominated arms and focuses sampling only where it is effective for identification. The upper bound becomes
$$\mathbb{E}[\tau] = \widetilde{O}\!\left( \frac{\sigma^2}{\Delta_{\min}^2 \sin^2 \alpha} \, \log\frac{1}{\delta} \right),$$
where $\alpha$ is the angular separation and $\Delta_{\min}$ the smallest gap, validating order-optimality up to logarithmic and dimension-dependent slack.
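For concreteness, the two quantities entering this bound can be computed directly for a toy instance; the arm vectors and parameter are illustrative, and $\alpha$ is measured here between the top two arms.

```python
import numpy as np

# Illustrative three-arm instance with x3 close to the x1-x2 span (not from the paper).
X = np.array([[1.0, 0.0], [0.95, 0.31], [0.9, 0.1]])
theta_star = np.array([1.0, 0.1])

means = X @ theta_star
gaps = means.max() - means                   # gaps to the best mean
delta_min = gaps[gaps > 0].min()             # smallest nonzero gap

cos_a = (X[0] @ X[1]) / (np.linalg.norm(X[0]) * np.linalg.norm(X[1]))
alpha = np.arccos(np.clip(cos_a, -1, 1))     # angular separation of top arms

# The reconstructed bound scales like 1 / (delta_min^2 * sin(alpha)^2).
print(f"alpha ~ {alpha:.3f} rad, delta_min ~ {delta_min:.3f}")
```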
5. Empirical Validation and Advantage over State of the Art
Extensive experiments confirm theoretical findings:
- For synthetic arms (random or with ambiguous/dominated arms), GLUCB requires fewer samples—often by an order of magnitude—compared to classical LUCB, LinGapE, XY–static, and X–ElimTil–p.
- In large diagonal-structured instances, GLUCB scales efficiently while maintaining its sample complexity advantages.
- On the Yahoo! Webscope dataset (structured real-world arms), GLUCB achieves the fixed-confidence guarantee with an observed error probability of zero across all trials, and consistently matches or outperforms the next-best methods in sample efficiency.
The central qualitative finding is that by exploiting the geometry of the linear model, GLUCB “knows” to cease sampling arms whose differences have already been determined with high statistical power, thus avoiding the waste incurred by unstructured strategies.
6. Mathematical Expressions and Structural Insights
Core formulations underlying GLUCB’s design include:
- Confidence ellipsoid: $\mathcal{C}_t = \{\theta : \|\theta - \hat{\theta}_t\|_{V_t} \leq \beta_t(\delta)\}$.
- Advantage of arm $j$: $A_t(j) = (x_j - x_{\hat{\imath}_t})^\top \hat{\theta}_t + \beta_t(\delta)\, \|x_j - x_{\hat{\imath}_t}\|_{V_t^{-1}}$.
- Geometric overlap-driven sampling: $I_{t+1} = \arg\min_{i} \|x_{\hat{\imath}_t} - x_{j_t}\|_{(V_t + x_i x_i^\top)^{-1}}$.
- Potential function for the two-arm case: $\|x_1 - x_2\|_{V_t^{-1}}$.
- Sample complexity bounds: $\mathbb{E}[\tau] = \widetilde{O}\big(H \log(1/\delta)\big)$ for an instance-dependent constant $H$, as in Section 4.
The use of ellipsoids, cones, and Mahalanobis norms as the fundamental devices for uncertainty quantification and sampling allocation marks a distinct departure from naïve UCB approaches.
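Tying the earlier sketches together, a complete (hypothetical) identification loop would drive `glucb_round` with rank-one bookkeeping until the stopping rule fires; the radius schedule `beta` below is a placeholder, not the paper's exact $\beta_t(\delta)$.

```python
import numpy as np

# Assumes X, pull(), and glucb_round() from the earlier sketches.
lam, delta, sigma = 1.0, 0.05, 0.5
d = X.shape[1]
V, b = lam * np.eye(d), np.zeros(d)

t = 0
while True:
    t += 1
    beta = sigma * np.sqrt(d * np.log((1 + t) / delta))  # placeholder radius
    stop, arm, best = glucb_round(X, V, b, beta)
    if stop:
        break
    y = pull(arm)                       # sample the selected arm
    V += np.outer(X[arm], X[arm])       # rank-one design update
    b += y * X[arm]

print(f"stopped at t = {t}, recommended arm = {best}")
```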
7. Significance and Implications
GLUCB provides a principled, fully adaptive, and computationally simple solution for fixed-confidence BAI in linear bandits, with near-optimal sample complexity and practical performance gains over prior art. The geometric overlap framework exposes the key directions to probe in parameter space, yielding an immediate and effective reduction in ambiguity about which arm is best, and automatically adapting sampling to the most informative arms. It also offers a general template for extending structure-aware BAI strategies to linearly parameterized or more broadly structured bandit models.