
Fixed-Budget Best-Arm Identification

Updated 2 February 2026
  • Fixed-budget best-arm identification is defined as selecting the arm with the highest expected reward under a strict sampling constraint, balancing adaptation and statistical efficiency.
  • It employs adaptive allocation strategies, such as sequential elimination and nonlinear allocation, to achieve exponential decay of error probability despite resource limits.
  • The framework extends to Bayesian, structured, and side-observation models, offering practical insights for experimental design and treatment choice in constrained environments.

Fixed-budget best-arm identification (FB-BAI) refers to the problem of identifying, with minimal error probability, the single arm with the largest expected reward from a finite set of stochastic arms, given a fixed, finite sampling budget. Unlike fixed-confidence BAI (which seeks to achieve a target error probability with as few samples as possible), FB-BAI fundamentally concerns the tradeoff between statistical efficiency and adaptation under a strict sample constraint. This regime is central to experimental design, treatment choice, and other applications requiring pure exploration under resource limits.

1. Formal Framework and Problem Statement

Let $A = \{1, \ldots, K\}$ denote $K$ arms; each arm $i$ yields i.i.d. rewards $X_{i,1}, X_{i,2}, \ldots$ from an unknown law $\nu_i$ with mean $\mu_i$. Without loss of generality, assume $\mu_1 > \mu_2 \ge \cdots \ge \mu_K$. The player is allowed $T$ total samples, adaptively allocated: at each round $t$, select $I_t \in A$ and observe $X_{I_t, N_{I_t}(t)}$, where $N_i(t)$ counts the pulls of arm $i$ up to time $t$. After all $T$ draws, a recommendation rule outputs $J \in A$.

The primary performance metric is the misidentification probability
$$P_e(T) = \mathbb{P}(J \neq 1),$$
which one aims to minimize uniformly over bandit instances. Alternative metrics, such as expected simple regret, also arise, but the canonical objective is exponential decay of $P_e(T)$ in $T$.
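As a concrete (and purely illustrative) baseline, the sketch below runs the fixed-budget protocol with uniform round-robin allocation and estimates $P_e(T)$ by Monte Carlo. Gaussian rewards and all function names here are assumptions of this sketch, not constructs from the cited papers.

```python
import random
import statistics

def uniform_fb_bai(means, T, sigma=1.0, rng=random):
    """One run of fixed-budget BAI: spend the budget T round-robin,
    then recommend the arm with the highest empirical mean."""
    K = len(means)
    samples = [[] for _ in range(K)]
    for t in range(T):
        i = t % K  # uniform (non-adaptive) allocation
        samples[i].append(rng.gauss(means[i], sigma))
    return max(range(K), key=lambda i: statistics.fmean(samples[i]))

def misid_probability(means, T, runs=2000, seed=0):
    """Monte Carlo estimate of P_e(T) = P(J != best arm)."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda i: means[i])
    errors = sum(uniform_fb_bai(means, T, rng=rng) != best for _ in range(runs))
    return errors / runs

p_small_budget = misid_probability([1.0, 0.5, 0.5], T=30)
p_large_budget = misid_probability([1.0, 0.5, 0.5], T=300)
```

On this instance the estimated error probability drops sharply as $T$ grows; the adaptive strategies below replace the round-robin rule while keeping the same budget interface.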

2. Canonical Algorithms: Sequential Elimination and Allocation Schemes

Sequential elimination algorithms operate in $R$ stages, maintaining a set $G_r$ of survivors and sequentially eliminating arms based on empirical means. In round $r$, each arm in $G_r$ is sampled up to $n_r$ times; the $b_r$ worst are discarded. The budget constraint enforces

$$\sum_{r=1}^R m_r t_r = T,$$

where $m_r = |G_r|$ and $t_r = n_r - n_{r-1}$.

Nonlinear allocation rules (e.g., "Nonlinear Sequential Elimination" (Shahrampour et al., 2016)) set $z_r = m_r^p$ with $p > 0$, dedicate per-arm budget proportional to $m_r^{-p}$, and eliminate one arm per round ($b_r = 1$). The allocation parameter $p$ is tuned to the number of competitive arms: $p < 1$ for many-competitor regimes, $p > 1$ for few-competitor settings. This nonlinearity can remove $\log K$ factors present in linear or uniform allocation schemes.
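A minimal Python sketch of this stage-based scheme: one arm is eliminated per round, and the per-arm increment $t_r$ is taken proportional to $m_r^{-p}$ subject to the budget identity above. Gaussian rewards, the integer rounding of stage sizes, and all names are assumptions of the sketch, not the cited algorithm's exact specification.

```python
import random

def nonlinear_sequential_elimination(means, T, p=1.0, sigma=1.0, rng=random):
    """Stage-based elimination with a nonlinear budget split: per-arm
    increments t_r proportional to m_r^{-p}, where m_r is the number of
    survivors in round r, subject to sum_r m_r * t_r ~ T; the worst
    empirical arm is dropped each round (b_r = 1)."""
    K = len(means)
    survivors = list(range(K))
    ms = [K - r for r in range(K - 1)]        # survivor counts: K, K-1, ..., 2
    weights = [m ** (1 - p) for m in ms]      # m_r * t_r with t_r ∝ m_r^{-p}
    scale = T / sum(weights)
    totals = {i: 0.0 for i in range(K)}
    counts = {i: 0 for i in range(K)}
    for m, w in zip(ms, weights):
        t_r = max(1, int(scale * w / m))      # per-arm pulls this round
        for i in survivors:
            for _ in range(t_r):
                totals[i] += rng.gauss(means[i], sigma)
                counts[i] += 1
        # Discard the survivor with the worst empirical mean.
        survivors.remove(min(survivors, key=lambda i: totals[i] / counts[i]))
    return survivors[0]
```

With $p = 1$ this reduces to an equal per-round budget split; varying $p$ shifts budget toward early (many survivors) or late (few survivors) rounds.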

Side-observation models further extend the framework: pulling one arm can reveal outcomes of several arms, yielding improved rates by pooling information across arm groups (Shahrampour et al., 2016).

3. Information-Theoretic Complexity and Lower Bounds

The critical problem-dependent complexity is

$$H = \sum_{i=2}^K \frac{1}{(\mu_1 - \mu_i)^2}.$$

"Tight (lower) bounds" (Carpentier et al., 2016) demonstrate that, for general KK-armed stochastic bandits,

$$P_e(T) \ge \exp\left(-\frac{T}{c \log K \, H}\right)$$

for some absolute constant $c$, with a matching upper bound (up to constants) achieved by Successive Rejects and its descendants.
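For concreteness, the Successive Rejects stage schedule can be computed directly. The short sketch below uses the standard schedule $n_r = \lceil (T-K) / (\overline{\log}(K)\,(K+1-r)) \rceil$ with $\overline{\log}(K) = \tfrac{1}{2} + \sum_{i=2}^{K} 1/i$; the function name is illustrative.

```python
import math

def successive_rejects_schedule(K, T):
    """Per-arm cumulative pull counts n_1 <= ... <= n_{K-1} for
    Successive Rejects, plus the total number of pulls used.
    Round r samples each of the K - r + 1 surviving arms up to n_r times,
    then eliminates the worst; the ceilings never exceed the budget T."""
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    n = [0] + [math.ceil((T - K) / (log_bar * (K + 1 - r))) for r in range(1, K)]
    total = sum((K - r + 1) * (n[r] - n[r - 1]) for r in range(1, K))
    return n[1:], total

schedule, total_pulls = successive_rejects_schedule(K=10, T=1000)
```

For $K = 10$ and $T = 1000$ this schedule consumes 993 of the 1000 allowed pulls, illustrating that the rounding stays within budget.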

The $\log K$ penalty, absent in the fixed-confidence setting, captures the cost of adaptation when $H$ is unknown, and is essential except in certain narrow complexity regimes. If $H$ is known, algorithms can achieve the fixed-confidence-style rate $\exp(-T/H)$; otherwise, fixed-budget procedures are minimax optimal only up to this log factor.
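The size of this adaptation price is easy to see numerically. In the sketch below (a hypothetical instance, with the constant $c$ set to 1 purely for illustration), the known-$H$ rate is far smaller than the $\log K$-inflated one.

```python
import math

def complexity_H(means):
    """H = sum over suboptimal arms of 1 / gap^2, gap = mu_1 - mu_i."""
    best = max(means)
    return sum(1.0 / (best - m) ** 2 for m in means if m != best)

means = [0.9] + [0.5] * 9          # K = 10 arms, one clear best
K, T = len(means), 5000
H = complexity_H(means)            # here H = 9 / 0.4**2 = 56.25
rate_known_H = math.exp(-T / H)                    # achievable if H is known
rate_adaptive = math.exp(-T / (H * math.log(K)))   # with the log K price
```

Both rates decay exponentially in $T$, but the exponent shrinks by a factor $\log K \approx 2.3$ when $H$ is not known in advance.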

This adaptation price generalizes to structured bandits (e.g., linear models), with analogous instance-dependent complexity measures $H_{2,\mathrm{lin}}$, effective dimension $d$, and corresponding fixed-budget rates of order $\exp(-T/(H_{2,\mathrm{lin}} \log d))$ (Yang et al., 2021, Azizi et al., 2021).

4. Advances in Adaptive Allocation: Optimality and Minimax Rates

Recent years have seen several advances toward minimax-optimal FB-BAI:

  • Adaptive Generalized Neyman Allocation (AGNA/GNA): In the small-gap regime (arms nearly indistinguishable), the minimax exponent is explicitly characterized: the optimal allocation $w^*$ solves

$$\max_{w \in \Delta_K} \min_{a \neq b} \frac{1}{\dfrac{\sigma_b^2}{w_b} + \dfrac{\sigma_a^2}{w_a}},$$

with sampling fractions proportional to the arms' standard deviations (cf. classic Neyman allocation for $K = 2$). If variances are unknown, on-the-fly estimation combined with appropriately weighted augmented inverse probability weighting (AIPW) estimators achieves sharp optimality (Kato, 2024, Kato, 2023, Kato, 2023).

  • Neural and Batched Tracking: Universal tracking algorithms (e.g., Rgo-tracking, DOT) use neural function approximation or batching with delayed allocation to closely follow the minimax lower bound $R^* = \lim_{B \to \infty} R^{go}_B$ (Komiyama et al., 2022). The associated policies provably achieve the best possible exponent in the fixed-budget regime.
  • Best-Feasible-Arm and Structured Settings: For linear bandits with constraints or structure, fixed-budget algorithms leverage G-optimal design, game-theoretic allocation, or two-phase approaches based on support recovery (e.g., Lasso-OD for sparse settings) to achieve minimax exponents depending on the effective dimension or sparsity only (Bian et al., 3 Jun 2025, Yavas et al., 2023, Yang et al., 2021).
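The Neyman-allocation objective above can be checked numerically. A minimal sketch (illustrative only; function names are assumptions) confirms that for $K = 2$ the max-min program is solved by sampling in proportion to the standard deviations:

```python
def small_gap_objective(w, sigmas):
    """Small-gap max-min objective: minimum over pairs (a, b), a != b,
    of 1 / (sigma_b^2 / w_b + sigma_a^2 / w_a)."""
    K = len(w)
    return min(1.0 / (sigmas[b] ** 2 / w[b] + sigmas[a] ** 2 / w[a])
               for a in range(K) for b in range(K) if a != b)

def neyman_two_arms(s1, s2):
    """Classic Neyman allocation for K = 2: w_i proportional to sigma_i."""
    return (s1 / (s1 + s2), s2 / (s1 + s2))

s1, s2 = 1.0, 3.0
# Brute-force grid search over w_1 in (0, 1).
grid = [(i / 1000, 1 - i / 1000) for i in range(1, 1000)]
best_w = max(grid, key=lambda w: small_gap_objective(w, (s1, s2)))
```

Here the grid search lands on $w_1 = \sigma_1/(\sigma_1 + \sigma_2) = 0.25$, the classic Neyman fraction; for $K > 2$ the same objective is solved over the full simplex $\Delta_K$.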

5. Bayesian, Frequentist, and Regret Perspectives

Bayesian FB-BAI considers arms' means drawn from priors. Bayesian elimination, which adapts successive elimination to the posterior, achieves error bounds dependent on prior sharpness; the Bayes risk decays as $O(1/\sqrt{T})$ and matches the lower bound in two-arm settings (Atsidakou et al., 2022). Recent UCB-type algorithms enhance performance by learning the prior, guaranteeing optimal $O(\sqrt{K/n})$ Bayes risk (Zhu et al., 2024).

A notable negative result is that Bayes-optimal algorithms (minimizing Bayes simple regret via dynamic-programming recursion) can be strictly suboptimal for worst-case frequentist regret: in pathological instances, their simple regret decays only polynomially, not exponentially (Komiyama, 2022). In contrast, frequentist algorithms (successive rejects, sequential halving) guarantee uniform exponential decay for any $\mu$ with a unique best arm.

6. Large-Deviation Analysis, Algorithmic Refinements, and Extensions

Recent work establishes large deviation principles under both static and adaptive allocation. For static strategies, the optimal error exponent is

$$I^*(\mu) = \max_{\theta \in \Delta_K} \min_{\lambda :\, i^*(\lambda) \neq i^*(\mu)} \sum_k \theta_k D(\lambda_k \Vert \mu_k),$$

where $D(\cdot \Vert \cdot)$ is the Kullback–Leibler divergence and the minimization ranges over alternative instances $\lambda$ whose best arm $i^*(\lambda)$ differs from that of $\mu$. Adaptive algorithms (e.g., SRED: "Continuous Rejects") leverage empirical gap-triggered elimination, yielding strictly better exponent guarantees than classical phase-based policies such as successive rejects (Wang et al., 2023).
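For Gaussian arms with common variance, the inner minimization over confusing instances has a closed form: the closest alternative merges the best arm and one competitor at their $\theta$-weighted mean, so the static exponent can be explored by brute force. The sketch below rests on that Gaussian assumption and is not code from the cited work.

```python
def gaussian_exponent(theta, means, sigma=1.0):
    """Static-allocation error exponent for Gaussian arms: for each
    suboptimal arm i, the closest confusing instance merges arm i with
    the best arm, giving KL cost
    theta_best * theta_i / (theta_best + theta_i) * gap_i^2 / (2 sigma^2)."""
    best = max(range(len(means)), key=lambda i: means[i])
    return min(
        (theta[best] * theta[i] / (theta[best] + theta[i]))
        * (means[best] - means[i]) ** 2 / (2 * sigma ** 2)
        for i in range(len(means)) if i != best)

# Grid search over the probability simplex for K = 3 arms.
means = [1.0, 0.5, 0.2]
grid = 100
best_theta, best_val = None, -1.0
for a in range(1, grid):
    for b in range(1, grid - a):
        theta = (a / grid, b / grid, (grid - a - b) / grid)
        val = gaussian_exponent(theta, means)
        if val > best_val:
            best_theta, best_val = theta, val
```

On this instance the optimizing allocation samples the closer competitor (arm 2, gap 0.5) far more than the distant one (arm 3, gap 0.8), matching the intuition that harder-to-distinguish arms deserve more budget.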

In combinatorial and quantile objectives, tailored FB-BAI algorithms exploit group coding, batch feedback, or quantile-based elimination to extend minimax rates to more complex pure exploration scenarios (2502.01429, Zhang et al., 2020).

7. Summary Table: Core Algorithmic Paradigms

| Algorithm Class | Error Exponent | Key Features |
| --- | --- | --- |
| Uniform / static allocation | $\exp(-T/H_1)$ (oracle); suboptimal adaptively | Simplicity; log-factor suboptimal in general |
| Successive Rejects (SR) | $\exp(-T/(H_2 \log K))$ | No knowledge of gaps needed; matches lower bound |
| Nonlinear Elimination [1609] | $\exp(-T/(C_p H(p)))$ | Log-free in many regimes with tuned $p$ |
| GNA, NA-AIPW | $\exp(-T/V^*)$ in small-gap Gaussian models | Minimax optimal; variance-adaptive allocations |
| SRED, adaptive LD algorithms | Improved exponent over SR/SH | Fully adaptive; local-to-global gap sensitivity |
| Bayesian Elimination [2211] | $O(1/\sqrt{T})$ Bayes error | Prior-aware; better in informed/low-uncertainty regimes |
| Rgo-TNN, DOT [2206] | Achieves oracle exponent $R^* = R^{go}_\infty$ | Near-optimal via NN tracking or batching |
| Linear/GLM bandit BAI | $\exp(-T/(H_2 \log d))$ | G-optimal design; matches minimax for structure |

These results collectively characterize the statistical and algorithmic landscape of fixed-budget best-arm identification, with rigorous understanding of rate-optimal strategies, adaptation penalties, structured and Bayesian extensions, and the unresolved open problems for general non-Gaussian or non-small-gap regimes.
