
Fixed-Budget Best-Arm Identification

Updated 2 February 2026
  • Fixed-budget best-arm identification is defined as selecting the arm with the highest expected reward under a strict sampling constraint, balancing adaptation and statistical efficiency.
  • It employs adaptive allocation strategies, such as sequential elimination and nonlinear allocation, to achieve exponential decay of error probability despite resource limits.
  • The framework extends to Bayesian, structured, and side-observation models, offering practical insights for experimental design and treatment choice in constrained environments.

Fixed-budget best-arm identification (FB-BAI) refers to the problem of identifying, with minimal error probability, the single arm with the largest expected reward from a finite set of stochastic arms, given a fixed, finite sampling budget. Unlike fixed-confidence BAI (which seeks to achieve a target error probability with as few samples as possible), FB-BAI fundamentally concerns the tradeoff between statistical efficiency and adaptation under a strict sample constraint. This regime is central to experimental design, treatment choice, and other applications requiring pure exploration under resource limits.

1. Formal Framework and Problem Statement

Let $A = \{1, \ldots, K\}$ denote $K$ arms; each arm $i$ yields i.i.d. rewards $X_{i,1}, X_{i,2}, \ldots$ from an unknown law $\nu_i$ with mean $\mu_i$. Without loss of generality, assume $\mu_1 > \mu_2 \ge \cdots \ge \mu_K$. The player is allowed $T$ total samples, adaptively allocated: at each round $t$, select $I_t \in A$ and observe $X_{I_t, N_{I_t}(t)}$, where $N_i(t)$ counts the pulls of arm $i$ up to time $t$. After all $T$ draws, a recommendation rule outputs $J \in A$.

The primary performance metric is the misidentification probability
$$P_e(T) = \mathbb{P}(J \neq 1),$$
which one aims to minimize uniformly over bandit instances. Alternative metrics, such as expected simple regret, also arise, but the canonical objective is exponential decay of $P_e(T)$ in $T$.
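As a concrete (and purely illustrative) baseline, the sketch below runs the fixed-budget protocol with uniform round-robin allocation and estimates $P_e(T)$ by Monte Carlo. Gaussian rewards and all function names here are assumptions of this sketch, not constructs from the cited papers.

```python
import random
import statistics

def uniform_fb_bai(means, T, sigma=1.0, rng=random):
    """One run of fixed-budget BAI: spend the budget T round-robin,
    then recommend the arm with the highest empirical mean."""
    K = len(means)
    samples = [[] for _ in range(K)]
    for t in range(T):
        i = t % K  # uniform (non-adaptive) allocation
        samples[i].append(rng.gauss(means[i], sigma))
    return max(range(K), key=lambda i: statistics.fmean(samples[i]))

def misid_probability(means, T, runs=2000, seed=0):
    """Monte Carlo estimate of P_e(T) = P(J != best arm)."""
    rng = random.Random(seed)
    best = max(range(len(means)), key=lambda i: means[i])
    errors = sum(uniform_fb_bai(means, T, rng=rng) != best for _ in range(runs))
    return errors / runs

p_small_budget = misid_probability([1.0, 0.5, 0.5], T=30)
p_large_budget = misid_probability([1.0, 0.5, 0.5], T=300)
```

On this instance the estimated error probability drops sharply as $T$ grows; the adaptive strategies below replace the round-robin rule while keeping the same budget interface.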

2. Canonical Algorithms: Sequential Elimination and Allocation Schemes

Sequential elimination algorithms operate in $R$ stages, maintaining a set $G_r$ of survivors and sequentially eliminating arms based on empirical means. In round $r$, each arm in $G_r$ is sampled up to $n_r$ times; the $b_r$ worst are discarded. The budget constraint enforces

$$\sum_{r=1}^R m_r t_r = T,$$

where $m_r = |G_r|$ and $t_r = n_r - n_{r-1}$.

Nonlinear allocation rules (e.g., "Nonlinear Sequential Elimination" (Shahrampour et al., 2016)) set $z_r = m_r^p$ with $p > 0$, dedicate per-arm budget proportional to $m_r^{-p}$, and eliminate one arm per round ($b_r = 1$). The allocation parameter $p$ is tuned to the number of competitive arms: $p < 1$ for many-competitor regimes, $p > 1$ for few-competitor settings. This nonlinearity can remove $\log K$ factors present in linear or uniform allocation schemes.
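A minimal Python sketch of this stage-based scheme: one arm is eliminated per round, and the per-arm increment $t_r$ is taken proportional to $m_r^{-p}$ subject to the budget identity above. Gaussian rewards, the integer rounding of stage sizes, and all names are assumptions of the sketch, not the cited algorithm's exact specification.

```python
import random

def nonlinear_sequential_elimination(means, T, p=1.0, sigma=1.0, rng=random):
    """Stage-based elimination with a nonlinear budget split: per-arm
    increments t_r proportional to m_r^{-p}, where m_r is the number of
    survivors in round r, subject to sum_r m_r * t_r ~ T; the worst
    empirical arm is dropped each round (b_r = 1)."""
    K = len(means)
    survivors = list(range(K))
    ms = [K - r for r in range(K - 1)]        # survivor counts: K, K-1, ..., 2
    weights = [m ** (1 - p) for m in ms]      # m_r * t_r with t_r ∝ m_r^{-p}
    scale = T / sum(weights)
    totals = {i: 0.0 for i in range(K)}
    counts = {i: 0 for i in range(K)}
    for m, w in zip(ms, weights):
        t_r = max(1, int(scale * w / m))      # per-arm pulls this round
        for i in survivors:
            for _ in range(t_r):
                totals[i] += rng.gauss(means[i], sigma)
                counts[i] += 1
        # Discard the survivor with the worst empirical mean.
        survivors.remove(min(survivors, key=lambda i: totals[i] / counts[i]))
    return survivors[0]
```

With $p = 1$ this reduces to an equal per-round budget split; varying $p$ shifts budget toward early (many survivors) or late (few survivors) rounds.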

Side-observation models further extend the framework: pulling one arm can reveal outcomes of several arms, yielding improved rates by pooling information across arm groups (Shahrampour et al., 2016).

3. Information-Theoretic Complexity and Lower Bounds

The critical problem-dependent complexity is

$$H = \sum_{i=2}^K \frac{1}{(\mu_1 - \mu_i)^2}.$$

"Tight (lower) bounds" (Carpentier et al., 2016) demonstrate that, for general KK-armed stochastic bandits,

$$P_e(T) \ge \exp\left(-\frac{T}{c \log K \, H}\right)$$

for some absolute constant $c$, with a matching upper bound (up to constants) achieved by Successive Rejects and its descendants.
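For concreteness, the Successive Rejects stage schedule can be computed directly. The short sketch below uses the standard schedule $n_r = \lceil (T-K) / (\overline{\log}(K)\,(K+1-r)) \rceil$ with $\overline{\log}(K) = \tfrac{1}{2} + \sum_{i=2}^{K} 1/i$; the function name is illustrative.

```python
import math

def successive_rejects_schedule(K, T):
    """Per-arm cumulative pull counts n_1 <= ... <= n_{K-1} for
    Successive Rejects, plus the total number of pulls used.
    Round r samples each of the K - r + 1 surviving arms up to n_r times,
    then eliminates the worst; the ceilings never exceed the budget T."""
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    n = [0] + [math.ceil((T - K) / (log_bar * (K + 1 - r))) for r in range(1, K)]
    total = sum((K - r + 1) * (n[r] - n[r - 1]) for r in range(1, K))
    return n[1:], total

schedule, total_pulls = successive_rejects_schedule(K=10, T=1000)
```

For $K = 10$ and $T = 1000$ this schedule consumes 993 of the 1000 allowed pulls, illustrating that the rounding stays within budget.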

The $\log K$ penalty, absent in the fixed-confidence setting, captures the cost of adaptation when $H$ is unknown, and is essential except in certain narrow complexity regimes. If $H$ is known, algorithms can achieve the fixed-confidence-style rate $\exp(-T/H)$; otherwise, fixed-budget procedures are minimax optimal only up to this log factor.
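The size of this adaptation price is easy to see numerically. In the sketch below (a hypothetical instance, with the constant $c$ set to 1 purely for illustration), the known-$H$ rate is far smaller than the $\log K$-inflated one.

```python
import math

def complexity_H(means):
    """H = sum over suboptimal arms of 1 / gap^2, gap = mu_1 - mu_i."""
    best = max(means)
    return sum(1.0 / (best - m) ** 2 for m in means if m != best)

means = [0.9] + [0.5] * 9          # K = 10 arms, one clear best
K, T = len(means), 5000
H = complexity_H(means)            # here H = 9 / 0.4**2 = 56.25
rate_known_H = math.exp(-T / H)                    # achievable if H is known
rate_adaptive = math.exp(-T / (H * math.log(K)))   # with the log K price
```

Both rates decay exponentially in $T$, but the exponent shrinks by a factor $\log K \approx 2.3$ when $H$ is not known in advance.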

This adaptation price generalizes to structured bandits (e.g., linear models), with analogous instance-dependent complexity measures $H_{2,\mathrm{lin}}$, effective dimension $d$, and corresponding fixed-budget rates of order $\exp(-T/(H_{2,\mathrm{lin}} \log d))$ (Yang et al., 2021, Azizi et al., 2021).

4. Advances in Adaptive Allocation: Optimality and Minimax Rates

Recent years have seen several advances toward minimax-optimal FB-BAI:

  • Adaptive Generalized Neyman Allocation (AGNA/GNA): In the small-gap regime (arms nearly indistinguishable), the minimax exponent is explicitly characterized: the optimal allocation $w^*$ solves

$$\max_{w \in \Delta_K} \min_{a \neq b} \frac{1}{\dfrac{\sigma_b^2}{w_b} + \dfrac{\sigma_a^2}{w_a}},$$

with sampling fractions proportional to the arms' standard deviations (cf. classic Neyman allocation for $K = 2$). If variances are unknown, on-the-fly estimation combined with appropriately weighted augmented inverse probability weighting (AIPW) estimators achieves sharp optimality (Kato, 2024, Kato, 2023, Kato, 2023).

  • Neural and Batched Tracking: Universal tracking algorithms (e.g., Rgo-tracking, DOT) use neural function approximation or batching with delayed allocation to closely follow the minimax lower bound $R^* = \lim_{B \to \infty} R^{go}_B$ (Komiyama et al., 2022). The associated policies provably achieve the best possible exponent in the fixed-budget regime.
  • Best-Feasible-Arm and Structured Settings: For linear bandits with constraints or structure, fixed-budget algorithms leverage G-optimal design, game-theoretic allocation, or two-phase approaches based on support recovery (e.g., Lasso-OD for sparse settings) to achieve minimax exponents depending on the effective dimension or sparsity only (Bian et al., 3 Jun 2025, Yavas et al., 2023, Yang et al., 2021).
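The Neyman-allocation objective above can be checked numerically. A minimal sketch (illustrative only; function names are assumptions) confirms that for $K = 2$ the max-min program is solved by sampling in proportion to the standard deviations:

```python
def small_gap_objective(w, sigmas):
    """Small-gap max-min objective: minimum over pairs (a, b), a != b,
    of 1 / (sigma_b^2 / w_b + sigma_a^2 / w_a)."""
    K = len(w)
    return min(1.0 / (sigmas[b] ** 2 / w[b] + sigmas[a] ** 2 / w[a])
               for a in range(K) for b in range(K) if a != b)

def neyman_two_arms(s1, s2):
    """Classic Neyman allocation for K = 2: w_i proportional to sigma_i."""
    return (s1 / (s1 + s2), s2 / (s1 + s2))

s1, s2 = 1.0, 3.0
# Brute-force grid search over w_1 in (0, 1).
grid = [(i / 1000, 1 - i / 1000) for i in range(1, 1000)]
best_w = max(grid, key=lambda w: small_gap_objective(w, (s1, s2)))
```

Here the grid search lands on $w_1 = \sigma_1/(\sigma_1 + \sigma_2) = 0.25$, the classic Neyman fraction; for $K > 2$ the same objective is solved over the full simplex $\Delta_K$.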

5. Bayesian, Frequentist, and Regret Perspectives

Bayesian FB-BAI considers arms' means drawn from priors. Bayesian elimination, which adapts successive elimination to the posterior, achieves error bounds dependent on prior sharpness; the Bayes risk decays as $O(1/\sqrt{T})$ and matches the lower bound in two-arm settings (Atsidakou et al., 2022). Recent UCB-type algorithms enhance performance by learning the prior, guaranteeing optimal $O(\sqrt{K/n})$ Bayes risk (Zhu et al., 2024).

A notable negative result is that Bayes-optimal algorithms (minimizing Bayes simple regret via dynamic-programming recursion) can be strictly suboptimal for worst-case frequentist regret: in pathological instances, their simple regret decays only polynomially, not exponentially (Komiyama, 2022). In contrast, frequentist algorithms (successive rejects, sequential halving) guarantee uniform exponential decay for any $\mu$ with a unique best arm.

6. Large-Deviation Analysis, Algorithmic Refinements, and Extensions

Recent work establishes large deviation principles under both static and adaptive allocation. For static strategies, the optimal error exponent is

$$I^*(\mu) = \max_{\theta \in \Delta_K} \min_{\lambda :\, i^*(\lambda) \neq i^*(\mu)} \sum_k \theta_k D(\lambda_k \Vert \mu_k),$$

where $D(\cdot \Vert \cdot)$ is the Kullback–Leibler divergence and the minimization ranges over alternative instances $\lambda$ whose best arm $i^*(\lambda)$ differs from that of $\mu$. Adaptive algorithms (e.g., SRED: "Continuous Rejects") leverage empirical gap-triggered elimination, yielding strictly better exponent guarantees than classical phase-based policies such as successive rejects (Wang et al., 2023).
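For Gaussian arms with common variance, the inner minimization over confusing instances has a closed form: the closest alternative merges the best arm and one competitor at their $\theta$-weighted mean, so the static exponent can be explored by brute force. The sketch below rests on that Gaussian assumption and is not code from the cited work.

```python
def gaussian_exponent(theta, means, sigma=1.0):
    """Static-allocation error exponent for Gaussian arms: for each
    suboptimal arm i, the closest confusing instance merges arm i with
    the best arm, giving KL cost
    theta_best * theta_i / (theta_best + theta_i) * gap_i^2 / (2 sigma^2)."""
    best = max(range(len(means)), key=lambda i: means[i])
    return min(
        (theta[best] * theta[i] / (theta[best] + theta[i]))
        * (means[best] - means[i]) ** 2 / (2 * sigma ** 2)
        for i in range(len(means)) if i != best)

# Grid search over the probability simplex for K = 3 arms.
means = [1.0, 0.5, 0.2]
grid = 100
best_theta, best_val = None, -1.0
for a in range(1, grid):
    for b in range(1, grid - a):
        theta = (a / grid, b / grid, (grid - a - b) / grid)
        val = gaussian_exponent(theta, means)
        if val > best_val:
            best_theta, best_val = theta, val
```

On this instance the optimizing allocation samples the closer competitor (arm 2, gap 0.5) far more than the distant one (arm 3, gap 0.8), matching the intuition that harder-to-distinguish arms deserve more budget.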

In combinatorial and quantile objectives, tailored FB-BAI algorithms exploit group coding, batch feedback, or quantile-based elimination to extend minimax rates to more complex pure exploration scenarios (2502.01429, Zhang et al., 2020).

7. Summary Table: Core Algorithmic Paradigms

| Algorithm Class | Error Exponent | Key Features |
| --- | --- | --- |
| Uniform / static allocation | $\exp(-T/H_1)$ (oracle); suboptimal adaptively | Simplicity; log-factor suboptimal in general |
| Successive Rejects (SR) | $\exp(-T/(H_2 \log K))$ | No knowledge of gaps needed; matches lower bound |
| Nonlinear Elimination [1609] | $\exp(-T/(C_p H(p)))$ | Log-free in many regimes with tuned $p$ |
| GNA, NA-AIPW | $\exp(-T/V^*)$ in small-gap Gaussian models | Minimax optimal; variance-adaptive allocations |
| SRED, adaptive LD algorithms | Improved exponent over SR/SH | Fully adaptive; local-to-global gap sensitivity |
| Bayesian Elimination [2211] | $O(1/\sqrt{T})$ Bayes error | Prior-aware; better in informed/low-uncertainty regimes |
| Rgo-TNN, DOT [2206] | Achieves oracle exponent $R^* = R^{go}_\infty$ | Near-optimal via NN tracking or batching |
| Linear/GLM bandit BAI | $\exp(-T/(H_2 \log d))$ | G-optimal design; matches minimax for structure |

These results collectively characterize the statistical and algorithmic landscape of fixed-budget best-arm identification, with rigorous understanding of rate-optimal strategies, adaptation penalties, structured and Bayesian extensions, and the unresolved open problems for general non-Gaussian or non-small-gap regimes.
