
Non-stochastic Best-Arm Identification

Updated 22 December 2025
  • Non-stochastic best-arm identification is defined as selecting the optimal arm from fixed, converging loss sequences under a strict budget.
  • Successive Halving iteratively reallocates resources to promising arms, achieving up to an Ω(n/log n) improvement over uniform sampling.
  • This framework is applied to hyperparameter optimization, reducing computational overhead and enabling faster, more precise model tuning.

Non-stochastic best-arm identification addresses the selection of the optimal alternative from a fixed set when outcome sequences are determined by an oblivious adversary rather than by an underlying stochastic process. Unlike traditional multi-armed bandit settings that emphasize regret minimization, this framework specifically targets pure-exploration under a “fixed-budget” constraint, where the goal is to use a finite sampling budget to reliably identify the option with minimal limiting loss. Notably, non-stochastic best-arm identification facilitates the formulation of hyperparameter optimization as a resource allocation problem, aligning evaluation efforts with promising candidates and enabling substantial reductions in computational overhead compared to naive uniform approaches (Jamieson et al., 2015).

1. Formal Definition and Envelope Functions

The non-stochastic best-arm identification problem is formalized for $n$ arms, where the feedback for arm $i$ is a sequence of losses $(\ell_{i,1}, \ell_{i,2}, \dots)$ pre-specified by an oblivious adversary. Each sequence is assumed to converge: $\nu_i = \lim_{k \to \infty} \ell_{i,k}$. The agent sequentially allocates samples (pulls) to arms, subject to the budget constraint $\sum_{i=1}^n T_i \leq B$, and must ultimately select an arm $\hat{\imath}$ intended to minimize $\nu_i$. The core objective is to minimize the probability of misidentification, i.e., $\Pr(\hat{\imath} \neq \arg\min_i \nu_i)$.

Convergence rates are encapsulated via envelope functions:

$$\gamma_i(t) = \min\{\gamma \geq 0 : |\ell_{i,t} - \nu_i| \leq \gamma\}$$

and their uniform worst-case

$$\bar{\gamma}(t) = \max_i \gamma_i(t).$$

These envelopes quantify how quickly observed losses approach their limiting values, serving as key parameters in algorithmic sample complexity guarantees. The inverse envelope, $\gamma_i^{-1}(\alpha) = \min\{t : \gamma_i(t) \leq \alpha\}$, gives the minimal time for arm $i$ to achieve $\alpha$-accuracy.
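
As an illustration (an assumed decay profile, not a result from the source), suppose each loss sequence decays polynomially toward its limit, $\ell_{i,t} = \nu_i + c_i/t$ for some constant $c_i > 0$. Then

$$\gamma_i(t) = \frac{c_i}{t}, \qquad \bar{\gamma}(t) = \frac{\max_i c_i}{t}, \qquad \gamma_i^{-1}(\alpha) = \left\lceil \frac{c_i}{\alpha} \right\rceil,$$

so halving the target accuracy $\alpha$ roughly doubles the number of pulls arm $i$ requires.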

2. Successive Halving Algorithm

The principal algorithm for non-stochastic best-arm identification is Successive Halving, which iteratively eliminates poor performers while redistributing the evaluation budget to survivors. Given a budget $B$ and $n$ arms, it operates as follows:

  1. Initialize $S_0 = \{1, 2, \dots, n\}$.
  2. At each round $k$ ($k = 0, \dots, \lceil \log_2 n \rceil - 1$):
     a. Allocate $r_k = \big\lfloor \frac{B}{|S_k|\,\lceil \log_2 n \rceil} \big\rfloor$ additional pulls to each surviving arm in $S_k$.
     b. Observe the resulting losses and rank the arms by empirical loss.
     c. Discard the worse half, retaining the better performers as $S_{k+1}$.

The unique survivor after $\lceil \log_2 n \rceil$ rounds is recommended as the best arm. If the total budget $B$ is unknown in advance, a doubling trick—running the procedure with successively doubled budgets—provides an adaptive alternative with at most a factor-of-2 overhead.
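
The following is a minimal Python sketch of this procedure, intended as an illustration under stated assumptions rather than a reference implementation: each arm is assumed to be exposed as a hypothetical stateful callable that, when invoked, runs one more iteration of the underlying computation and returns the current loss.

```python
import math

def successive_halving(arms, budget):
    """Illustrative sketch of Successive Halving (not the reference implementation).

    `arms` is a list of stateful callables; calling arms[i]() advances arm i by one
    iteration and returns its current loss. `budget` is the total pull budget B.
    Returns the index of the single surviving arm.
    """
    n = len(arms)
    rounds = max(1, math.ceil(math.log2(n)))
    survivors = list(range(n))                      # S_0 = {0, ..., n-1}
    latest_loss = {i: float("inf") for i in survivors}

    for _ in range(rounds):
        # r_k = floor(B / (|S_k| * ceil(log2 n))) additional pulls per surviving arm;
        # at least one pull is taken even when the budget is very small (a slight
        # liberty over the exact allocation rule).
        pulls = max(1, budget // (len(survivors) * rounds))
        for i in survivors:
            for _ in range(pulls):
                latest_loss[i] = arms[i]()
        # Rank by the most recent observed loss and keep the better-performing half.
        survivors.sort(key=lambda i: latest_loss[i])
        survivors = survivors[: max(1, len(survivors) // 2)]

    return survivors[0]
```

Ranking by the most recent loss corresponds to step 2b above; the halving of `survivors` corresponds to step 2c.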

3. Theoretical Guarantees

Successive Halving exhibits deterministic guarantees for best-arm identification in the non-stochastic, oblivious setting. The sufficient budget for exact identification is:

$$z = 2\,\lceil\log_2 n\rceil\,\max_{i=2,\dots,n}\left[ i \left( 1 + \bar{\gamma}^{-1}\!\left(\frac{\nu_i-\nu_1}{2}\right) \right) \right]$$

If $B > z$, Successive Halving is guaranteed to select arm $1$ with $\nu_1 = \min_i \nu_i$ (arms indexed in increasing order of $\nu_i$). Uniform allocation—in which each arm is allotted $B/n$ pulls—requires

$$B \geq n\,\max_{i=2,\dots,n}\bar{\gamma}^{-1}\!\left(\frac{\nu_i-\nu_1}{2}\right)$$

in the worst case to ensure correct identification, so Successive Halving can achieve up to an $\Omega(n/\log n)$ improvement in sample complexity.

In sub-optimal budget regimes, Successive Halving still ensures selection of an arm within

$$2\,\lceil\log_2 n\rceil\,\bar{\gamma}\!\left(\left\lfloor \frac{B}{n\lceil\log_2 n\rceil} \right\rfloor\right)$$

of the optimal $\nu_1$, while uniform allocation outputs an arm no worse than $2\,\bar{\gamma}(B/n)$ from optimal.
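
To make the comparison concrete, consider again the assumed polynomial envelope from Section 1 (purely illustrative), $\bar{\gamma}(t) = c/t$, so that $\bar{\gamma}^{-1}(\alpha) = \lceil c/\alpha \rceil$, and write $\Delta_i = \nu_i - \nu_1$ for the gaps. Up to rounding, the two budget requirements become

$$z \approx 2\,\lceil\log_2 n\rceil\,\max_{i \geq 2} i\left(1 + \frac{2c}{\Delta_i}\right), \qquad B_{\text{uniform}} \approx n \cdot \frac{2c}{\Delta_2}.$$

If only the runner-up is hard to distinguish ($\Delta_2$ small while the remaining gaps are large enough that the maximum is attained at $i = 2$), then $z \approx 4\,\lceil\log_2 n\rceil\,(1 + 2c/\Delta_2)$, smaller than the uniform requirement by roughly a factor of $n/(4\lceil\log_2 n\rceil)$—the source of the $\Omega(n/\log n)$ separation.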

4. Application to Hyperparameter Optimization

Non-stochastic best-arm identification directly models hyperparameter optimization in iterative machine learning settings. Each hyperparameter configuration $\theta \in \Theta$ corresponds to an arm, and the validation loss at training iteration $t$ defines $\ell_{\theta,t}$. Since computational budgets are finite in practical scenarios, the objective is to find

$$\theta^* = \arg\min_\theta \nu_\theta$$

where $\nu_\theta$ is the limiting validation error for each configuration. Allocating training resources via Successive Halving—evaluating many hyperparameter configurations shallowly, then focusing deeper training on promising candidates—greatly enhances efficiency relative to uniform or random search, especially when learning curves vary across configurations (Jamieson et al., 2015).
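
A hedged usage sketch, building on the `successive_halving` sketch above: each configuration is wrapped as a stateful callable that runs one more training iteration and reports the current validation loss. The names `make_arm`, `train_step`, and `validation_loss` are hypothetical placeholders, not part of the source.

```python
def make_arm(config, train_step, validation_loss):
    """Wrap one hyperparameter configuration as an arm (illustrative sketch).

    `train_step(params, config)` is a hypothetical user-supplied hook that advances
    training by one iteration and returns updated parameters; `validation_loss(params)`
    evaluates the current model on held-out data.
    """
    state = {"params": None}

    def pull():
        state["params"] = train_step(state["params"], config)
        return validation_loss(state["params"])

    return pull

# Illustrative driver (assumes successive_halving from the sketch in Section 2):
# configs = [{"lr": 0.3}, {"lr": 0.1}, {"lr": 0.03}, {"lr": 0.01}]
# arms = [make_arm(c, train_step, validation_loss) for c in configs]
# best = successive_halving(arms, budget=400)
# print("selected configuration:", configs[best])
```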

5. Empirical Comparisons

Empirical evaluation compared the fixed-budget methods—uniform allocation, Successive Rejects, and Successive Halving—against fixed-confidence and regret-minimization baselines (LUCB, lil’UCB, EXP3) on:

  • Ridge regression (trained with SGD, $n = 10$ arms, Million Song Dataset); metric: mean squared error.
  • RBF-kernel SVM (trained with Pegasos, $n = 100$ arms); metric: 0/1 loss.
  • Matrix factorization (bi-convex SGD, $n = 64$ arms, MovieLens 100k); metric: mean squared error.

Key findings include:

  • The fixed-confidence and regret-minimization baselines (LUCB, lil’UCB, EXP3) achieve low error in fewer iterations, but their wall-clock time is dominated by the overhead of hold-out loss evaluations.
  • Successive Halving outperforms both Successive Rejects and uniform allocation in wall-clock time; for SVM tuning, it reaches the best test accuracy over an order of magnitude faster.
  • For matrix factorization, both Successive Halving and Successive Rejects outperform uniform allocation, with Successive Halving typically selecting competitive arms with fewer total iterations.

A summary of methods and outcomes:

| Method | Tasks and Datasets | Empirical Outcome |
| --- | --- | --- |
| Uniform allocation | All tasks | Baseline (slower, sample-inefficient) |
| Successive Rejects | All tasks | Outpaced by Successive Halving, especially in wall-clock time |
| Successive Halving | All tasks | Dominates other fixed-budget methods in time; order-of-magnitude faster in some cases |
| Fixed-confidence baselines | All tasks | Lower iteration count but impractical wall-clock time due to evaluation overhead |

6. Practical Considerations and Extensions

  • Budget Selection: A fixed total iteration budget $B$ can be chosen in advance or discovered adaptively using the doubling trick, incurring at most a factor-of-2 overhead (see the sketch after this list).
  • Confidence Guarantees: In the oblivious non-stochastic context, only deterministic accuracy guarantees (conditional on the envelopes $\gamma_i$) can be given; stochastic settings may leverage repeated subsampling with Successive Halving as a fixed-confidence wrapper.
  • Adaptive Arm Generation: Initial hyperparameter sampling (“arm-generation”) is orthogonal to the allocation of computation. Integration with Bayesian optimization for arm proposal and mini-batch Successive Halving for scheduling is a promising direction.
  • Envelopes and Convergence Rates: Successive Halving adapts to unknown loss-curve envelopes $\gamma_i(t)$. If arm-specific convergence rates are known (e.g., under strong convexity), allocations may be further optimized.
  • Resource Reallocation Costs: In large-model settings, hardware switching (parameter servers, GPUs) can be nontrivial, suggesting the need to analyze “warm-start” transitions or amortized costs.
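
A minimal sketch of the doubling trick mentioned in the Budget Selection item, reusing the `successive_halving` sketch from Section 2; the `arm_factory` and `should_stop` hooks are hypothetical placeholders introduced only for illustration.

```python
def successive_halving_doubling(arm_factory, initial_budget, should_stop):
    """Doubling-trick wrapper for Successive Halving (illustrative sketch).

    When the total budget is not known in advance, rerun Successive Halving with a
    doubled budget until an external stopping condition fires (e.g., a wall-clock
    or compute limit). `arm_factory()` returns a fresh list of arm callables for
    each run; `should_stop()` is a user-supplied predicate. Because the budgets
    double, the cumulative work is at most twice that of the final run.
    """
    # Assumes successive_halving from the Section 2 sketch is in scope.
    budget, best = initial_budget, None
    while not should_stop():
        best = successive_halving(arm_factory(), budget)
        budget *= 2
    return best
```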

7. Conclusion

Non-stochastic best-arm identification provides a rigorous fixed-budget framework for allocating a finite computational budget among alternatives with arbitrary, converging loss sequences. Successive Halving offers a computationally efficient, theoretically analyzable strategy that is demonstrably superior to uniform and other naive approaches in both sample and wall-clock complexity. Its utility in large-scale hyperparameter optimization is established by empirical speedups of more than an order of magnitude, marking it as a foundational approach for pure-exploration tasks where outcome sequences are non-random but predefined (Jamieson et al., 2015).

References

Jamieson, K., & Talwalkar, A. (2015). Non-stochastic Best Arm Identification and Hyperparameter Optimization.
