
Non-stochastic Best-Arm Identification

Updated 22 December 2025
  • Non-stochastic best-arm identification is defined as selecting the optimal arm from fixed, converging loss sequences under a strict budget.
  • Successive Halving iteratively reallocates resources to promising arms, achieving up to an Ω(n/log n) improvement over uniform sampling.
  • This framework is applied to hyperparameter optimization, reducing computational overhead and enabling faster, more precise model tuning.

Non-stochastic best-arm identification addresses the selection of the optimal alternative from a fixed set when outcome sequences are determined by an oblivious adversary rather than by an underlying stochastic process. Unlike traditional multi-armed bandit settings that emphasize regret minimization, this framework specifically targets pure-exploration under a “fixed-budget” constraint, where the goal is to use a finite sampling budget to reliably identify the option with minimal limiting loss. Notably, non-stochastic best-arm identification facilitates the formulation of hyperparameter optimization as a resource allocation problem, aligning evaluation efforts with promising candidates and enabling substantial reductions in computational overhead compared to naive uniform approaches (Jamieson et al., 2015).

1. Formal Definition and Envelope Functions

The non-stochastic best-arm identification problem is formalized for $n$ arms, where the feedback for arm $i$ is a sequence of losses $(\ell_{i,1}, \ell_{i,2}, \dots)$ pre-specified by an oblivious adversary. Each sequence is assumed to converge: $\nu_i = \lim_{k \to \infty} \ell_{i,k}$. The agent sequentially allocates samples (pulls) to arms, subject to the budget constraint $\sum_{i=1}^n T_i \leq B$, and must ultimately select an arm $\hat{\imath}$ intended to minimize $\nu_i$. The core objective is to minimize the probability of misidentification, i.e., $\Pr(\hat{\imath} \neq \arg\min_i \nu_i)$.

Convergence rates are encapsulated via envelope functions:

$$\gamma_i(t) = \min\{\gamma \geq 0 : |\ell_{i,t} - \nu_i| \leq \gamma\}$$

and their uniform worst-case

$$\bar{\gamma}(t) = \max_i \gamma_i(t).$$

These envelopes quantify how quickly observed losses approach their limiting values, serving as key parameters in algorithmic sample complexity guarantees. The inverse envelope, $\gamma_i^{-1}(\alpha) = \min\{t : \gamma_i(t) \leq \alpha\}$, gives the minimal time for arm $i$ to achieve $\alpha$-accuracy.
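
As an illustration (an assumed decay profile, not a result from the source), suppose each loss sequence decays polynomially toward its limit, $\ell_{i,t} = \nu_i + c_i/t$ for some constant $c_i > 0$. Then

$$\gamma_i(t) = \frac{c_i}{t}, \qquad \bar{\gamma}(t) = \frac{\max_i c_i}{t}, \qquad \gamma_i^{-1}(\alpha) = \left\lceil \frac{c_i}{\alpha} \right\rceil,$$

so halving the target accuracy $\alpha$ roughly doubles the number of pulls arm $i$ requires.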

2. Successive Halving Algorithm

The principal algorithm for non-stochastic best-arm identification is Successive Halving, which iteratively eliminates poor performers while redistributing the evaluation budget to survivors. Given a budget $B$ and $n$ arms, it operates as follows:

  1. Initialize $S_0 = \{1, 2, \dots, n\}$.
  2. At each round $k$ ($k = 0, \dots, \lceil \log_2 n \rceil - 1$):
     a. Allocate $r_k = \big\lfloor \frac{B}{|S_k|\,\lceil \log_2 n \rceil} \big\rfloor$ additional pulls to each surviving arm in $S_k$.
     b. Observe the resulting losses and rank the arms by empirical loss.
     c. Discard the worse half, retaining the better performers as $S_{k+1}$.

The unique survivor after $\lceil \log_2 n \rceil$ rounds is recommended as the best arm. If the total budget $B$ is unknown in advance, a doubling trick—running the procedure with successively doubled budgets—provides an adaptive alternative with at most a factor-of-2 overhead.
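
The following is a minimal Python sketch of this procedure, intended as an illustration under stated assumptions rather than a reference implementation: each arm is assumed to be exposed as a hypothetical stateful callable that, when invoked, runs one more iteration of the underlying computation and returns the current loss.

```python
import math

def successive_halving(arms, budget):
    """Illustrative sketch of Successive Halving (not the reference implementation).

    `arms` is a list of stateful callables; calling arms[i]() advances arm i by one
    iteration and returns its current loss. `budget` is the total pull budget B.
    Returns the index of the single surviving arm.
    """
    n = len(arms)
    rounds = max(1, math.ceil(math.log2(n)))
    survivors = list(range(n))                      # S_0 = {0, ..., n-1}
    latest_loss = {i: float("inf") for i in survivors}

    for _ in range(rounds):
        # r_k = floor(B / (|S_k| * ceil(log2 n))) additional pulls per surviving arm;
        # at least one pull is taken even when the budget is very small (a slight
        # liberty over the exact allocation rule).
        pulls = max(1, budget // (len(survivors) * rounds))
        for i in survivors:
            for _ in range(pulls):
                latest_loss[i] = arms[i]()
        # Rank by the most recent observed loss and keep the better-performing half.
        survivors.sort(key=lambda i: latest_loss[i])
        survivors = survivors[: max(1, len(survivors) // 2)]

    return survivors[0]
```

Ranking by the most recent loss corresponds to step 2b above; the halving of `survivors` corresponds to step 2c.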

3. Theoretical Guarantees

Successive Halving exhibits deterministic guarantees for best-arm identification in the non-stochastic, oblivious setting. The sufficient budget for exact identification is:

$$z = 2\,\lceil\log_2 n\rceil\,\max_{i=2,\dots,n}\left[ i \left( 1 + \bar{\gamma}^{-1}\!\left(\frac{\nu_i-\nu_1}{2}\right) \right) \right]$$

If $B > z$, Successive Halving is guaranteed to select arm $1$ with $\nu_1 = \min_i \nu_i$ (arms indexed in increasing order of $\nu_i$). Uniform allocation—in which each arm is allotted $B/n$ pulls—requires

$$B \geq n\,\max_{i=2,\dots,n}\bar{\gamma}^{-1}\!\left(\frac{\nu_i-\nu_1}{2}\right)$$

in the worst case to ensure correct identification, so Successive Halving can achieve up to an $\Omega(n/\log n)$ improvement in sample complexity.

In sub-optimal budget regimes, Successive Halving still ensures selection of an arm within

$$2\,\lceil\log_2 n\rceil\,\bar{\gamma}\!\left(\left\lfloor \frac{B}{n\lceil\log_2 n\rceil} \right\rfloor\right)$$

of the optimal $\nu_1$, while uniform allocation outputs an arm no worse than $2\,\bar{\gamma}(B/n)$ from optimal.
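
To make the comparison concrete, consider again the assumed polynomial envelope from Section 1 (purely illustrative), $\bar{\gamma}(t) = c/t$, so that $\bar{\gamma}^{-1}(\alpha) = \lceil c/\alpha \rceil$, and write $\Delta_i = \nu_i - \nu_1$ for the gaps. Up to rounding, the two budget requirements become

$$z \approx 2\,\lceil\log_2 n\rceil\,\max_{i \geq 2} i\left(1 + \frac{2c}{\Delta_i}\right), \qquad B_{\text{uniform}} \approx n \cdot \frac{2c}{\Delta_2}.$$

If only the runner-up is hard to distinguish ($\Delta_2$ small while the remaining gaps are large enough that the maximum is attained at $i = 2$), then $z \approx 4\,\lceil\log_2 n\rceil\,(1 + 2c/\Delta_2)$, smaller than the uniform requirement by roughly a factor of $n/(4\lceil\log_2 n\rceil)$—the source of the $\Omega(n/\log n)$ separation.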

4. Application to Hyperparameter Optimization

Non-stochastic best-arm identification directly models hyperparameter optimization in iterative machine learning settings. Each hyperparameter configuration $\theta \in \Theta$ corresponds to an arm, and the validation loss at training iteration $t$ defines $\ell_{\theta,t}$. Since computational budgets are finite in practical scenarios, the objective is to find

$$\theta^* = \arg\min_\theta \nu_\theta$$

where $\nu_\theta$ is the limiting validation error for each configuration. Allocating training resources via Successive Halving—evaluating many hyperparameter configurations shallowly, then focusing deeper training on promising candidates—greatly enhances efficiency relative to uniform or random search, especially when learning curves vary across configurations (Jamieson et al., 2015).
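
A hedged usage sketch, building on the `successive_halving` sketch above: each configuration is wrapped as a stateful callable that runs one more training iteration and reports the current validation loss. The names `make_arm`, `train_step`, and `validation_loss` are hypothetical placeholders, not part of the source.

```python
def make_arm(config, train_step, validation_loss):
    """Wrap one hyperparameter configuration as an arm (illustrative sketch).

    `train_step(params, config)` is a hypothetical user-supplied hook that advances
    training by one iteration and returns updated parameters; `validation_loss(params)`
    evaluates the current model on held-out data.
    """
    state = {"params": None}

    def pull():
        state["params"] = train_step(state["params"], config)
        return validation_loss(state["params"])

    return pull

# Illustrative driver (assumes successive_halving from the sketch in Section 2):
# configs = [{"lr": 0.3}, {"lr": 0.1}, {"lr": 0.03}, {"lr": 0.01}]
# arms = [make_arm(c, train_step, validation_loss) for c in configs]
# best = successive_halving(arms, budget=400)
# print("selected configuration:", configs[best])
```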

5. Empirical Comparisons

Empirical evaluation compared the fixed-budget methods—uniform allocation, Successive Rejects, and Successive Halving—against fixed-confidence and regret-minimization baselines (LUCB, lil’UCB, EXP3) on:

  • Ridge regression (trained with SGD, $n = 10$ arms, Million Song Dataset); metric: mean squared error.
  • RBF-kernel SVM (trained with Pegasos, $n = 100$ arms); metric: 0/1 loss.
  • Matrix factorization (bi-convex SGD, $n = 64$ arms, MovieLens 100k); metric: mean squared error.

Key findings include:

  • The fixed-confidence and regret-minimization baselines (LUCB, lil’UCB, EXP3) achieve low error in fewer iterations, but their wall-clock time is dominated by the overhead of hold-out loss evaluations.
  • Successive Halving outperforms both Successive Rejects and uniform allocation in wall-clock time; for SVM tuning, it reaches the best test accuracy over an order of magnitude faster.
  • For matrix factorization, both Successive Halving and Successive Rejects outperform uniform allocation, with Successive Halving typically selecting competitive arms with fewer total iterations.

A summary of methods and outcomes:

| Method | Tasks and Datasets | Empirical Outcome |
| --- | --- | --- |
| Uniform allocation | All tasks | Baseline (slower, sample-inefficient) |
| Successive Rejects | All tasks | Outpaced by Successive Halving, especially in wall-clock time |
| Successive Halving | All tasks | Dominates other fixed-budget methods in time; order-of-magnitude faster in some cases |
| Fixed-confidence baselines | All tasks | Lower iteration count but impractical wall-clock time due to evaluation overhead |

6. Practical Considerations and Extensions

  • Budget Selection: A fixed total iteration budget $B$ can be chosen in advance or discovered adaptively using the doubling trick, incurring at most a factor-of-2 overhead (see the sketch after this list).
  • Confidence Guarantees: In the oblivious non-stochastic context, only deterministic accuracy guarantees (conditional on the envelopes $\gamma_i$) can be given; stochastic settings may leverage repeated subsampling with Successive Halving as a fixed-confidence wrapper.
  • Adaptive Arm Generation: Initial hyperparameter sampling (“arm-generation”) is orthogonal to the allocation of computation. Integration with Bayesian optimization for arm proposal and mini-batch Successive Halving for scheduling is a promising direction.
  • Envelopes and Convergence Rates: Successive Halving adapts to unknown loss-curve envelopes $\gamma_i(t)$. If arm-specific convergence rates are known (e.g., under strong convexity), allocations may be further optimized.
  • Resource Reallocation Costs: In large-model settings, hardware switching (parameter servers, GPUs) can be nontrivial, suggesting the need to analyze “warm-start” transitions or amortized costs.
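
A minimal sketch of the doubling trick mentioned in the Budget Selection item, reusing the `successive_halving` sketch from Section 2; the `arm_factory` and `should_stop` hooks are hypothetical placeholders introduced only for illustration.

```python
def successive_halving_doubling(arm_factory, initial_budget, should_stop):
    """Doubling-trick wrapper for Successive Halving (illustrative sketch).

    When the total budget is not known in advance, rerun Successive Halving with a
    doubled budget until an external stopping condition fires (e.g., a wall-clock
    or compute limit). `arm_factory()` returns a fresh list of arm callables for
    each run; `should_stop()` is a user-supplied predicate. Because the budgets
    double, the cumulative work is at most twice that of the final run.
    """
    # Assumes successive_halving from the Section 2 sketch is in scope.
    budget, best = initial_budget, None
    while not should_stop():
        best = successive_halving(arm_factory(), budget)
        budget *= 2
    return best
```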

7. Conclusion

Non-stochastic best-arm identification provides a rigorous fixed-budget framework for allocating a finite computational budget among alternatives with arbitrary, converging loss sequences. Successive Halving offers a computationally efficient, theoretically analyzable strategy that is demonstrably superior to uniform and other naive approaches in both sample and wall-clock complexity. Its utility in large-scale hyperparameter optimization is established by empirical speedups of more than an order of magnitude, marking it as a foundational approach for pure-exploration tasks where outcome sequences are non-random but predefined (Jamieson et al., 2015).

References

Jamieson, K., & Talwalkar, A. (2015). Non-stochastic Best Arm Identification and Hyperparameter Optimization.
