Fixed-Budget Setting in Sequential Decision-Making
- The Fixed-Budget (FB) setting is a sequential decision-making framework in which a fixed number of evaluations (a budget T) is allotted and the goal is to minimize the error probability at the end of the budget.
- It contrasts with fixed-confidence approaches by pre-setting resource limits, leading to unique minimax lower and upper bounds that guide algorithm design.
- FB methods drive diverse applications including multi-armed bandits and algorithm portfolio selection, employing elimination strategies and design-based allocations for robust performance.
The fixed-budget (FB) setting is a foundational regime in sequential decision-making, optimization, and algorithm selection, where an agent or algorithm must complete its objective within a strictly limited resource budget. Typically, the FB paradigm is contrasted with fixed-confidence (FC) approaches: in FB, the budget is set in advance, and the goal is to maximize accuracy (e.g., minimize misidentification error), whereas in FC, accuracy is specified, and the goal is to minimize the required resource consumption. The FB setting pervades multiple domains including pure-exploration in multi-armed bandits, bandit best-arm and Pareto identification, algorithm portfolio selection, bandit change-point detection, voting mechanisms, and fixed-budget analysis of stochastic search or optimization heuristics. Recent research establishes tight minimax lower and upper bounds, highlights the importance of complexity measures determining error exponents, reveals subtle differences from FC, and develops optimal or near-optimal algorithms in diverse structured and unstructured environments.
1. Formal Definition and Core Principles
Formally, in an FB problem the learner (or decision-maker) is allocated a total resource budget $T$ (e.g., a number of arm pulls, function evaluations, or action queries), which must not be exceeded. Throughout the $T$ rounds, the learner adaptively selects actions (arms, designs, algorithms), receives possibly noisy or delayed feedback, and at the end of the budget must output a recommendation (best arm, Pareto set, optimizer, change-point estimate, voting decision, etc.).
The primary performance metric is the error probability (e.g., the probability of incorrect best-arm selection), simple (set) regret, or expected utility after exactly $T$ resource uses. The learner's strategy (whether non-adaptive or adaptive) should minimize this error given only the budget constraint. The theoretical analysis generally focuses on the decay rate of the error as $T$ grows, the explicit characterization of error exponents in terms of problem-dependent complexity quantities, and the gap between the upper and lower bounds achievable by different algorithms (Yang et al., 2021, Carpentier et al., 2016, Balagopalan et al., 3 Feb 2026, Qin, 2023). A schematic of the interaction protocol is sketched below.
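As a concrete illustration of this protocol, the following is a minimal Python sketch of a generic fixed-budget pure-exploration loop. It is not taken from any of the cited works; the `policy` and `recommend` callables are hypothetical placeholders, and unit-variance Gaussian rewards are assumed purely for simulation.

```python
import numpy as np

def run_fixed_budget(means, T, policy, recommend, rng=None):
    """Generic fixed-budget pure-exploration protocol (sketch).

    means     : true (hidden) arm means, used only to simulate noisy rewards
    T         : total budget of arm pulls
    policy    : callable (counts, sums, t) -> index of the arm to pull next
    recommend : callable (counts, sums) -> recommended arm once the budget is spent
    """
    rng = np.random.default_rng() if rng is None else rng
    K = len(means)
    counts = np.zeros(K, dtype=int)   # number of pulls per arm
    sums = np.zeros(K)                # sum of observed rewards per arm

    for t in range(T):                           # exactly T interactions, never more
        arm = policy(counts, sums, t)
        reward = rng.normal(means[arm], 1.0)     # noisy feedback (unit-variance Gaussian)
        counts[arm] += 1
        sums[arm] += reward

    return recommend(counts, sums)               # final recommendation after the budget

# Example: uniform (non-adaptive) allocation with empirical-best recommendation.
uniform = lambda counts, sums, t: t % len(counts)
empirical_best = lambda counts, sums: int(np.argmax(sums / np.maximum(counts, 1)))
best = run_fixed_budget([0.1, 0.5, 0.9], T=300, policy=uniform, recommend=empirical_best)
```

Any FB algorithm discussed below (Successive Rejects, Successive Halving, design-based allocation) can be viewed as a particular choice of `policy` and `recommend` in this loop.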
2. Minimax Lower and Upper Bounds; Complexity Measures
Research in FB settings rigorously establishes minimax lower bounds—provable limitations—on the achievable performance, as well as matching (up to logarithmic factors) upper bounds for specific algorithms.
- Single-objective best-arm identification (BAI): Let $\Delta_i = \mu_{i^\star} - \mu_i$ be the gap between the top mean and arm $i$. The problem complexity is $H = \sum_{i \neq i^\star} \Delta_i^{-2}$, and information-theoretic results show that for any (possibly adaptive) algorithm, there exists an instance such that
  $$\Pr[\text{error}] \;\ge\; c_1 \exp\!\left(-\frac{c_2\, T}{H \log K}\right)$$
  for absolute constants $c_1, c_2 > 0$. The logarithmic factor in $K$ is necessary unless the instance complexity is known in advance; the Successive Rejects algorithm matches this bound up to constants (Carpentier et al., 2016). A minimal sketch of Successive Rejects appears at the end of this section.
- Structured linear bandits: The complexity is governed by a hardness quantity $H_{2,\mathrm{lin}}$ that combines the sorted gaps with the effective dimension $d$ of the arm set (analogous to $H_2 = \max_i i\,\Delta_{(i)}^{-2}$ in the unstructured case). The OD-LinBAI algorithm achieves
  $$\Pr[\text{error}] \;\le\; \exp\!\left(-\Omega\!\left(\frac{T}{H_{2,\mathrm{lin}}\,\log_2 d}\right)\right),$$
  which is minimax optimal in the exponent (Yang et al., 2021).
- Heterogeneous-variance bandits: For known variances $\sigma_1^2, \dots, \sigma_K^2$, SHVar achieves an error probability decaying as $\exp\!\big(-\Omega(T/H_\sigma)\big)$ for a variance-weighted complexity $H_\sigma$ in which each gap is scaled by the relevant variances, thus revealing explicit control of the exponent in terms of variance-weighted complexity (Lalitha et al., 2023).
- Multi-objective/Pareto set identification: The instance complexity is measured by a sum of inverse squared Pareto gaps, $H = \sum_i \Delta_i^{-2}$, with $\Delta_i$ a suitably defined sub-optimality gap relative to the Pareto front, and gap-dependent exponents are achieved by elimination-based algorithms (Kone et al., 2023, Nonaga et al., 27 Jun 2025).
These results apply broadly to any pure-exploration fixed-budget problem, including change-point detection (Lazzaro et al., 22 Jan 2025), privacy-constrained BAI (Chen et al., 2024), as well as generalized or Bayesian settings (Nguyen et al., 2024, Atsidakou et al., 2022, Zhu et al., 2024). Tables summarizing key exponents and corresponding algorithms appear in the cited works.
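To make the elimination principle behind the unstructured BAI bound concrete, here is a minimal sketch of a Successive Rejects style procedure. The phase schedule follows the standard $\overline{\log}(K)$ construction, but constants, tie-breaking, and the reward simulation (unit-variance Gaussian arms) are illustrative assumptions rather than a faithful reproduction of any cited implementation.

```python
import numpy as np

def successive_rejects(means, T, rng=None):
    """Successive Rejects for fixed-budget best-arm identification (sketch).

    The budget T is split across K-1 phases; after each phase the arm with the
    lowest empirical mean among the survivors is rejected, and the last
    surviving arm is recommended.
    """
    rng = np.random.default_rng() if rng is None else rng
    K = len(means)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))   # \bar{log}(K)

    counts = np.zeros(K, dtype=int)
    sums = np.zeros(K)
    active = list(range(K))
    n_prev = 0

    for k in range(1, K):                                   # phases 1, ..., K-1
        n_k = int(np.ceil((T - K) / (log_bar * (K + 1 - k))))
        pulls = max(n_k - n_prev, 0)                        # extra pulls per surviving arm
        for arm in active:
            for _ in range(pulls):
                sums[arm] += rng.normal(means[arm], 1.0)
                counts[arm] += 1
        n_prev = n_k
        # Reject the empirically worst surviving arm.
        worst = min(active, key=lambda a: sums[a] / max(counts[a], 1))
        active.remove(worst)

    return active[0]
```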
3. Algorithms and Design Methodologies
Prominent algorithmic paradigms in FB settings include:
- Elimination Strategies: Phased-elimination methods such as Successive Halving (SH), Successive Rejects (SR), and Generalized Successive Elimination (GSE) for structured models. These algorithms divide the budget between arms (or arm sets) across rounds, eliminate suboptimal candidates, and concentrate sampling on promising arms (Carpentier et al., 2016, Yang et al., 2021, Azizi et al., 2021, Kone et al., 2023).
- Optimal Design-based Allocation: G-optimal designs minimize the maximum estimation variance over the arm set, which is crucial in linear/structured bandits. OD-LinBAI and similar methods explicitly compute near-optimal static allocations, or periodic updates thereof (Yang et al., 2021, Yavas et al., 2023); a minimal design-computation sketch appears after this list.
- Variance-adaptive Methods: For heteroscedastic (unequal-variance) rewards, adaptive allocations proportional to the variances achieve optimality, as in SHVar and SHAdaVar (Lalitha et al., 2023); a variance-proportional stage split is sketched after this list. In high-dimensional sparse bandits, two-stage procedures (e.g., Lasso-OD) first recover the support and then identify the best arm in the reduced space, thus decoupling statistical and optimization complexity (Yavas et al., 2023).
- Bayesian and Prior-dependent Approaches: Bayesian elimination and prior-dependent AdaBAI improve sample allocations by leveraging prior knowledge and prior-induced complexity, achieving tight error bounds (Atsidakou et al., 2022, Nguyen et al., 2024, Zhu et al., 2024).
- Meta-algorithmic Reductions: FC2FB converts any FC algorithm whose sample complexity is logarithmic in $1/\delta$ into an FB algorithm with a matching error exponent (up to a logarithmic factor), establishing that FB is no harder than FC in terms of sample complexity (Balagopalan et al., 3 Feb 2026).
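To illustrate the design-based allocation bullet above, the following sketch computes an approximate G-optimal design over a finite arm set with a Fedorov-Wynn (Frank-Wolfe) iteration. This is a generic textbook routine under the stated assumptions (finite arm set, least-squares estimation), not the exact subroutine of OD-LinBAI; the regularization and stopping tolerance are arbitrary choices.

```python
import numpy as np

def g_optimal_design(arms, n_iter=500, reg=1e-6):
    """Approximate G-optimal design over a finite arm set via a
    Fedorov-Wynn (Frank-Wolfe) iteration.

    arms : (K, d) array of arm feature vectors.
    Returns a probability vector lam over arms that approximately minimizes
    max_a a^T A(lam)^{-1} a, where A(lam) = sum_a lam_a a a^T.
    """
    K, d = arms.shape
    lam = np.full(K, 1.0 / K)                       # start from the uniform design
    for _ in range(n_iter):
        A = arms.T @ (lam[:, None] * arms) + reg * np.eye(d)
        A_inv = np.linalg.inv(A)
        # Predicted variances a^T A^{-1} a for every arm.
        g = np.einsum('kd,de,ke->k', arms, A_inv, arms)
        j = int(np.argmax(g))                       # most poorly estimated arm
        g_max = g[j]
        if g_max <= d * (1 + 1e-3):                 # equivalence theorem: optimum has max <= d
            break
        gamma = (g_max - d) / (d * (g_max - 1.0))   # Fedorov-Wynn step size
        lam = (1.0 - gamma) * lam
        lam[j] += gamma
    return lam

# A static FB allocation would then pull arm a roughly ceil(T * lam[a]) times
# before computing the least-squares estimate and recommending the best arm.
```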
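For the variance-adaptive bullet above, the sketch below splits a stage budget across surviving arms in proportion to their known variances, so that every mean estimate ends up with roughly the same variance ($\sigma_i^2 / n_i \approx \text{const}$). This proportional rule is stated here as a plausible simplification in the spirit of SHVar, not as its exact allocation.

```python
import numpy as np

def variance_proportional_split(budget, variances):
    """Split a stage budget across surviving arms proportionally to their
    (known) reward variances, equalizing the variance of the mean estimates.
    """
    variances = np.asarray(variances, dtype=float)
    weights = variances / variances.sum()
    n = np.floor(budget * weights).astype(int)
    # Distribute any leftover pulls (due to rounding) to the noisiest arms.
    leftover = budget - n.sum()
    order = np.argsort(-variances)
    n[order[:leftover]] += 1
    return n

# Example: a stage budget of 100 pulls over arms with variances 1, 4 and 5
# yields roughly 10, 40 and 50 pulls, equalizing estimation error across arms.
print(variance_proportional_split(100, [1.0, 4.0, 5.0]))
```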
4. Fixed-Budget versus Fixed-Confidence Regimes
The relationship between FB and FC settings is central to understanding pure-exploration tradeoffs:
- In the FC regime, the goal is to identify the best arm (or other target) with error probability at most $\delta$, minimizing the expected number of samples. The optimal sample complexity scales, up to constants, as $H \log(1/\delta)$ for an instance-dependent complexity $H$.
- In contrast, the FB regime asks: with $T$ samples, what is the best achievable error probability? Minimax results show that the optimal error decays as $\exp\!\big(-\Theta(T/(H \log K))\big)$, reflecting an unavoidable adaptation price (the extra $\log K$ factor relative to the known-complexity rate $\exp(-T/H)$) unless the instance complexity is known (Carpentier et al., 2016, Qin, 2023). A small scaling comparison is sketched after this list.
- The reduction FC2FB demonstrates that, up to logarithmic terms, the instance-dependent difficulty (i.e., the problem hardness $H$) is not worse in FB than in FC; thus, theoretically, FB is no harder than FC up to log factors (Balagopalan et al., 3 Feb 2026).
- Notably, in certain settings (e.g., two-arm Gaussian bandits), the optimal exponents for both regimes coincide, but for general cases, subtle adaptivity gaps may remain, and characterizing adaptive-complexity exponents remains open (Qin, 2023).
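To illustrate the FB/FC correspondence numerically, the hedged sketch below computes, for a given gap profile, the complexity $H = \sum_i \Delta_i^{-2}$, the order of the FC sample requirement $H \log(1/\delta)$, and the budget at which the minimax FB bound $\exp(-T/(H \log K))$ drops below $\delta$. All constants are dropped, so the outputs indicate scaling only.

```python
import numpy as np

def fb_vs_fc(gaps, delta=1e-3):
    """Compare the scaling of the fixed-confidence sample requirement and the
    fixed-budget budget needed to reach the same target error delta.

    gaps : positive sub-optimality gaps Delta_i of the non-best arms.
    Constants are dropped: the outputs indicate orders of magnitude only.
    """
    gaps = np.asarray(gaps, dtype=float)
    K = len(gaps) + 1                                 # number of arms, including the best one
    H = np.sum(gaps ** -2)                            # complexity H = sum_i Delta_i^{-2}
    fc_samples = H * np.log(1.0 / delta)              # FC: ~ H log(1/delta) expected samples
    fb_budget = H * np.log(K) * np.log(1.0 / delta)   # FB: exp(-T/(H log K)) <= delta
    return H, fc_samples, fb_budget

H, fc, fb = fb_vs_fc(gaps=[0.1, 0.2, 0.3, 0.4], delta=1e-3)
print(f"H = {H:.1f}, FC samples ~ {fc:.0f}, FB budget ~ {fb:.0f}")
```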
5. Extensions: Structured Problems, Multi-objective, Privacy, Algorithm Selection
The FB regime encompasses a wide variety of problem generalizations:
- Structured bandits: Linear, generalized linear, sparse, or hierarchical models allow sophisticated design-based allocation, exploiting low-dimensional structure or sparsity for exponential improvement in error exponents (Yang et al., 2021, Azizi et al., 2021, Yavas et al., 2023, Bian et al., 3 Jun 2025).
- Multi-objective/Pareto set identification: Algorithms such as Empirical Gap Elimination (EGE) achieve exponential decay in misidentification for identifying Pareto sets under FB, with error rates determined by multi-dimensional “gap” complexity measures (Kone et al., 2023, Nonaga et al., 27 Jun 2025); a small sketch of computing an empirical Pareto set follows this list.
- Change-point bandits: Identification of a discontinuity in a reward function with a fixed budget exhibits sharp phase transitions in complexity between the small- and large-budget regimes; algorithms such as SH and SHA achieve matching upper and lower bounds (Lazzaro et al., 22 Jan 2025).
- Differential privacy: Mechanisms such as DP-BAI using maximum absolute determinant designs and Laplace noise injection precisely capture the privacy-induced penalty in the error exponent, giving additive (privacy + non-private) complexity (Chen et al., 2024).
- Voting and social choice: In fixed-budget multiple-issue Quadratic Voting, each agent is given a hard cap of credits to allocate with quadratic pricing, leading to utilitarian-efficient equilibria and tractable NE verification, contrasting with classical voting rules (Georgescu et al., 2024).
- Algorithm selection/portfolio construction: In expensive black-box optimization, FB forces explicit accounting of the resources spent on feature computation, necessitating budget-aware portfolio design and algorithm selection methods (the feature-computation budget is subtracted from the total before the remaining budget is assigned to an algorithm), which empirically outperform naive approaches (Jankovic et al., 2020, Yoshikawa et al., 2024).
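As a small illustration of the multi-objective bullet above, the following sketch computes the Pareto set of a matrix of empirical mean vectors, i.e., the arms not dominated in every objective. The refined gap quantities used by EGE and related algorithms are more involved and are not reproduced here.

```python
import numpy as np

def empirical_pareto_set(means):
    """Return the indices of arms whose empirical mean vectors are Pareto optimal.

    means : (K, M) array, one row of M objective values per arm.
    Arm j dominates arm i if it is at least as good in every objective and
    strictly better in at least one.
    """
    means = np.asarray(means, dtype=float)
    K = means.shape[0]
    pareto = []
    for i in range(K):
        dominated = any(
            np.all(means[j] >= means[i]) and np.any(means[j] > means[i])
            for j in range(K) if j != i
        )
        if not dominated:
            pareto.append(i)
    return pareto

# Example with two objectives: arms (1, 3), (2, 2), (3, 1) are mutually
# non-dominated, while (1, 1) is dominated by all of them.
print(empirical_pareto_set([[1, 3], [2, 2], [3, 1], [1, 1]]))
```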
6. Methodological and Analytical Tools
Fixed-budget theory relies on a rich toolkit for quantitative analysis:
- Large deviations theory: Rate-function arguments (Chernoff bounds, the uniform Laplace principle) underpin lower bounds and reveal the dominant error sources (Wu et al., 2018, Carpentier et al., 2016); a one-line worked example follows this list.
- Information-theoretic change-of-measure: KL-divergence-based arguments yield minimax lower bounds; in privacy settings, total variation and DP-specific change-of-measure augment classical tools (Chen et al., 2024).
- Drift analysis: For randomized search heuristics, drift inequalities yield explicit bounds on the expected progress or remaining suboptimality after a given number of steps, controlling both the expectation and high-probability deviations (Kötzing et al., 2020).
- Posterior sampling and adversarial allocation: Minimax-optimal identification in constrained linear bandits (e.g., BLFAIPS) leverages adversarial posterior draws and hedged allocations to attain the rate predicted by the lower bound (Bian et al., 3 Jun 2025).
- Portfolio construction heuristics: In algorithm selection under an FB constraint, with landscape features and limited evaluation budgets, regression-based selectors benefit from cost-aware portfolio construction, and blending regressors trained on log-scaled and unscaled targets further boosts predictive accuracy (Jankovic et al., 2020, Yoshikawa et al., 2024).
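As a one-line example of how large-deviation arguments produce error exponents (first bullet above), consider two arms with 1-sub-Gaussian rewards, gap $\Delta$, and a uniform split of the budget $T$. The derivation below is a standard Hoeffding/Chernoff computation given for intuition, not a result quoted from the cited works.

```latex
% Two arms with gap \Delta = \mu_1 - \mu_2 > 0, each pulled T/2 times,
% with 1-sub-Gaussian rewards. Each empirical mean is sub-Gaussian with
% variance proxy 2/T, so their difference has variance proxy 4/T, and a
% Hoeffding/Chernoff bound gives
\[
  \Pr\bigl[\hat{\mu}_2 \ge \hat{\mu}_1\bigr]
    = \Pr\bigl[(\hat{\mu}_2 - \mu_2) - (\hat{\mu}_1 - \mu_1) \ge \Delta\bigr]
    \le \exp\!\left(-\frac{\Delta^2}{2 \cdot (4/T)}\right)
    = \exp\!\left(-\frac{T\Delta^2}{8}\right).
\]
% The misidentification probability therefore decays exponentially in T,
% with an exponent inversely proportional to the two-arm complexity 1/\Delta^2.
```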
7. Applications, Practical Guidance, and Open Problems
Empirical studies and practical advice include:
- OD-LinBAI, GSE, and EGE variants achieve uniformly lower error rates than prior methods on both synthetic and real bandit datasets, especially as problem size or dimension increases (Yang et al., 2021, Azizi et al., 2021, Kone et al., 2023).
- Variance-adaptive allocations are crucial in heteroscedastic domains, and feature-based algorithm selectors must strictly account for sampling cost to maximize overall utility in expensive optimization (Lalitha et al., 2023, Yoshikawa et al., 2024).
- In the context of risk-averse and multi-objective settings, exponential error decay in FB is possible using carefully designed elimination-type algorithms or adaptive confidence intervals, outperforming hypervolume-based and classical evolutionary multi-objective baselines (Nonaga et al., 27 Jun 2025, Kone et al., 2023).
- The duality and adaptation gap between FB and FC are now essentially closed (up to logarithmic factors), but the full instance-dependent optimal error exponent for general FB identification problems remains an open question (Qin, 2023).
- In application-driven domains (e.g., multi-agent resource allocation, fixed-budget voting, private data analysis), explicit error–budget tradeoffs at the design phase are critical for principled algorithm deployment (Georgescu et al., 2024, Chen et al., 2024).
In conclusion, the fixed-budget setting provides a rigorous and practical framework for a diverse array of sequential decision and optimization problems. Its distinguishing feature, the hard budget constraint, induces distinctive statistical and algorithmic effects, demanding novel allocation and inference methods and giving rise to deep theoretical developments that govern the best-possible error rates as a function of problem complexity, dimension, model structure, and privacy or resource constraints. The recent literature both closes longstanding theoretical gaps and delivers practically minimax-optimal algorithms across a wide landscape of applications.