Conditional Regret Bounds in Learning
- Conditional regret bounds are advanced measures that quantify the excess risk of prediction and decision-making algorithms by conditioning on auxiliary variables like data batches and internal randomness.
- They connect performance analysis with information measures—using conditional mutual information and Sibson’s measures—to yield sharper, instance-adaptive guarantees.
- Applications of conditional regret bounds span universal prediction, online learning, reinforcement learning, and risk-sensitive optimization, enabling refined and data-adaptive analyses.
Conditional regret bounds are a sophisticated tool for characterizing the excess risk or suboptimality of prediction, decision-making, or learning algorithms, subject to conditioning on auxiliary random variables, data batch histories, or aspects of the problem structure. These bounds quantify algorithmic performance not only in the classic minimax or expectation sense but often with respect to an explicit or implicit conditioning variable—history, batch, internal randomness, or auxiliary filtration. The conditional viewpoint enables sharper and more data-adaptive assessments of regret, integrates problem-dependent statistical complexity, and connects regret minimization to conditional mutual information, Sibson’s information measures, and law-of-the-iterated-logarithm arguments across universal prediction, bandits, and reinforcement learning.
1. Formal Definitions and Setup
The central quantity of interest is the conditional regret, generally defined by

$$\mathrm{reg}(q;\theta \mid Z) \;=\; D\big(p_\theta \,\|\, q \mid Z\big),$$

where $q$ is a predictor, $\theta$ parameterizes a statistical model $p_\theta$, $Y$ is the target variable, $Z$ is a conditioning random variable (e.g., training batches or prior observations), and $D(\cdot\,\|\,\cdot \mid Z)$ is a conditional divergence, typically a conditional Kullback-Leibler or Rényi divergence. In batch universal prediction, the regret against $p_\theta$ is measured over the test batch $Y$ given the training corpus $Z$, resulting in

$$\mathrm{reg}(q;\theta) \;=\; \mathbb{E}_{Z,\,Y\sim p_\theta}\!\left[\log\frac{p_\theta(Y\mid Z)}{q(Y\mid Z)}\right],$$

which coincides with the conditional KL divergence $D_{\mathrm{KL}}\big(p_\theta(Y\mid Z)\,\|\,q(Y\mid Z)\mid Z\big)$ (Bondaschi et al., 14 Aug 2025).
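To make the definition concrete, here is a minimal Monte Carlo sketch (not taken from the cited paper) that estimates the conditional regret of an add-1/2 (Krichevsky-Trofimov-style) conditional mixture predictor against a Bernoulli source; the function names, batch sizes, and choice of predictor are illustrative assumptions.

```python
# Minimal sketch: Monte Carlo estimate of the conditional regret
# E_{Z,Y ~ p_theta}[ log p_theta(Y|Z) - log q(Y|Z) ] for a Bernoulli source,
# where q is an assumed add-1/2 (KT-style) mixture predictor conditioned on
# the training batch Z. Illustrative only, not the cited paper's construction.
import numpy as np

rng = np.random.default_rng(0)

def log_p_true(y, theta):
    """Log-likelihood of a test batch y under the true Bernoulli(theta) source.
    For a memoryless source, p_theta(Y | Z) = p_theta(Y)."""
    k = y.sum()
    return k * np.log(theta) + (len(y) - k) * np.log(1 - theta)

def log_q_kt(y, z):
    """Sequential add-1/2 mixture predictor, conditioned on training batch z."""
    ones, total = z.sum(), len(z)
    logq = 0.0
    for bit in y:
        p1 = (ones + 0.5) / (total + 1.0)      # predictive probability of a 1
        logq += np.log(p1 if bit == 1 else 1 - p1)
        ones += bit
        total += 1
    return logq

def conditional_regret(theta, n_train=20, n_test=20, trials=2000):
    """Monte Carlo estimate of the conditional KL D(p_theta || q | Z)."""
    regs = []
    for _ in range(trials):
        z = (rng.random(n_train) < theta).astype(int)   # training batch Z
        y = (rng.random(n_test) < theta).astype(int)    # test batch Y
        regs.append(log_p_true(y, theta) - log_q_kt(y, z))
    return np.mean(regs)

print(conditional_regret(theta=0.3))
```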
Further, conditional regret bounds also arise as conditional expected regret in Bayesian optimization, where the expectation of the regret is taken conditionally on the algorithm's internal randomization (Takeno et al., 2 Sep 2024).
In online betting, conditional regret refers to path-wise regret under a Ville event (a high-confidence or almost-sure set of sequences), quantifying the regret of each realization with respect to the best fixed strategy in hindsight:

$$\mathrm{Reg}_t \;=\; L_t^{*} - \log M_t,$$

where $L_t^{*}$ is the best log-wealth attainable by a fixed strategy in hindsight and $M_t$ is the mixture martingale (Agrawal et al., 13 Dec 2025).
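The following sketch illustrates this path-wise notion under simplifying assumptions: a uniform mixture over a grid of fixed betting fractions plays the role of the mixture martingale, and the regret is its log-wealth gap to the best fixed fraction in hindsight. The grid, the outcome distribution, and the uniform mixing are illustrative choices, not the construction of the cited work.

```python
# Illustrative sketch: path-wise betting regret of a uniform mixture martingale
# over a grid of fixed betting fractions, measured against the best fixed
# fraction in hindsight.
import numpy as np

rng = np.random.default_rng(1)
T = 500
g = rng.uniform(-1, 1, size=T)                 # bounded outcomes g_t in [-1, 1]
lams = np.linspace(-0.5, 0.5, 101)             # candidate fixed betting fractions

# Per-round log-wealth increments log(1 + lambda * g_t) for every fraction.
log_gains = np.log1p(np.outer(lams, g))        # shape (len(lams), T)
cum = np.cumsum(log_gains, axis=1)

best_logwealth = cum.max(axis=0)               # best fixed strategy in hindsight
# Mixture martingale: uniform average of the wealth processes (in wealth space).
log_mixture = np.log(np.mean(np.exp(cum), axis=0))

regret = best_logwealth - log_mixture          # path-wise conditional regret
print(regret[-1], 0.5 * np.log(T))             # typically O(log T) for 1-D mixtures
```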
2. Conditional Regret Bounds in Universal Prediction
The Conditional Regret-Capacity Theorem for batch universal prediction provides a sharp identification of the minimax conditional regret with a conditional mutual information:

$$\min_{q}\,\max_{\theta}\,\mathrm{reg}(q;\theta\mid Z) \;=\; \max_{\pi}\, I_\pi(\Theta; Y \mid Z),$$

where $I_\pi(\Theta; Y\mid Z)$ is the conditional mutual information between the model parameter and the test data, given the observed batch, optimized over all priors $\pi$ on $\Theta$ (Bondaschi et al., 14 Aug 2025). The optimal predictor is the conditional mixture with the capacity-achieving prior $\pi^{*}$.
For Rényi-type regret, the theorem generalizes to

$$\min_{q}\,\max_{\theta}\, D_\alpha\big(p_\theta \,\|\, q \mid Z\big) \;=\; \max_{\pi}\, I_\alpha^{S}(\Theta; Y \mid Z),$$

where $D_\alpha(\cdot\,\|\,\cdot\mid Z)$ is the conditional Rényi divergence and $I_\alpha^{S}(\Theta; Y\mid Z)$ is the conditional Sibson mutual information of order $\alpha$, with the minimizer given by the conditional $\alpha$-NML predictor (Bondaschi et al., 14 Aug 2025). This establishes a deep connection between regret minimization and conditional information measures.
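As a worked illustration of the regret-capacity identity, the sketch below computes the classical (unconditional) version $\min_q\max_\theta D(p_\theta\,\|\,q)=\max_\pi I(\Theta;Y)$ by Blahut-Arimoto for a small Bernoulli family; the conditional theorem replaces $I(\Theta;Y)$ with $I(\Theta;Y\mid Z)$. The family and batch length are arbitrary choices for the example.

```python
# Sketch of the classical (unconditional) regret-capacity identity
#   min_q max_theta D(p_theta || q) = max_pi I(Theta; Y),
# computed by Blahut-Arimoto for a finite Bernoulli family over length-3 batches.
import itertools
import numpy as np

thetas = np.array([0.1, 0.4, 0.7, 0.9])                       # finite model family
outcomes = list(itertools.product([0, 1], repeat=3))          # all length-3 batches
P = np.array([[th**sum(y) * (1 - th)**(3 - sum(y)) for y in outcomes]
              for th in thetas])                              # P[theta, y]

pi = np.full(len(thetas), 1.0 / len(thetas))                  # prior over theta
for _ in range(500):                                          # Blahut-Arimoto iterations
    q = pi @ P                                                # mixture predictor q_pi
    dkl = np.sum(P * np.log(P / q), axis=1)                   # D(p_theta || q_pi)
    pi = pi * np.exp(dkl)
    pi /= pi.sum()

q = pi @ P
dkl = np.sum(P * np.log(P / q), axis=1)
capacity = pi @ dkl                                           # = max_pi I(Theta; Y)
print("capacity:", capacity, "worst-case regret of mixture:", dkl.max())
# At the optimum these two numbers coincide (saddle point of the regret game).
```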
For binary memoryless sources, the batch regret admits tight asymptotics that quantify the per-batch penalty incurred by optimal universal predictors (Bondaschi et al., 14 Aug 2025).
3. Conditional Regret in Online Learning and Betting
The conditional regret framework in online betting and learning connects high-probability and almost-sure concentration via Ville events. For a path-wise (adversarial) regret process with variance proxy $V_t$, the mixture martingale strategy obeys a finite-time bound that holds on the Ville event (Agrawal et al., 13 Dec 2025). Asymptotically, the almost-sure iterated-logarithm form emerges:

$$\mathrm{Reg}_t \;\le\; (1+o(1))\,\sqrt{2\, V_t \log\log V_t}$$

for all but finitely many $t$, with probability one under stochastic assumptions, thus bridging adversarial and stochastic analyses.
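A small simulation can make the envelope tangible; the sketch below generates a bounded zero-mean increment process (a stand-in for the regret process, not the cited construction) and checks how often it stays inside the iterated-logarithm envelope $\sqrt{2 V_t \log\log V_t}$.

```python
# Illustration: a zero-mean bounded increment process S_t with empirical
# variance proxy V_t, compared against the iterated-logarithm envelope
# sqrt(2 * V_t * log log V_t) that the almost-sure bound takes asymptotically.
import numpy as np

rng = np.random.default_rng(2)
T = 100_000
x = rng.uniform(-1, 1, size=T)            # bounded, zero-mean increments
S = np.cumsum(x)                          # the "regret-like" process
V = np.cumsum(x**2)                       # variance proxy V_t

mask = V > np.e                           # log log V_t requires V_t > e
envelope = np.sqrt(2 * V[mask] * np.log(np.log(V[mask])))
frac_inside = np.mean(np.abs(S[mask]) <= envelope)
print(f"fraction of rounds inside the LIL envelope: {frac_inside:.4f}")
```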
4. Instance-Dependent and Conditional Regret in Reinforcement Learning
Conditional regret bounds in RL and bandits exploit problem structure by conditioning on histories or on specific state-action pairs. In tabular MDPs, gap-dependent, variance-aware conditional regret bounds are expressed in terms of the maximum conditional total variance, conditioned on visiting any given state-action pair, together with the suboptimality gaps; this refines classical bounds that depend only on the unconditional total variance, yielding much sharper guarantees when the MDP has only a few rare high-variance decision points (Chen et al., 6 Jun 2025).
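The following toy computation illustrates why conditioning the variance on visitation can shrink such gap-weighted quantities. It assumes a schematic bound of the form $\sum_{(s,a)} \mathrm{Var}(s,a)\,\log T/\Delta(s,a)$, which is only a stand-in for the exact expression in the cited work.

```python
# Toy illustration (schematic, not the exact bound from the cited paper):
# a gap-dependent sum of the assumed form sum_{(s,a)} Var(s,a) * log T / gap(s,a),
# comparing per-pair *conditional* variances against a single worst-case
# (unconditional) variance. When only a few rare decision points are
# high-variance, conditioning makes the sum much smaller.
import numpy as np

rng = np.random.default_rng(3)
n_pairs, T = 200, 10**6
gaps = rng.uniform(0.05, 1.0, size=n_pairs)            # suboptimality gaps
cond_var = np.full(n_pairs, 0.1)                       # typical conditional variance
cond_var[:3] = 5.0                                     # a few rare high-variance points
worst_var = cond_var.max()                             # unconditional worst case

conditional_bound = np.sum(cond_var * np.log(T) / gaps)
unconditional_bound = np.sum(worst_var * np.log(T) / gaps)
print(conditional_bound, unconditional_bound)          # conditional sum is far smaller
```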
In risk-sensitive RL, the conditional recommendation regret for CVaR-type or quantile-integral objectives scales with the Lipschitz constant of the quantile/CDF-based risk measure, and can be interpreted as conditional regret for tail-optimized objectives (Bastani et al., 2022).
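As an illustration of a tail-optimized recommendation objective, the sketch below estimates CVaR empirically for two hypothetical arms and reports the CVaR gap (recommendation regret) incurred by a mean-based recommendation; the arm distributions and the CVaR level are illustrative assumptions.

```python
# Minimal sketch: empirical CVaR of reward distributions and the resulting
# "recommendation regret" for a tail-optimized objective, i.e. the gap in CVaR
# between the best arm and the recommended arm.
import numpy as np

rng = np.random.default_rng(4)

def empirical_cvar(samples, alpha=0.1):
    """Mean of the worst alpha-fraction of reward samples (lower-tail CVaR)."""
    sorted_s = np.sort(samples)
    k = max(1, int(np.ceil(alpha * len(sorted_s))))
    return sorted_s[:k].mean()

# Two hypothetical arms: similar means, very different tails.
arm_a = rng.normal(1.0, 0.2, size=10_000)      # light tail
arm_b = rng.normal(1.05, 1.5, size=10_000)     # heavier tail, slightly higher mean

cvars = {"a": empirical_cvar(arm_a), "b": empirical_cvar(arm_b)}
recommended = "b" if arm_b.mean() > arm_a.mean() else "a"   # mean-based choice
regret = max(cvars.values()) - cvars[recommended]           # CVaR recommendation regret
print(cvars, "regret of mean-based recommendation:", regret)
```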
5. Conditional Expected Regret in Bayesian Optimization
Regret analyses for randomized BO algorithms condition on the algorithm's internal randomness, yielding high-probability bounds on the conditional expected regret. For IRGP-UCB, the resulting bound matches classical rates in the horizon and the maximum information gain, but avoids time-dependent scaling of the confidence parameter by conditioning on the algorithmic randomness (Takeno et al., 2 Sep 2024).
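A numpy-only sketch of a GP-UCB-style loop with a randomized confidence parameter is given below to illustrate what "conditioning on internal randomness" refers to; the specific randomization (a single exponential draw for $\beta$), the kernel, and the objective are assumptions for illustration, not the IRGP-UCB algorithm as specified in the cited paper.

```python
# Minimal numpy-only sketch of a GP-UCB-style loop with a *randomized*
# confidence parameter beta. The randomization used here is a stand-in to
# illustrate conditioning on internal randomness.
import numpy as np

rng = np.random.default_rng(5)
X = np.linspace(0, 1, 200)
f = lambda x: np.sin(6 * x) + 0.5 * x                      # unknown objective
kern = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / 0.1) ** 2)
noise = 1e-3

beta = rng.exponential(scale=2.0) + 1.0    # randomized confidence parameter (assumed form)
xs, ys = [], []
for t in range(30):
    if xs:
        Xt = np.array(xs)
        K = kern(Xt, Xt) + noise * np.eye(len(Xt))
        Ks = kern(X, Xt)
        mu = Ks @ np.linalg.solve(K, np.array(ys))                  # posterior mean
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1) # posterior variance
    else:
        mu, var = np.zeros_like(X), np.ones_like(X)
    ucb = mu + np.sqrt(beta * np.clip(var, 0, None))       # UCB acquisition
    x_next = X[np.argmax(ucb)]
    xs.append(x_next)
    ys.append(f(x_next) + rng.normal(0, np.sqrt(noise)))   # noisy observation

print("best queried value:", max(f(np.array(xs))))
```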
Similarly, Bayesian simple-regret bounds in large-domain GP optimization express regret as a conditional fraction of the optimal achievable value, controlled by the domain size and fixed evaluation budget, rather than assuming exhaustive exploration (Wüthrich et al., 2021).
6. Conditional Regret Links in Surrogate Losses and Learning Theory
Enhanced H-consistency bounds leverage conditional regret inequalities between surrogate and target losses. By introducing two instance-dependent scaling factors, these results allow pointwise inequalities that bound the conditional regret of the target loss at each instance by a rescaled, transformed conditional regret of the surrogate loss; after marginalization over instances, these imply bounds on the excess target risk in terms of the excess surrogate risk, yielding strictly sharper finite-sample error bounds by accounting for conditional regret at each instance (Mao et al., 18 Jul 2024).
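For intuition, the sketch below numerically checks a classical pointwise comparison of this kind, the hinge-versus-0-1 conditional regret inequality $\Delta\mathcal{C}_{0\text{-}1}\le\Delta\mathcal{C}_{\mathrm{hinge}}$, on a grid of scores and conditional probabilities; the enhanced, instance-rescaled inequalities of the cited work refine this classical case.

```python
# Sketch of a classical pointwise (conditional-regret) comparison between a
# surrogate and the target loss: for the hinge surrogate and binary 0-1 loss,
#   Delta C_{0-1}(h, eta) <= Delta C_{hinge}(h, eta)  for all scores h and eta.
import numpy as np

def excess_01(h, eta):
    """Excess conditional 0-1 risk of score h at conditional probability eta."""
    cond_risk = eta * (h <= 0) + (1 - eta) * (h > 0)
    return cond_risk - np.minimum(eta, 1 - eta)

def excess_hinge(h, eta):
    """Excess conditional hinge risk; the optimum equals 2*min(eta, 1-eta)."""
    cond_risk = eta * np.maximum(0, 1 - h) + (1 - eta) * np.maximum(0, 1 + h)
    return cond_risk - 2 * np.minimum(eta, 1 - eta)

hs = np.linspace(-3, 3, 601)
etas = np.linspace(0.01, 0.99, 99)
H, E = np.meshgrid(hs, etas)
print(bool(np.all(excess_01(H, E) <= excess_hinge(H, E) + 1e-12)))   # True
```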
Applications span multi-class classification, estimation under low-noise Tsybakov conditions, and bipartite ranking. Notably, the techniques recover conventional H-consistency bounds as a special case when the instance-dependent scaling factors are trivial.
7. Connections and Implications
- Conditional regret bounds allow nuanced quantification of algorithmic performance: rates can be much tighter and more adaptive than unconditional minimax bounds.
- Information-theoretic characterizations via conditional mutual information and conditional Sibson mutual information serve as sharp lower bounds; conditional -NML predictors are saddle-point optimal (Bondaschi et al., 14 Aug 2025).
- Gap- and variance-conditional bounds in RL precisely capture how local structure can sharply reduce total regret mass; in many practical settings, the conditional total variance is parametrically smaller than unconditional alternatives (Chen et al., 6 Jun 2025, Zanette et al., 2019).
- Path-wise, Ville-event conditional regret bounds provide a robust bridge between adversarial and stochastic approaches in online learning, including for unbounded data, yielding law-of-the-iterated-logarithm rates (Agrawal et al., 13 Dec 2025).
- Enhanced H-consistency bounds rigorously separate instance-dependent effects, yielding sharper sample complexity and risk bounds in statistical learning (Mao et al., 18 Jul 2024).
These frameworks suggest richer, more data-adaptive analyses of regret, and connect deeply with modern developments in universal prediction, bandit theory, and reinforcement learning.