Conjugate Risk Bounds

Updated 9 April 2026

Conjugate Risk Bounds are risk inequalities derived via convex conjugate duality that offer dual representations and generalization guarantees in optimization and learning theory.
They provide explicit upper and lower risk estimates and tractable formulations for robust convex risk measures, uncertainty sets, and PAC-Bayesian bounds.
Applications span robust optimization, risk-constrained decision-making, and deep network training by ensuring tight certificates and controlled statistical loss.

Conjugate risk bounds are a class of upper and lower risk inequalities, dual characterizations, and generalization guarantees derived via convex conjugate duality (most prominently the Fenchel–Legendre transform) applied to risk measures and learning objectives. The framework exploits the dual structure of risk functionals, penalties, and statistical losses to obtain sharp, algorithmically tractable risk bounds for optimization, statistical estimation, learning theory, and stochastic control. Conjugate risk bounds inform both theoretical risk limits and the construction of robust optimization and learning algorithms.

1. Dual Formulation of Convex Risk Measures

Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and a Banach space $L^p = L^p(\Omega, \mathcal{F}, \mathbb{P})$ for $1 \leq p \leq \infty$, a convex risk measure $\rho: L^p \rightarrow \mathbb{R} \cup \{+\infty\}$ can be extended to robust or worst-case settings via uncertainty sets $\{\mathcal{U}_X \subset L^p\}_{X\in L^p}$, each closed, convex, bounded, and containing $X$. The worst-case risk measure is

$\rho^{\mathrm{WC}}(X) := \sup_{Z \in \mathcal{U}_X} \rho(Z).$

Under standard convexity and regularity conditions, $\rho^{\mathrm{WC}}$ remains proper, convex, and lower semicontinuous, and therefore admits a dual representation: $\rho^{\mathrm{WC}}(X) = \sup_{\mathbb{Q} \in \mathcal{Q}} \Big\{ \mathbb{E}_{\mathbb{Q}}[-X] - \alpha_{\rho^{\mathrm{WC}}}(\mathbb{Q}) \Big\},$ where the probability measures $\mathbb{Q}$ are absolutely continuous with respect to $\mathbb{P}$, and the penalty function $\alpha_{\rho^{\mathrm{WC}}}$ is the convex conjugate of $\rho^{\mathrm{WC}}$. The critical contribution is an explicit formula for this new penalty: $L^p = L^p(\Omega, \mathcal{F}, \mathbb{P})$ 2 where

$L^p = L^p(\Omega, \mathcal{F}, \mathbb{P})$ 3

and $L^p = L^p(\Omega, \mathcal{F}, \mathbb{P})$ 4 (Righi, 2024).

2. Closed-form Conjugate Penalties and Explicit Risk Bounds

Conjugate risk bounds yield tractable upper bounds when the uncertainty sets are norm-balls or Wasserstein balls. For uncertainty sets of the form $L^p = L^p(\Omega, \mathcal{F}, \mathbb{P})$ 5, Hölder's inequality leads to

$L^p = L^p(\Omega, \mathcal{F}, \mathbb{P})$ 6

The dual penalty shift is

$L^p = L^p(\Omega, \mathcal{F}, \mathbb{P})$ 7

This yields an explicit risk upper bound for all $L^p = L^p(\Omega, \mathcal{F}, \mathbb{P})$ 8: $L^p = L^p(\Omega, \mathcal{F}, \mathbb{P})$ 9 where $1 \leq p \leq \infty$ 0 is the subdifferential of $1 \leq p \leq \infty$ 1 at $1 \leq p \leq \infty$ 2. An analogous result holds for uncertainty sets defined via the $1 \leq p \leq \infty$ 3-Wasserstein ball (Righi, 2024).
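As a hedged numerical illustration of the Hölder step (a sketch, not the construction used by Righi, 2024), the following works on a finite sample space with $p = q = 2$. The helper names `expectation` and `lp_norm`, the radius `EPS`, and the toy measures `P`, `Q` and position `X` are all assumptions introduced here. The sketch checks that every $Z$ with $\|Z - X\|_2 \leq \epsilon$ satisfies $\mathbb{E}_{\mathbb{Q}}[-Z] \leq \mathbb{E}_{\mathbb{Q}}[-X] + \epsilon \|d\mathbb{Q}/d\mathbb{P}\|_2$, and that the bound is attained at $Z^* = X - \epsilon\, w / \|w\|_2$ with $w = d\mathbb{Q}/d\mathbb{P}$.

```python
import math
import random


def expectation(values, probs):
    """Expectation of a random variable on a finite sample space."""
    return sum(v * p for v, p in zip(values, probs))


def lp_norm(values, probs, p):
    """L^p(P) norm on a finite sample space (finite p only)."""
    return sum(pr * abs(v) ** p for v, pr in zip(values, probs)) ** (1.0 / p)


# Toy data (all hypothetical): reference measure P, alternative measure Q,
# position X, and ball radius EPS.
P = [0.2, 0.3, 0.5]
Q = [0.5, 0.3, 0.2]
X = [1.0, -0.5, 2.0]
EPS = 0.3

density = [q / p for q, p in zip(Q, P)]   # w = dQ/dP
norm_w = lp_norm(density, P, 2.0)         # ||dQ/dP||_2 under P

# Hölder upper bound on sup { E_Q[-Z] : ||Z - X||_2 <= EPS }.
holder_bound = expectation([-x for x in X], Q) + EPS * norm_w

# For p = q = 2 the supremum is attained at Z* = X - EPS * w / ||w||_2.
z_star = [x - EPS * w / norm_w for x, w in zip(X, density)]
attained = expectation([-z for z in z_star], Q)
radius_used = lp_norm([z - x for z, x in zip(z_star, X)], P, 2.0)

# Randomly sampled feasible points should never exceed the bound.
rng = random.Random(0)
largest_sampled = -math.inf
for _ in range(2000):
    direction = [rng.gauss(0.0, 1.0) for _ in X]
    scale = EPS * rng.random() / lp_norm(direction, P, 2.0)
    z = [x + scale * d for x, d in zip(X, direction)]
    largest_sampled = max(largest_sampled, expectation([-v for v in z], Q))

print(holder_bound, attained, largest_sampled)
```

Under these assumptions the inner supremum over the ball has a closed form, which is why a norm-ball uncertainty set changes the dual penalty only by a term proportional to $\epsilon \|d\mathbb{Q}/d\mathbb{P}\|_q$.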

3. Conjugate Duality in Risk-constrained Optimization

In nonconvex functional programming with risk constraints, risk-conjugate duality underpins strong duality results. For risk measures $1 \leq p \leq \infty$ 4 that are convex, lower semicontinuous, and positively homogeneous, the dual representation is

$1 \leq p \leq \infty$ 5

where $1 \leq p \leq \infty$ 6 is a suitable bounded subset of $1 \leq p \leq \infty$ 7. This envelope is the convex conjugate of $1 \leq p \leq \infty$ 8. The associated dual program has no duality gap: certificates (bounds) derived from the dual are tight upper bounds on primal values, with exactness under Slater-type conditions and infinite-dimensional Lyapunov convexity (Kalogerias et al., 2022).

For specific risk measures, such as CVaR and MAD, this yields:

CVaR: Envelope dual $1 \leq p \leq \infty$ 9, with conjugate as indicator of $\rho: L^p \rightarrow \mathbb{R} \cup \{+\infty\}$ 0.
MAD: Dual envelope $\rho: L^p \rightarrow \mathbb{R} \cup \{+\infty\}$ 1, again the conjugate is the indicator.
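The CVaR case can be checked numerically. The sketch below is illustrative only and rests on assumptions not fixed by the text above: a loss convention (larger values are worse), a uniform empirical distribution, the Rockafellar–Uryasev variational form as the primal, and the risk envelope $\{\xi : 0 \leq \xi \leq 1/(1-\alpha),\ \mathbb{E}[\xi] = 1\}$ as the dual. The function names and sample data are hypothetical.

```python
def cvar_primal(losses, alpha):
    """Rockafellar-Uryasev form: min_t { t + E[(X - t)_+] / (1 - alpha) }."""
    n = len(losses)

    def objective(t):
        return t + sum(max(x - t, 0.0) for x in losses) / ((1.0 - alpha) * n)

    # The objective is convex and piecewise linear in t with breakpoints at
    # the sample values, so a minimizer can be found among them.
    return min(objective(t) for t in losses)


def cvar_dual(losses, alpha):
    """sup E[xi * X] over densities 0 <= xi <= 1/(1 - alpha) with E[xi] = 1."""
    n = len(losses)
    cap = 1.0 / ((1.0 - alpha) * n)   # largest Q-mass allowed on one atom
    remaining = 1.0
    value = 0.0
    # Greedy solution of the linear program: load mass on the largest losses.
    for x in sorted(losses, reverse=True):
        mass = min(cap, remaining)
        value += mass * x
        remaining -= mass
        if remaining <= 0.0:
            break
    return value


LOSSES = [3.0, -1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
ALPHA = 0.8

primal_value = cvar_primal(LOSSES, ALPHA)
dual_value = cvar_dual(LOSSES, ALPHA)
print(primal_value, dual_value)
```

With ten equally weighted samples and $\alpha = 0.8$, both values coincide with the average of the two largest losses, which illustrates the absence of a gap between the primal and envelope forms in this simple setting.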

4. Conjugate Domain Dichotomy and Robust Estimation

In high-dimensional M-estimation under heavy-tailed noise, the boundedness of the domain of the convex conjugate of the loss function dictates whether the risk of the estimator is bounded:

If the conjugate domain $\operatorname{dom}(\ell^*)$ is bounded, where $\ell$ denotes the loss (e.g. Huber, absolute value, quantile loss), the dual variables in the min-max problem are uniformly bounded, so the risk remains bounded even under infinite-variance noise.
If $\operatorname{dom}(\ell^*)$ is unbounded (squared loss), the dual variables can diverge with noise magnitude, and the risk diverges unless external regularization (transfer to a prior) is introduced.
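The dichotomy can be seen directly by evaluating conjugates. The following sketch assumes a scalar loss, writes $\ell^*(u) = \sup_x \{ux - \ell(x)\}$, and approximates the supremum on a bounded grid; the function names, the Huber threshold `delta = 1`, and the grid radii are illustrative choices. For the Huber loss the approximation stabilizes only when $|u| \leq \delta$ and otherwise grows with the grid radius, signalling $\ell^*(u) = +\infty$; for the squared loss it stabilizes at $u^2/2$ for every $u$.

```python
import math


def conjugate_on_grid(loss, u, radius, steps=20001):
    """Approximate sup over |x| <= radius of u*x - loss(x) on a uniform grid."""
    best = -math.inf
    for i in range(steps):
        x = -radius + 2.0 * radius * i / (steps - 1)
        best = max(best, u * x - loss(x))
    return best


def huber(x, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails."""
    a = abs(x)
    return 0.5 * x * x if a <= delta else delta * a - 0.5 * delta * delta


def squared(x):
    """Squared loss with the 1/2 normalization."""
    return 0.5 * x * x


# Inside the conjugate domain of Huber (|u| <= 1): the value is stable.
huber_inside = [conjugate_on_grid(huber, 0.5, r) for r in (100.0, 1000.0)]
# Outside the domain (|u| > 1): the value grows with the grid radius.
huber_outside = [conjugate_on_grid(huber, 2.0, r) for r in (100.0, 1000.0)]
# Squared loss: finite for the same u, so its conjugate domain is unbounded.
squared_values = [conjugate_on_grid(squared, 2.0, r) for r in (100.0, 1000.0)]

print(huber_inside, huber_outside, squared_values)
```

Bounded dual variables correspond to the first case: no matter how large a residual is, the Huber loss can only push back with a slope of at most `delta`.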

For squared loss under transfer-regularized ridge ( $\rho: L^p \rightarrow \mathbb{R} \cup \{+\infty\}$ 4), the risk converges to a universal floor determined by the distance between the true parameter and the prior in Mahalanobis norm, independent of the regularizer's details or noise scale. This is demonstrated via a Convex Gaussian Minimax Theorem analysis (Agiropoulos, 30 Mar 2026).

5. PAC-Bayes and Generalization Bounds via Conjugate Risk Measures

Conjugate duality governs tight PAC-Bayesian generalization bounds for constrained $\rho: L^p \rightarrow \mathbb{R} \cup \{+\infty\}$ 5-entropic risk measures, enabling subgroup-robust generalization in learning. The risk is defined over a constrained set of subgroup-weightings via

$\rho: L^p \rightarrow \mathbb{R} \cup \{+\infty\}$ 6

with dual (conjugate) representation: $\rho: L^p \rightarrow \mathbb{R} \cup \{+\infty\}$ 7 Generalization bounds are derived via Donsker–Varadhan duality, and the Fenchel–Legendre conjugate appears in both the risk and the PAC-Bayes term, yielding what are termed "conjugate risk bounds." These bounds can be optimized directly by self-bounding algorithms, giving subgroup-valid generalization guarantees (Atbir et al., 13 Oct 2025).
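The Donsker–Varadhan step can be illustrated on a finite space. The sketch below checks the generic variational identity $\log \mathbb{E}_P[e^{f}] = \sup_Q \{\mathbb{E}_Q[f] - \mathrm{KL}(Q \| P)\}$, with the supremum attained at the Gibbs measure $Q \propto P e^{f}$. It is not the specific bound of Atbir et al.; the prior `PRIOR`, the function `F`, and all helper names are assumptions made for illustration.

```python
import math
import random


def kl_divergence(q, p):
    """KL(Q || P) on a finite space, with the convention 0 * log 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def log_moment_generating(f, p):
    """log E_P[exp(f)]."""
    return math.log(sum(pi * math.exp(fi) for fi, pi in zip(f, p)))


def gibbs_measure(f, p):
    """Q proportional to P * exp(f), the maximizer of the dual problem."""
    weights = [pi * math.exp(fi) for fi, pi in zip(f, p)]
    total = sum(weights)
    return [w / total for w in weights]


def dv_objective(q, f, p):
    """E_Q[f] - KL(Q || P)."""
    return sum(qi * fi for qi, fi in zip(q, f)) - kl_divergence(q, p)


PRIOR = [0.1, 0.2, 0.3, 0.4]   # hypothetical prior P
F = [0.5, -1.0, 2.0, 0.0]      # hypothetical bounded function f

log_mgf = log_moment_generating(F, PRIOR)
at_gibbs = dv_objective(gibbs_measure(F, PRIOR), F, PRIOR)

# Any other posterior Q gives a value no larger than log E_P[exp(f)].
rng = random.Random(1)
best_random = -math.inf
for _ in range(1000):
    raw = [rng.random() + 1e-9 for _ in PRIOR]
    total = sum(raw)
    q = [r / total for r in raw]
    best_random = max(best_random, dv_objective(q, F, PRIOR))

print(log_mgf, at_gibbs, best_random)
```

Rearranged as $\mathbb{E}_Q[f] \leq \mathrm{KL}(Q \| P) + \log \mathbb{E}_P[e^{f}]$, the same inequality is the change-of-measure step that introduces the KL term in PAC-Bayesian bounds.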

6. Conjugate Learning Theory: Trainability and Generalization in Deep Networks

The Fenchel–Young (convex conjugate) loss provides a unified framework for analyzing both optimization trainability and out-of-sample generalization. Empirical risk minimization with a Fenchel–Young loss is equivalent to (constrained) maximum-likelihood under exponential-family models, and the minimal achievable risk is bounded below by the generalized conditional entropy of the data; this is a direct consequence of convex duality: $\rho: L^p \rightarrow \mathbb{R} \cup \{+\infty\}$ 8 where $\rho: L^p \rightarrow \mathbb{R} \cup \{+\infty\}$ 9 is the maximum loss, and $\{\mathcal{U}_X \subset L^p\}_{X\in L^p}$ 0 is the Fenchel–Young loss (Qi, 18 Feb 2026).
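A minimal sketch of a Fenchel–Young loss, assuming the regularizer $\Omega$ is the negative Shannon entropy on the probability simplex (an illustrative choice, not necessarily the one used by Qi). In that case $\Omega^*(\theta)$ is the log-sum-exp function and the loss $\Omega^*(\theta) + \Omega(y) - \langle \theta, y \rangle$ reduces to the cross-entropy for one-hot targets, which is the maximum-likelihood connection in the softmax (exponential-family) model. The function names and the vector `THETA` are hypothetical.

```python
import math


def log_sum_exp(theta):
    """Numerically stable log(sum(exp(theta))), the conjugate of neg-entropy."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))


def softmax(theta):
    """Gradient of log_sum_exp, i.e. the model's predicted distribution."""
    lse = log_sum_exp(theta)
    return [math.exp(t - lse) for t in theta]


def neg_entropy(p):
    """Omega(p) = sum p_i log p_i, with the convention 0 * log 0 = 0."""
    return sum(pi * math.log(pi) for pi in p if pi > 0.0)


def fenchel_young_loss(theta, y):
    """Omega*(theta) + Omega(y) - <theta, y>, nonnegative by Fenchel-Young."""
    inner = sum(t * yi for t, yi in zip(theta, y))
    return log_sum_exp(theta) + neg_entropy(y) - inner


THETA = [2.0, -1.0, 0.5]     # hypothetical score vector
ONE_HOT = [0.0, 0.0, 1.0]    # target class index 2

loss_one_hot = fenchel_young_loss(THETA, ONE_HOT)
cross_entropy = -math.log(softmax(THETA)[2])
loss_at_prediction = fenchel_young_loss(THETA, softmax(THETA))

print(loss_one_hot, cross_entropy, loss_at_prediction)
```

The loss vanishes exactly when the target equals the model prediction `softmax(THETA)`, the equality case of the Fenchel–Young inequality.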

Deterministic and probabilistic generalization bounds depend explicitly on model capacity (max-loss), the information-theoretic entropy of the data, and the information loss induced by model architectures (such as surjective or irreversible mappings). The bounds have explicit dependence on network width, depth, batch size, residual connections, and sparsity.

7. Large Deviation Principles and Robust SGD via Conjugate Transforms

In robust optimization and stochastic first-order methods, conjugate risk bounds appear in the analysis of risk-sensitive cost indices (RSI) and large-deviation principles. The RSI for stochastic-gradient iterates satisfies

$\{\mathcal{U}_X \subset L^p\}_{X\in L^p}$ 1

with the large-deviation rate function given by the Legendre–Fenchel transform: $\{\mathcal{U}_X \subset L^p\}_{X\in L^p}$ 2 where $\{\mathcal{U}_X \subset L^p\}_{X\in L^p}$ 3 is the worst-case $\{\mathcal{U}_X \subset L^p\}_{X\in L^p}$ 4-gain of the method. This connects algorithmic robustness to exponential tail decay, and explicit finite-time upper bounds on risk can be produced analogously by bounding the conjugate of the risk-sensitive index (Gürbüzbalaban et al., 17 Sep 2025).
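The role of the Legendre–Fenchel transform in a rate function can be seen in a textbook case, used here as a stand-in rather than the RSI formula above. Assuming i.i.d. Gaussian samples with mean `MU` and standard deviation `SIGMA` (hypothetical values), the cumulant generating function is $\Lambda(\theta) = \mu\theta + \sigma^2\theta^2/2$, and its transform $I(x) = \sup_\theta \{\theta x - \Lambda(\theta)\}$ equals $(x-\mu)^2 / (2\sigma^2)$. The sketch approximates the supremum on a grid and compares it with this closed form.

```python
import math


def legendre_fenchel(cgf, x, lo=-10.0, hi=10.0, steps=200001):
    """Approximate sup over theta in [lo, hi] of theta*x - cgf(theta)."""
    best = -math.inf
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        best = max(best, theta * x - cgf(theta))
    return best


MU, SIGMA = 1.0, 2.0   # hypothetical Gaussian parameters


def gaussian_cgf(theta):
    """Cumulant generating function of N(MU, SIGMA^2)."""
    return MU * theta + 0.5 * (SIGMA * theta) ** 2


rate_at_3 = legendre_fenchel(gaussian_cgf, 3.0)
rate_at_mean = legendre_fenchel(gaussian_cgf, MU)
closed_form = (3.0 - MU) ** 2 / (2.0 * SIGMA ** 2)

print(rate_at_3, closed_form, rate_at_mean)
```

By the Chernoff bound, the probability that the mean of $n$ such samples exceeds $x \geq \mu$ is at most $\exp(-n I(x))$, which is the kind of exponential tail decay that a conjugate rate function quantifies.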

Summary Table: Conjugate Risk Bound Forms in Representative Contexts

Context	Primal/Empirical Form	Dual/Conjugate Representation
Robust convex risk ( $\{\mathcal{U}_X \subset L^p\}_{X\in L^p}$ 5)	$\{\mathcal{U}_X \subset L^p\}_{X\in L^p}$ 6	$\{\mathcal{U}_X \subset L^p\}_{X\in L^p}$ 7
Risk-constrained optimization	Risk constraint: $\{\mathcal{U}_X \subset L^p\}_{X\in L^p}$ 8	Envelope: $\{\mathcal{U}_X \subset L^p\}_{X\in L^p}$ 9
M-estimation (heavy tails)	$X$ 0	min-max form with dual variable domain $X$ 1 dictating noise influence
Learning theory (PAC-Bayes)	$X$ 2	$X$ 3
SGD/robust optimization	Time-averaged excess risk	LDP rate: $X$ 4

Conjugate risk bounds, based on convex duality, unify several domains: providing explicit risk inflation bounds in robust statistics, tight certificates in risk-constrained optimization, universal dichotomies under heavy-tailed noise, sharp generalization error controls in learning theory, and precise exponential deviation decay in stochastic optimization. The mathematical tractability and interpretability of these conjugate formulations afford both theoretical guarantees and algorithmic utility across stochastic decision sciences, robust learning, and optimization (Righi, 2024, Kalogerias et al., 2022, Agiropoulos, 30 Mar 2026, Atbir et al., 13 Oct 2025, Qi, 18 Feb 2026, Gürbüzbalaban et al., 17 Sep 2025).