Divergence Ambiguity Sets in DRO
- Divergence ambiguity sets are uncertainty regions constructed using statistical divergences around a nominal distribution, ensuring robust risk minimization.
- They employ divergences such as φ-divergence, KL, Bregman, Wasserstein, and Sinkhorn to handle model misspecification and sparse data effectively.
- These sets support decision-dependent robustness with tractable dual formulations that enhance performance in risk-averse machine learning and robust control.
Divergence ambiguity sets are foundational constructs in distributionally robust optimization (DRO), used to specify uncertainty sets of probability measures around a nominal distribution. The "divergence" terminology refers to the use of a statistical divergence, such as φ-divergence, Kullback–Leibler (KL) divergence, Bregman divergence, or generalizations such as Wasserstein–Bregman and Sinkhorn divergences, which measure discrepancy between probability distributions. By defining ambiguity sets in terms of divergence balls of specified radius, one ensures that the true distribution likely lies within the set, thus enabling robust risk minimization even under distributional misspecification or limited data. Divergence ambiguity sets are central in a broad array of modern DRO applications, including risk-averse machine learning, control, and Bayesian decision-making.
1. General Structure of Divergence Ambiguity Sets
A divergence ambiguity set is an uncertainty region in the space of probability measures, constructed as a sublevel set of a chosen divergence with respect to a reference ("nominal") distribution. Given a nominal law $\hat{P}$ and a divergence $D$, the general form is
$$\mathcal{P}_\varepsilon = \{\, Q : D(Q \,\|\, \hat{P}) \le \varepsilon \,\},$$
where the radius $\varepsilon > 0$ calibrates the size of the ambiguity set and is often set via statistical concentration or probabilistic guarantees (Guo et al., 2017). The specific divergence $D$ determines both the statistical and the geometric properties of $\mathcal{P}_\varepsilon$. Choices include:
- φ-divergence (with various generator functions φ),
- KL-divergence (relative entropy),
- Bregman divergence,
- Wasserstein distance,
- Sinkhorn (entropic regularized OT) divergence,
- hybrid divergences such as Wasserstein–Bregman.
These sets can be decision-independent or decision-dependent, the latter meaning that the divergence budget itself is a function of the decision variable, e.g., $\varepsilon = \varepsilon(x)$ (Luo et al., 2018, Fochesato et al., 13 May 2025).
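In the discrete (finite-support) case, membership in a divergence ball reduces to a simple computation on probability vectors. The following minimal Python sketch, with an illustrative support, nominal law, and radius (all assumptions, not taken from the cited papers), tests membership in a KL ball:

```python
import numpy as np

def kl_divergence(q, p):
    """KL divergence D(q || p) for discrete distributions on a common support
    (assumes q is absolutely continuous w.r.t. p, i.e., p > 0 wherever q > 0)."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def in_kl_ball(q, p_nominal, eps):
    """Membership test for the ambiguity set {Q : D_KL(Q || P_nominal) <= eps}."""
    return kl_divergence(q, p_nominal) <= eps

# Illustrative nominal law on 4 support points and a candidate distribution Q.
p_hat = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.35, 0.35, 0.2, 0.1])
print(in_kl_ball(q, p_hat, eps=0.05))  # True if Q lies within the KL budget
```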
2. Classical φ-Divergence and KL-Ambiguity Sets
A prominent family is the φ-divergence ambiguity sets, defined for a convex, lower semi-continuous generator $\phi$ with $\phi(1) = 0$ as
$$\mathcal{P}_\varepsilon = \{\, Q : D_\phi(Q \,\|\, \hat{P}) \le \varepsilon \,\},$$
where $\hat{P}$ is a nominal distribution and
$$D_\phi(Q \,\|\, \hat{P}) = \int \phi\!\left(\frac{dQ}{d\hat{P}}\right) d\hat{P}.$$
Key cases include:
- Kullback–Leibler ambiguity set: $\phi(t) = t \log t$, yielding $D_\phi(Q\,\|\,\hat{P}) = D_{\mathrm{KL}}(Q\,\|\,\hat{P})$ (relative entropy).
- Pearson $\chi^2$-divergence set: $\phi(t) = (t-1)^2$, yielding $D_\phi(Q\,\|\,\hat{P}) = \chi^2(Q\,\|\,\hat{P})$ (Luo et al., 2018, Fochesato et al., 13 May 2025).
The DRO minimax model is
$$\min_{x \in X} \ \sup_{Q :\, D_\phi(Q \,\|\, \hat{P}) \le \varepsilon(x)} \ \mathbb{E}_Q[f(x, \xi)],$$
where the divergence budget $\varepsilon(x)$ is potentially decision-dependent.
In finite-support settings, dualization yields tractable reformulations. For the KL-divergence case, the well-known log-sum-exp dual structure is recovered:
$$\sup_{Q :\, D_{\mathrm{KL}}(Q \,\|\, \hat{P}) \le \varepsilon} \mathbb{E}_Q[f(x, \xi)] \;=\; \inf_{\lambda > 0} \ \lambda \varepsilon + \lambda \log \sum_{i=1}^{N} p_i \exp\!\left(\frac{f(x, \xi_i)}{\lambda}\right),$$
where $p_i$ denotes the nominal probability of sample $\xi_i$ (Luo et al., 2018). For the general φ-divergence case, saddle-point or nonlinear program formulations are derived, with computational approaches including global optimization, dual reduction to tractable convex programs, and exchange algorithms.
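The KL dual above is a one-dimensional convex problem in $\lambda$. A minimal sketch of its numerical evaluation, assuming illustrative scenario losses and uniform nominal weights (not taken from the cited papers):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def kl_dro_worst_case(f_vals, p, eps):
    """Worst-case expected loss over the KL ball of radius eps, via the dual
    inf_{lam > 0} lam*eps + lam*log( sum_i p_i exp(f_i / lam) )."""
    def dual(lam):
        # logsumexp with weights b=p computes log sum_i p_i exp(f_i / lam)
        return lam * eps + lam * logsumexp(f_vals / lam, b=p)
    res = minimize_scalar(dual, bounds=(1e-6, 1e3), method="bounded")
    return res.fun

# Illustrative losses at N = 5 scenarios with uniform nominal weights.
f_vals = np.array([1.0, 0.5, 2.0, 1.5, 0.8])
p = np.full(5, 0.2)
print(kl_dro_worst_case(f_vals, p, eps=0.1))
```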
Table: Selected φ-divergence sets

| Divergence | Generator φ(t) | Dual/Reduction structure |
|---|---|---|
| KL | $t \log t$ | Log-sum-exp, exponential-cone program |
| Pearson χ² | $(t-1)^2$ | Quadratic, reduced dual forms |
| General φ | Convex, l.s.c. φ with φ(1) = 0 | Nonconvex/convex programs |
3. Bregman, Wasserstein, and Wasserstein–Bregman Ambiguity Sets
Bregman divergence ambiguity sets generalize KL- and χ²-divergence sets. For a strictly convex, differentiable function $\psi$, the Bregman divergence between discrete distributions (viewed as probability vectors) is
$$B_\psi(Q, P) = \psi(Q) - \psi(P) - \langle \nabla \psi(P),\, Q - P \rangle$$
(Guo et al., 2017). The ambiguity set is
$$\mathcal{P}_\varepsilon = \{\, Q : B_\psi(Q, \hat{P}_N) \le \varepsilon \,\},$$
where $\hat{P}_N$ is the empirical law.
Wasserstein ambiguity sets use the $p$-Wasserstein metric
$$W_p(Q, P) = \left( \inf_{\pi \in \Pi(Q, P)} \int \|x - y\|^p \, d\pi(x, y) \right)^{1/p},$$
yielding symmetric "transport-type" balls around the nominal law, which are particularly important in high-dimensional or nonparametric settings.
Wasserstein–Bregman ambiguity sets are constructed with the ground cost replaced by a Bregman divergence:
$$W_{B_\psi}(Q, P) = \inf_{\pi \in \Pi(Q, P)} \int B_\psi(x, y) \, d\pi(x, y),$$
providing a continuum between pure transport and information-theoretic robustness (Guo et al., 2017).
Choice of divergence impacts convexity, computational tractability, and statistical concentration rates. For instance, Bregman balls are convex and contract exponentially fast around the empirical law, while Wasserstein balls incur a dimension-dependent concentration penalty. Wasserstein–Bregman balls inherit both types of robustness, and can be advantageous in balancing tractability and sensitivity to statistical misspecification.
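To make the transport construction concrete, the following sketch computes a discrete OT cost with a Bregman ground cost, here the squared Euclidean distance (the Bregman divergence generated by $\psi(x) = \|x\|^2$), by solving the coupling linear program with SciPy; the support points and weights are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def bregman_ot_cost(x_src, w_src, x_tgt, w_tgt):
    """Discrete OT cost with squared-Euclidean ground cost (a Bregman divergence),
    solved as a linear program over couplings pi with marginals w_src and w_tgt."""
    n, m = len(w_src), len(w_tgt)
    # Ground cost matrix: B_psi(x, y) = ||x - y||^2 for psi(x) = ||x||^2.
    C = np.array([[np.sum((xs - xt) ** 2) for xt in x_tgt] for xs in x_src])
    # Equality constraints: row sums of pi equal w_src, column sums equal w_tgt.
    A_eq = []
    for i in range(n):
        row = np.zeros((n, m)); row[i, :] = 1.0
        A_eq.append(row.ravel())
    for j in range(m):
        col = np.zeros((n, m)); col[:, j] = 1.0
        A_eq.append(col.ravel())
    b_eq = np.concatenate([w_src, w_tgt])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
    return res.fun

# Two illustrative discrete distributions on the real line.
x_src, w_src = np.array([[0.0], [1.0]]), np.array([0.5, 0.5])
x_tgt, w_tgt = np.array([[0.5], [2.0]]), np.array([0.6, 0.4])
print(bregman_ot_cost(x_src, w_src, x_tgt, w_tgt))
```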
4. Sinkhorn and Entropic-OT Ambiguity Sets
Sinkhorn ambiguity sets employ the entropic-regularized optimal transport (ES-OT) discrepancy to define ambiguity regions. Given a cost $c$, reference law $\hat{P}$, and regularization parameter $\varepsilon > 0$, the discrepancy is
$$W_\varepsilon(Q, \hat{P}) = \inf_{\pi \in \Pi(Q, \hat{P})} \int c(x, y)\, d\pi(x, y) + \varepsilon\, D_{\mathrm{KL}}(\pi \,\|\, Q \otimes \hat{P}),$$
where $D_{\mathrm{KL}}$ denotes the KL divergence and $\Pi(Q, \hat{P})$ denotes the set of couplings of $Q$ and $\hat{P}$ (Cescon et al., 26 Mar 2025); the ambiguity set is the ball $\{\, Q : W_\varepsilon(Q, \hat{P}) \le \rho \,\}$ of radius $\rho$.
Key structural features are:
- As $\varepsilon \to 0$, the Sinkhorn discrepancy recovers the classic Wasserstein distance.
- As $\varepsilon \to \infty$, the ambiguity set contracts to the singleton $\{\hat{P}\}$, making the robust solution recede to standard stochastic control.
- For intermediate $\varepsilon$, there is a continuous interpolation between the robust Wasserstein and nominal stochastic regimes.
In linear-system DRO control, Sinkhorn ambiguity sets yield tractable SDP-based convex programs for the controller, with explicit LMIs and log-determinant constraints. Monotonicity of the worst-case cost with respect to the regularization parameter is established, and in scarce-data regimes Sinkhorn DRO shows an empirical advantage over both pure Wasserstein DRO and nominal control (Cescon et al., 26 Mar 2025).
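For intuition on the discrepancy itself, here is a minimal sketch of the Sinkhorn fixed-point iterations computing the entropic OT cost between two discrete distributions (the support points, weights, and regularization value are illustrative assumptions):

```python
import numpy as np

def entropic_ot_cost(w_src, w_tgt, C, eps, n_iters=500):
    """Entropic-regularized OT via Sinkhorn iterations; returns the transport
    cost <C, pi> of the regularized optimal coupling pi."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u, v = np.ones_like(w_src), np.ones_like(w_tgt)
    for _ in range(n_iters):             # alternating marginal-scaling updates
        u = w_src / (K @ v)
        v = w_tgt / (K.T @ u)
    pi = np.diag(u) @ K @ np.diag(v)     # regularized optimal coupling
    return float(np.sum(pi * C))

# Illustrative empirical measures on the real line with squared-distance cost.
x, y = np.array([0.0, 1.0, 2.0]), np.array([0.5, 1.5])
C = (x[:, None] - y[None, :]) ** 2
print(entropic_ot_cost(np.full(3, 1/3), np.full(2, 1/2), C, eps=0.1))
```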
5. Robust Bayesian and MMD-Based Ambiguity Sets
Recent approaches incorporate Bayesian modeling and nonparametric priors to address model misspecification. The robust Bayesian ambiguity set (DRO–RoBAS) is built as a Maximum Mean Discrepancy (MMD) ball in a reproducing kernel Hilbert space (RKHS) around a robust mixture posterior predictive law:
$$\mathcal{B}_\varepsilon = \{\, Q : \mathrm{MMD}_k^2(Q, \hat{P}) \le \varepsilon \,\},$$
where $\mathrm{MMD}_k^2$ is the squared MMD with kernel $k$ and $\hat{P}$ is a nonparametrically robustified mixture over model projections (Dellaporta et al., 6 May 2025).
The minimax DRO problem in this setting translates to a saddle-point optimization over the RKHS ball,
$$\min_{x \in X} \ \sup_{Q :\, \mathrm{MMD}_k^2(Q, \hat{P}) \le \varepsilon} \ \mathbb{E}_Q[f(x, \xi)],$$
which, via Fenchel duality, becomes a finite-dimensional quadratic program after invoking the representer theorem.
Finite-sample guarantees on the coverage of the true data-generating process within $\mathcal{B}_\varepsilon$ are provided. The tolerance $\varepsilon$ is selected based on explicit probabilistic bounds that depend on the concentration parameter and the kernel.
DRO–RoBAS is shown to yield improved out-of-sample performance under mild to severe model misspecification compared to standard empirical or Bayesian DRO, at a manageable computational cost (Dellaporta et al., 6 May 2025).
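A small sketch of the squared-MMD discrepancy underlying the RoBAS ball, for two illustrative samples and a Gaussian kernel (the bandwidth and data are assumptions, not the paper's choices):

```python
import numpy as np

def mmd_squared(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples X and Y
    under a Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 h^2))."""
    def gram(A, B):
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * bandwidth**2))
    return float(gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean())

# Illustrative samples from two Gaussians with shifted means.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))
Y = rng.normal(0.5, 1.0, size=(200, 1))
print(mmd_squared(X, Y))
```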
6. Decision-Dependent Divergence Ambiguity Sets
A major extension is the decision-dependent ambiguity set: the radius or divergence budget becomes a function of the decision variable $x$, leading to endogenous robustness (the ambiguity set "reacts" to the decision) (Luo et al., 2018, Fochesato et al., 13 May 2025). In stochastic programming and robust control this takes the form
$$\min_{x \in X} \ \sup_{Q \in \mathcal{P}(x)} \ \mathbb{E}_Q[f(x, \xi)], \qquad \mathcal{P}(x) = \{\, Q : D(Q \,\|\, \hat{P}) \le \varepsilon(x) \,\}.$$
This framework encompasses settings where the risk profile or uncertainty tolerance is itself controlled by the optimization, often encoding endogenous sources of uncertainty.
In Linear-Quadratic-Gaussian (LQG) control, KL-ambiguity sets with decision-dependent budgets require modified dynamic programming and best-response methods, with stable Riccati recursions and explicit dependence of the robustification parameter on decision and state variables (Fochesato et al., 13 May 2025). Dualization and strong duality continue to hold, with tractable numerical schemes.
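A toy sketch of this construction, reusing the KL log-sum-exp dual from Section 2 with an illustrative decision-dependent budget $\varepsilon(x) = \varepsilon_0 (1 + x^2)$ and a quadratic scenario loss (all specifics here are assumptions for illustration, not the formulations of the cited papers):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

xi = np.array([-1.0, 0.0, 0.5, 2.0])     # scenarios
p = np.full(4, 0.25)                      # nominal (empirical) weights

def worst_case(f_vals, eps):
    """KL log-sum-exp dual: worst-case expectation over a KL ball of radius eps."""
    dual = lambda lam: lam * eps + lam * logsumexp(f_vals / lam, b=p)
    return minimize_scalar(dual, bounds=(1e-6, 1e3), method="bounded").fun

def robust_objective(x, eps0=0.05):
    f_vals = (x - xi) ** 2                # illustrative per-scenario loss
    eps_x = eps0 * (1.0 + x ** 2)         # decision-dependent divergence budget
    return worst_case(f_vals, eps_x)

res = minimize_scalar(robust_objective, bounds=(-3.0, 3.0), method="bounded")
print(res.x, res.fun)                     # robust decision and its worst-case cost
```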
7. Comparative Properties and Practical Considerations
The choice of divergence type and structure impacts key properties of the ambiguity set and resulting optimization:
| Divergence type | Convexity (in $Q$) | Symmetry | Tractability | Statistical rate |
|---|---|---|---|---|
| φ/Bregman/KL | Yes (strictly) | No | Exponential-cone programs | Exponential, dimension-free concentration |
| Wasserstein | Yes | Yes | Linear/conic SDP | Dimension-dependent (e.g., $O(N^{-1/d})$) |
| Sinkhorn | Yes | Yes | SDP + log-det cone | OT-type, inherits from Wasserstein |
| Wasserstein–Bregman | Often | No | As for Wasserstein | Intermediate; flexible |
| MMD (RKHS) | Yes | Yes (metric) | QP in kernel dual | RKHS concentration (Dellaporta et al., 6 May 2025) |
Convexity ensures tractability for large-scale problems. Asymmetry of Bregman-type divergences can provide fine control of "directional" uncertainty, while Wasserstein-type divergences provide geometry-aware transport robustness. The hybrid Wasserstein–Bregman and Sinkhorn approaches interpolate these regimes and have numerically favorable properties, especially in small-sample or highly misspecified contexts.
In all cases, statistical concentration inequalities and non-asymptotic bounds guide the principled calibration of divergence budgets to achieve desired coverage properties for the true distribution (Guo et al., 2017, Cescon et al., 26 Mar 2025, Dellaporta et al., 6 May 2025).
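For example, in the finite-support case a standard asymptotic argument (a textbook calibration, not specific to any of the cited papers) sets the KL budget from a chi-square quantile, since $2N\, D_{\mathrm{KL}}(\hat{P}_N \,\|\, P^\ast)$ is approximately $\chi^2_{d-1}$-distributed for a support of size $d$; a minimal sketch:

```python
from scipy.stats import chi2

def kl_radius(n_samples, support_size, alpha=0.05):
    """Asymptotic KL-budget calibration: with support size d, 2N * KL(P_hat || P*)
    is approximately chi-square with d-1 degrees of freedom, so this radius
    covers the true law with probability roughly 1 - alpha."""
    return chi2.ppf(1 - alpha, df=support_size - 1) / (2 * n_samples)

print(kl_radius(n_samples=500, support_size=10))  # ~0.0169
```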
References
- Distributionally Robust Optimization with Decision Dependent Ambiguity Sets (Luo et al., 2018)
- Ambiguity set and learning via Bregman and Wasserstein (Guo et al., 2017)
- Data-driven Distributionally Robust Control Based on Sinkhorn Ambiguity Sets (Cescon et al., 26 Mar 2025)
- Decision Making under Model Misspecification: DRO with Robust Bayesian Ambiguity Sets (Dellaporta et al., 6 May 2025)
- Distributionally Robust LQG with Kullback-Leibler Ambiguity Sets (Fochesato et al., 13 May 2025)