Probabilistic Robust Learning

Updated 12 April 2026

Probabilistic Robust Learning (PRL) is a paradigm that uses quantile-based risk measures to balance model accuracy with robustness against uncertainty.
It generalizes both worst-case adversarial training and average-case risk minimization by applying chance constraints and conditional value at risk (CVaR) formulations.
PRL methods are applied across vision, control, and time series, employing techniques such as semidefinite programming (SDP), variational inference, and robust SGD to improve empirical performance.

Probabilistic Robust Learning (PRL) is a paradigm in machine learning and control that focuses on optimizing models or controllers to balance accuracy and robustness by quantifying and controlling the probability of failure under uncertainty in data, parameters, dynamics, or perturbations. PRL generalizes traditional worst-case (adversarial) robustness and average-case risk minimization by targeting performance guarantees not “for all” scenarios but for “most” scenarios, as measured by an explicit probabilistic risk threshold or quantile criterion. Theoretical formulations, algorithmic techniques, and empirical validations across diverse settings have clarified the properties, trade-offs, and advantages of the PRL approach.

1. Formal Definitions and Unifying Principles

The essence of Probabilistic Robust Learning lies in replacing the worst-case (supremum) risk evaluation with a probabilistic or risk-quantile evaluation of model performance under input, parameter, distributional, or environmental uncertainty. Several formulations crystallize this principle:

Quantile-Based Risk: For a loss function $\ell(h(x),y)$ and distribution $P_\delta$ over perturbations $\delta$, the $\rho$-essential supremum risk is defined as

$\min_{h\in\mathcal H}\ \mathbb E_{(x,y)}\Big[\rho\text{-}\operatorname*{ess\,sup}_{\delta\sim P_\delta}\ \ell(h(x+\delta),y)\Big]$

where the $\rho$-ess sup truncates the upper tail of the loss distribution at the $1-\rho$ quantile, interpolating between average-case risk ($\rho=1$) and adversarial risk ($\rho=0$) (Robey et al., 2022, Bungert et al., 2023).
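As a rough illustration of the quantile-based risk, the sketch below estimates the $\rho$-ess sup of a perturbed loss by Monte Carlo sampling. The 1-D linear classifier, the hinge loss, the Gaussian perturbation model, and all function names are assumptions chosen for this example and are not taken from the cited papers.

```python
import math
import random


def rho_ess_sup(losses, rho):
    """Empirical rho-ess sup: the smallest sample value u such that the
    fraction of losses strictly greater than u is at most rho."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    ordered = sorted(losses)
    n = len(ordered)
    # At most floor(rho * n) samples are allowed to exceed the returned value.
    allowed_above = math.floor(rho * n)
    return ordered[n - 1 - allowed_above]


def hinge_loss(w, b, x, y):
    """Hinge loss of a hypothetical 1-D linear classifier sign(w * x + b)."""
    return max(0.0, 1.0 - y * (w * x + b))


rng = random.Random(0)
w, b, x, y = 2.0, 0.0, 1.0, 1
# Sample perturbations delta ~ N(0, 0.5^2) and evaluate the perturbed loss.
losses = [hinge_loss(w, b, x + rng.gauss(0.0, 0.5), y) for _ in range(2000)]

worst_case = rho_ess_sup(losses, 0.0)   # rho = 0: the sample maximum
relaxed = rho_ess_sup(losses, 0.1)      # ignore the worst 10% of draws
average = sum(losses) / len(losses)     # average-case loss, for reference
print(worst_case, relaxed, average)
```

Setting `rho = 0` returns the largest sampled loss, while larger values of `rho` discard a growing share of the upper tail, which is the sense in which the criterion relaxes the worst case.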

Probabilistic Robustness in Supervised Learning: The model is required to be robust on all but a $\rho$ fraction of perturbed examples, with the constraint formalized via chance-constrained or CVaR (Conditional Value at Risk) programs, facilitating efficient SGD-style algorithms (Robey et al., 2022, Bungert et al., 2023).
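Probabilistic Stability in Control: In learning-based controller synthesis, a controller $K$ is said to be $(\epsilon,\beta)$-robustly stabilizing if

$\Pr_{\theta\sim p(\theta\mid\mathcal D)}\big[\,\text{the closed loop of system } \theta \text{ under } K \text{ is stable} \;\big|\; \theta\in\Theta_\beta\,\big]\ \ge\ 1-\epsilon$

that is, stability is guaranteed except for an $\epsilon$ fraction of system realizations within a $\beta$-credible region $\Theta_\beta$ of the posterior $p(\theta\mid\mathcal D)$ over dynamics parameters $\theta$ (Rohr et al., 2021).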

Non-Parametric Probabilistic Robustness (NPPR): The robustness metric is defined as the minimum probability of correct classification under the worst-case choice of a perturbation distribution $Q$ from a broad admissible family $\mathcal Q$:

$\mathrm{NPPR}(h;x,y)=\min_{Q\in\mathcal Q}\ \Pr_{\delta\sim Q}\big[h(x+\delta)=y\big]$

yielding a conservative and data-driven robustness guarantee (Wang et al., 21 Nov 2025).

Distributionally Robust Optimization with Soft Groups: Robustness to group shifts is formulated as a min-max problem over group-weight distributions, where "soft" probabilistic group membership allows more realistic modeling of labeling uncertainty (Ghosal et al., 2023).
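To make the NPPR definition above concrete, the following sketch takes the minimum Monte Carlo probability of correct classification over a small, finite family of perturbation distributions. The finite family, the threshold classifier, and the function names are hypothetical simplifications; the cited work searches a much richer learned family of distributions.

```python
import random


def classify(x):
    """Hypothetical 1-D classifier: predicts +1 when x > 0, else -1."""
    return 1 if x > 0 else -1


def correct_prob(x, y, sampler, n_samples, rng):
    """Monte Carlo estimate of Pr[classify(x + delta) == y] for one sampler."""
    hits = sum(classify(x + sampler(rng)) == y for _ in range(n_samples))
    return hits / n_samples


def nppr_estimate(x, y, samplers, n_samples=5000, seed=0):
    """Worst case (minimum) of the correct-classification probability over
    every perturbation distribution in the candidate family."""
    rng = random.Random(seed)
    return min(correct_prob(x, y, s, n_samples, rng) for s in samplers)


# Assumed candidate family: a centered Gaussian, a shifted Gaussian, a uniform.
family = [
    lambda rng: rng.gauss(0.0, 0.5),
    lambda rng: rng.gauss(-0.3, 0.5),
    lambda rng: rng.uniform(-1.0, 1.0),
]

fixed_noise = nppr_estimate(1.0, 1, family[:1])   # single fixed distribution
worst_family = nppr_estimate(1.0, 1, family)      # worst case over the family
print(fixed_noise, worst_family)
```

Because the family contains the fixed distribution as one member, the worst-case estimate can only be lower than or equal to the fixed-noise estimate, which is why the metric is described as conservative.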

2. Loss Functions, Uncertainty Sets, and Optimization

PRL encompasses a rich set of risk objectives and optimization algorithms:

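CVaR and Min-Max Formulations: The non-convex $\rho$-essential supremum can be upper-bounded by the CVaR at level $1-\rho$, which admits a convex variational form:

$\rho\text{-}\operatorname*{ess\,sup}_{\delta\sim P_\delta}\ \ell(h(x+\delta),y)\ \le\ \mathrm{CVaR}_{1-\rho}\big(\ell(h(x+\delta),y);P_\delta\big)=\inf_{\alpha\in\mathbb R}\Big\{\alpha+\tfrac{1}{\rho}\,\mathbb E_{\delta\sim P_\delta}\big[(\ell(h(x+\delta),y)-\alpha)_+\big]\Big\}$

where $(z)_+=\max(z,0)$, facilitating stochastic gradient-based minimization (Robey et al., 2022, Bungert et al., 2023).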
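The CVaR bound can be checked numerically on a handful of sampled losses. The sketch below evaluates the standard Rockafellar-Uryasev form, $\alpha + \mathbb E[(\ell-\alpha)_+]/\rho$ minimized over $\alpha$, on a toy sample and compares it with the empirical quantile. The toy losses and function names are assumptions made for illustration.

```python
import math


def rho_quantile(losses, rho):
    """Empirical rho-ess sup (upper (1 - rho)-quantile) of sampled losses."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    ordered = sorted(losses)
    return ordered[len(ordered) - 1 - math.floor(rho * len(ordered))]


def cvar_objective(alpha, losses, rho):
    """Rockafellar-Uryasev objective: alpha + mean((loss - alpha)_+) / rho."""
    tail = sum(max(l - alpha, 0.0) for l in losses) / len(losses)
    return alpha + tail / rho


def empirical_cvar(losses, rho):
    """CVaR at level 1 - rho for rho in (0, 1]. The objective is convex and
    piecewise linear in alpha, so a minimizer exists among the samples."""
    return min(cvar_objective(a, losses, rho) for a in losses)


losses = [float(v) for v in range(1, 11)]   # toy losses 1, 2, ..., 10
for rho in (1.0, 0.5, 0.2, 0.1):
    print(rho, empirical_cvar(losses, rho))
```

On this sample, `rho = 1` gives the plain average and smaller values of `rho` average over an ever thinner upper tail, so the relaxation moves from average-case toward worst-case loss while remaining convex in `alpha`.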

Scenario-based Semidefinite Programs: In probabilistic robust control synthesis, the expected infinite-horizon cost under a truncated posterior is approximated by scenario sampling, followed by common-Lyapunov or majorize-minimize SDPs. Guarantees on the stability risk (the tolerated fraction $\epsilon$ of unstable realizations) and the coverage (the credible level $\beta$ of the truncated posterior) are provided via concentration bounds and scenario theory (Rohr et al., 2021).
Entropy-Constrained Weighted Empirical Risks: In robust risk minimization, outlier resistance is achieved by assigning exponential weights to samples under an entropy constraint, forming a distributionally robust optimization over a KL-divergence ball. The algorithm alternates between weight and parameter updates, downweighting up to an $\epsilon$ fraction of the largest-loss examples (Osama et al., 2019).
Variational Inference with Robust Loss and Prior Divergence: Generalized variational objectives incorporate robustness both as a data loss (via power-divergence or generalised cross-entropy) and as prior divergence (weighted-KL, Rényi-$\alpha$), yielding bias-robust, calibrated federated posterior distributions (Mildner et al., 2 Feb 2025).
Functional Optimization in Time-Series: For stochastic dynamical systems, robust probabilistic predictors optimize the worst-case expected log-likelihood functional under information constraints, with explicit moment-based robust predictors derived as solutions (Xu et al., 2023).
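The entropy-constrained weighting described above can be sketched for a simple location-estimation problem. This is a minimal illustration, assuming the entropy constraint takes the form $H(w)\ge\log((1-\epsilon)n)$ and that the weights are exponential in the loss, $w_i\propto\exp(-\alpha\ell_i)$, with $\alpha$ found by bisection. The squared loss, the synthetic data, and the function names are hypothetical and do not reproduce the cited algorithm in detail.

```python
import math
import random


def gibbs_weights(losses, alpha):
    """Exponential weights w_i proportional to exp(-alpha * loss_i)."""
    m = min(losses)  # shift for numerical stability; weights are unchanged
    raw = [math.exp(-alpha * (l - m)) for l in losses]
    total = sum(raw)
    return [r / total for r in raw]


def entropy(weights):
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def entropy_constrained_weights(losses, eps, alpha_max=1e6, iters=100):
    """Sharpest exponential weights whose entropy stays at or above
    log((1 - eps) * n). Entropy decreases in alpha, so bisection applies."""
    target = math.log((1.0 - eps) * len(losses))
    lo, hi = 0.0, alpha_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy(gibbs_weights(losses, mid)) > target:
            lo = mid   # constraint still slack: weights can be sharpened
        else:
            hi = mid
    return gibbs_weights(losses, lo)


def robust_mean(data, eps, rounds=20):
    """Alternate between reweighting samples and re-estimating the mean."""
    theta = sum(data) / len(data)   # start from the contaminated mean
    for _ in range(rounds):
        losses = [(x - theta) ** 2 for x in data]
        weights = entropy_constrained_weights(losses, eps)
        theta = sum(w * x for w, x in zip(weights, data))
    return theta


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(90)] + [50.0] * 10  # 10% outliers
plain = sum(data) / len(data)
robust = robust_mean(data, eps=0.1)
print(plain, robust)
```

In this toy run the plain mean is pulled toward the outliers at 50, while the reweighted estimate stays near the inlier center, because the entropy budget lets the weights nearly vanish on roughly an `eps` fraction of the highest-loss points.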

3. Theoretical Guarantees and Trade-offs

The PRL framework exhibits several core theoretical results clarifying the interplay among risk, sample complexity, and robustness:

Interpolation of Risk Regimes: PRL offers a continuum between ERM (average-case) and adversarial (worst-case) risk. Theoretical analysis shows that for any $\rho > 0$ the sample complexity matches that of ERM. For $\rho = 0$ (worst-case), sample complexity can become unbounded or incur logarithmic explosion (Robey et al., 2022, Bungert et al., 2023).
Generalization Bounds: Uniform generalization across groups or under contamination is established via regularization terms (Ghosal et al., 2023) or via an entropy/KL-divergence constraint on the sample weights (Osama et al., 2019).
Breakdown Points and Consistency: The maximum influence of adversarial points is bounded by the chosen robustness hyperparameter, ensuring consistent estimation and oracle-like performance even under heavy contamination (Osama et al., 2019).
Concentration and Coverage: Probabilistic guarantees are formalized via Hoeffding or Chernoff bounds for stability rate (in control), empirical log-likelihood (in prediction), or conformal prediction coverage under adversarial perturbations (Rohr et al., 2021, Kang et al., 2024, Xu et al., 2023).
Geometric Regularization and $\Gamma$-Convergence: Geometric analysis shows that PRL risk functions induce novel nonlocal perimeter regularizers which (as the quantile parameter $\rho \to 0$) converge to the adversarial training risk. Concave relaxations (e.g., via CVaR) ensure existence of minimizers and tractable optimization for both hard and soft classifier classes (Bungert et al., 2023).
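A concentration-based guarantee of the kind mentioned above can be illustrated with the one-sided Hoeffding inequality: if $\hat p$ is the empirical stability rate over $N$ independent scenario samples, then with probability at least $1-\beta$ the true rate is at least $\hat p-\sqrt{\ln(1/\beta)/(2N)}$. The scalar system, the Gaussian "posterior" over its parameters, and the gain values below are hypothetical, and the sketch is not the certification procedure of the cited control work.

```python
import math
import random


def is_stable(a, b, k):
    """Scalar closed loop x_{t+1} = (a + b*k) x_t is stable iff |a + b*k| < 1."""
    return abs(a + b * k) < 1.0


def empirical_stability(k, n, seed=0):
    """Fraction of sampled systems that the gain k stabilizes."""
    rng = random.Random(seed)
    stable = 0
    for _ in range(n):
        a = rng.gauss(1.2, 0.1)   # assumed posterior over the dynamics
        b = rng.gauss(1.0, 0.1)   # assumed posterior over the input gain
        stable += is_stable(a, b, k)
    return stable / n


def hoeffding_lower_bound(rate, n, beta):
    """Lower bound on the true rate holding with probability >= 1 - beta."""
    return rate - math.sqrt(math.log(1.0 / beta) / (2.0 * n))


def samples_needed(margin, beta):
    """Smallest n for which the Hoeffding margin is at most `margin`."""
    return math.ceil(math.log(1.0 / beta) / (2.0 * margin ** 2))


n = 2000
rate = empirical_stability(k=-1.2, n=n)
bound = hoeffding_lower_bound(rate, n, beta=0.01)
print(rate, bound, samples_needed(0.05, 0.01))
```

The bound depends only on the number of scenarios and the confidence level, not on the dimension of the uncertain parameters, which is one reason scenario sampling is attractive for probabilistic certification.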

4. Algorithms and Implementations

Algorithmic contributions in PRL span a range from convex optimization to deep learning:

Risk-Aware SGD for PRL: Minibatch SGD is performed on a composite objective that averages the CVaR of per-example perturbation losses, requiring only random sampling over perturbation sets and efficient dual optimization for tail risks (Robey et al., 2022).
Scenario SDP for Control: Synthesis alternates between sample-based convex SDPs for controller initialization and iterative improvement. The procedure is certified to deliver stability guarantees with user-controlled risk and confidence (Rohr et al., 2021).
Coordinate Descent for Weighted Robust Risk: Iteratively solving for optimal sample weights (via exponential reweighting) and model parameters yields robust ERM solutions that directly control the effective influence of contaminated data (Osama et al., 2019).
Nonparametric Worst-Case Noise Estimation: GMM+MLP-based parameterizations of perturbation distributions, optimized via margin relaxations and Gumbel-softmax sampling, implement NPPR, which yields more conservative robustness estimates than classical probabilistic robustness (PR) evaluated under a fixed noise distribution (Wang et al., 21 Nov 2025).
Kalman-Filter-Based Robust Prediction: Moment-based robust predictors are dynamically constructed online within a Kalman filter loop, adapting the density family to available moment knowledge and updating log-likelihood scores incrementally (Xu et al., 2023).
Low-Variance Policy Search in RL: Enforcing lower bounds on GP measurement-noise variance during model-based RL training (PILCO-style) yields robust policies resilient to model mis-specification and domain shift (Charvet et al., 2021).
Group-Risk Minimax Optimization: In soft-group DRO, simultaneous mirror-descent is performed over model parameters and group weights, using pseudo-labeled or unsupervised group membership (Ghosal et al., 2023).
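The risk-aware SGD idea listed above can be sketched on a toy problem. In this simplified version each example is perturbed several times, the empirical CVaR of the perturbed losses is obtained in closed form by averaging the worst $\rho$ fraction of draws (instead of taking gradient steps on a dual variable), and the model takes one gradient step on that tail average. The 1-D logistic model, the Gaussian perturbations, the batch size of one, and all hyperparameters are assumptions made for illustration.

```python
import math
import random


def loss_and_grad(w, b, x, y):
    """Logistic loss log(1 + exp(-y (w x + b))) and its gradient in (w, b)."""
    margin = y * (w * x + b)
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    # sigmoid(-margin), computed without overflow for large |margin|
    if margin >= 0:
        e = math.exp(-margin)
        s = e / (1.0 + e)
    else:
        s = 1.0 / (1.0 + math.exp(margin))
    return loss, -s * y * x, -s * y


def cvar_sgd(data, rho, sigma, n_perturb=20, lr=0.1, epochs=30, seed=0):
    """SGD on the average over examples of the empirical CVaR (level 1 - rho)
    of the loss under sampled perturbations delta ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    data = list(data)                      # avoid mutating the caller's list
    w, b = 0.0, 0.0
    k = max(1, round(rho * n_perturb))     # number of draws in the tail
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            draws = [loss_and_grad(w, b, x + rng.gauss(0.0, sigma), y)
                     for _ in range(n_perturb)]
            # Empirical CVaR = mean of the k largest losses; its (sub)gradient
            # is the mean gradient over those same k draws.
            tail = sorted(draws, key=lambda d: d[0], reverse=True)[:k]
            w -= lr * sum(d[1] for d in tail) / k
            b -= lr * sum(d[2] for d in tail) / k
    return w, b


gen = random.Random(1)
data = ([(gen.gauss(2.0, 0.5), 1) for _ in range(100)]
        + [(gen.gauss(-2.0, 0.5), -1) for _ in range(100)])
w, b = cvar_sgd(data, rho=0.2, sigma=0.5)
accuracy = sum((1 if w * x + b > 0 else -1) == y for x, y in data) / len(data)
print(w, b, accuracy)
```

Only random sampling of perturbations is needed here; no inner adversarial search is run, which is the main computational difference from worst-case adversarial training.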

5. Applications and Empirical Evaluations

PRL methods have been validated in wide-ranging domains:

Vision and Classification: PRL methods (CVaR-focused SGD, NPPR) on major benchmarks (MNIST, CIFAR-10/100, TinyImageNet, SVHN) demonstrate substantial gains in robustness over ERM, improved worst-group accuracy in distributional shift scenarios, and empirically meet the prescribed tail-risk levels (Robey et al., 2022, Wang et al., 21 Nov 2025, Ghosal et al., 2023).
Control and Dynamical Systems: Probabilistic robust LQR controllers trained from GP posteriors maintain stability margins under significant uncertainty, require less data than worst-case methods, and outperform certainty-equivalence in noisy regimes (Rohr et al., 2021). Action-robust RL with probabilistic policy execution uncertainty yields minimax-optimal policies and regret/sample bounds, with fast convergence and certified performance under adversarial teleoperation (Liu et al., 2023).
Time Series Prediction: Robust moment-based probabilistic predictors outperform traditional Kalman/Gaussian approaches under model misspecification or heavy-tailed noise, as measured by trajectory log-likelihood and fail-safe operation percentages (Xu et al., 2023).
Sample-Efficient Robot Learning: Probabilistic robust inference for movement primitives with NIW priors enables adaptation from few demonstrations and yields better conditioning and generalization, outperforming heuristics-heavy baselines in manipulation and dynamic tasks (Gomez-Gonzalez et al., 2018).
Outlier-Resilient Learning: Empirical risk minimization with entropy-constrained weights achieves robust regression, classification, PCA, and covariance estimation in the presence of heavy-tailed and highly contaminated data, matching or exceeding state-of-the-art general robust methods (Osama et al., 2019).
Federated Learning: Generalized variational inference with robust site losses and prior divergences ensures accurate, calibration-preserving federated predictions under model, data, and client misspecification (Mildner et al., 2 Feb 2025).

6. Extensions, Limitations, and Open Questions

PRL remains an active and rapidly expanding research field, with several important directions:

Nonparametric and Distributional Uncertainty: Moving beyond a fixed perturbation distribution, approaches such as NPPR search over broader, data-driven families of plausible noise distributions, yielding more conservative robustness estimates (Wang et al., 21 Nov 2025).
Interpretable and Certified Guarantees: Extensions to certifiable coverage, as in robust conformal prediction via probabilistic circuits, yield system-level, finite-sample coverage bounds even under adversarial attack (Kang et al., 2024).
Sample Complexity and Generalization: Establishing minimax optimal rates and quantifying the price of robustness versus accuracy as a function of the quantile or tail risk parameter $\rho$ remains a central theoretical theme.
Compositional System Safety: Translating component-level PRL evidence to end-to-end system claims, incorporating redundancy, and propagating uncertainty through complex architectures is an open challenge, with compositional probability and safety-case perspectives emerging (Zhao, 20 Feb 2025).
Geometric and Functional Analysis: Systematic study of the geometric regularization induced by probabilistic perimeters and their role in training dynamics and the existence of robust solutions is being advanced via $\Gamma$-convergence and nonlocal calculus (Bungert et al., 2023).
Robustness under Multi-Type and Complex Uncertainty: Unified frameworks that simultaneously handle distributional, parametric, group, and label uncertainty remain to be developed.
Algorithmic Complexity: Designing scalable, efficient algorithms that match the robustness guarantees of PRL formulations in high dimensions remains an open concern, with tailored sampling, variational, and risk-aggregation techniques under active development.

7. Comparative Analysis and Unifying Insights

PRL offers a systematic alternative and complement to both standard empirical risk minimization and adversarial training:

Method	Risk Type	Robustness Property	Sample Complexity	Nominal Performance	Computational Cost
Empirical Risk Minimization	Average (ERM)	Brittle	Baseline (ERM rate)	High	Low
Adversarial (Worst-case)	Supremum	Maximally robust	Can be unbounded	Low	High
Probabilistic Robust (PRL)	Tail Quantile/CVaR	Tunable (via $\rho$)	Matches ERM (if $\rho > 0$)	Tunable	Medium–High

The key insight is that PRL frameworks admit fine-grained trade-offs between performance and robustness, beyond the dichotomy of average versus worst-case, by specifying explicit risk budgets. Emerging research demonstrates utility in diverse domains: high-dimensional learning, control, time series, federated and group-shifted environments, and adversarial settings.

References: (Robey et al., 2022, Bungert et al., 2023, Rohr et al., 2021, Wang et al., 21 Nov 2025, Kang et al., 2024, Ghosal et al., 2023, Xu et al., 2023, Osama et al., 2019, Gomez-Gonzalez et al., 2018, Charvet et al., 2021, Liu et al., 2023, Zhao, 20 Feb 2025, Mildner et al., 2 Feb 2025).