
Probabilistic Deep Learning Framework

Updated 19 November 2025
  • A probabilistic deep learning framework is a systematic approach that models uncertainty using neural operators and stochastic processes.
  • It leverages operator-theoretic foundations and DeepONet architectures to achieve calibrated learning in high-dimensional stochastic settings.
  • The framework enables practical applications like financial option pricing and real-time simulation by generalizing across diverse parametric families.

A probabilistic deep learning framework encompasses architectures, methodologies, and theoretical paradigms that rigorously model, learn, and reason about uncertainty and probability distributions in deep neural networks. These frameworks provide explicit probabilistic semantics for functions, operators, or predictions produced by deep models. They enable principled quantification of uncertainty, universal approximation capabilities for operator learning, and integration of stochastic processes, ultimately achieving calibrated, generalizable learning in high-dimensional stochastic settings.

1. Operator-Theoretic Foundations and Stochastic Process Modeling

At the core of advanced probabilistic deep learning is the learning of operators that map between functional spaces under uncertainty. The neural operator paradigm formalizes this by considering nonlinear operators

$$\Gamma: \mathcal{G} \to \mathcal{U}$$

where $\mathcal{G}$ and $\mathcal{U}$ are spaces of stochastic processes (e.g., adapted functionals of Itô processes, with integrability and moment/tail bounds). Stochastic processes $X_t$ are required to satisfy:

  • Uniform $p$-moment bounds:

$$\mathbb{E}\left[\sup_{0\le t\le T}|X_t|^p\right] \le C_p < \infty$$

  • Uniform sub-Gaussian tail bounds:

$$\mathbb{P}\left(\sup_{0\le t\le T}|X_t - X_0|\ge r\right) \le \exp\left(-\frac{c r^\alpha}{C_T}\right)$$

Operators $\Gamma$ are globally Lipschitz in the $S^2$-norm:

$$\|\Gamma(g_1)-\Gamma(g_2)\|_{S^2} \le L_{\Gamma} \|g_1-g_2\|_{S^2}$$

allowing for robust learning and generalization. The operator class is broad and encompasses, for instance, both the European and American option pricing operators derived from forward-backward SDEs (FBSDEs), covering cases with and without free boundaries (Bayraktar et al., 10 Nov 2025).
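
To make these conditions concrete, the minimal sketch below estimates the uniform $p$-moment $\mathbb{E}[\sup_{0\le t\le T}|X_t|^p]$ by Monte Carlo for a geometric Brownian motion, a hypothetical member of the admitted process class; the drift, volatility, horizon, and choice of $p$ are illustrative assumptions, not values from the cited paper.

```python
import numpy as np

# Sketch: Monte Carlo probe of the uniform p-moment bound
#   E[ sup_{0<=t<=T} |X_t|^p ] <= C_p < infinity
# for a geometric Brownian motion dX_t = mu X_t dt + sigma X_t dW_t.
# All parameter values are illustrative assumptions.
rng = np.random.default_rng(0)
mu, sigma, x0 = 0.05, 0.2, 1.0          # hypothetical drift, volatility, initial value
T, n_steps, n_paths, p = 1.0, 250, 10_000, 4

dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
# Exact GBM paths on the time grid: X_t = x0 * exp((mu - sigma^2/2) t + sigma W_t).
log_increments = (mu - 0.5 * sigma**2) * dt + sigma * dW
paths = x0 * np.exp(np.hstack([np.zeros((n_paths, 1)), np.cumsum(log_increments, axis=1)]))

# The estimate stabilizes to a finite value, consistent with a finite C_p.
sup_moment = np.mean(np.max(np.abs(paths), axis=1) ** p)
print(f"MC estimate of E[sup_t |X_t|^{p}]: {sup_moment:.4f}")
```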

2. Deep Neural Operator Architectures and Approximation Properties

The neural operator learning framework utilizes an explicit “branch–trunk” (DeepONet-style) architecture:

  • Branch nets $\tilde a_k: \mathbb{R}^{N_2} \rightarrow \mathbb{R}$ encode discretized functional input, mapping sampled evaluations $(g(x_1), \ldots, g(x_{N_2}))$.
  • Trunk nets $\tilde q_k: \mathbb{R}^{d_2} \rightarrow \mathbb{R}$ encode the spatial (or parametric) evaluation point $y$.

The operator is approximated as:

$$\Gamma_{\theta}(g)(y) = \sum_{k=1}^{N_1} \tilde a_k\big(\{g(x_i)\}_{i=1}^{N_2}\big) \, \tilde q_k(y)$$

with both $\tilde a_k$ and $\tilde q_k$ instantiated as ReLU feedforward networks with explicit depth, width, and sparsity constraints.
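
A minimal PyTorch sketch of this branch–trunk structure is shown below; the class name, layer widths, number of basis terms $N_1$, and sensor count $N_2$ are illustrative assumptions rather than the exact architecture or sizes prescribed by the approximation theorem.

```python
import torch
import torch.nn as nn

class BranchTrunkOperator(nn.Module):
    """Sketch of a DeepONet-style surrogate:
    Gamma_theta(g)(y) = sum_k a_k(g(x_1), ..., g(x_N2)) * q_k(y)."""

    def __init__(self, n_sensors: int, y_dim: int, n_basis: int = 64, width: int = 128):
        super().__init__()
        # Branch net: sampled input-function values -> coefficients a_k, k = 1..N1.
        self.branch = nn.Sequential(
            nn.Linear(n_sensors, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, n_basis),
        )
        # Trunk net: evaluation point y -> basis values q_k(y), k = 1..N1.
        self.trunk = nn.Sequential(
            nn.Linear(y_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, n_basis),
        )

    def forward(self, g_samples: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # g_samples: (batch, N2) sensor values; y: (batch, y_dim) query points.
        a = self.branch(g_samples)                 # (batch, N1)
        q = self.trunk(y)                          # (batch, N1)
        return (a * q).sum(dim=-1, keepdim=True)   # inner product over the basis

# Usage on dummy data: 16 discretized inputs sampled at 100 sensors, queried at (y, t).
model = BranchTrunkOperator(n_sensors=100, y_dim=2)
out = model(torch.randn(16, 100), torch.randn(16, 2))  # shape (16, 1)
```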

Universal Approximation Theorem

Under general integrability, tail, polynomial growth, and Lipschitz assumptions, for arbitrarily small $\varepsilon>0$, one can construct such branch–trunk neural networks with

$$N_1=\mathcal O(\varepsilon^{-d_2}), \quad N_2=\mathcal O(\varepsilon^{-d_1 d_2 - d_1})$$

and quantifiable network depth/width so that

$$\sup_{g \in \mathcal{G}} \mathbb{E}\left[\sup_{0 \le t \le T} \left|\Gamma(g)(X_t) - \Gamma_\theta(g)(X_t)\right|\right] \le \varepsilon$$

Network sizes grow exponentially in the input/output dimensions, reflecting the curse of dimensionality (Bayraktar et al., 10 Nov 2025).

3. Specialization: Surrogate Learning for Stochastic PDE Operators

This framework explicitly specializes to stochastic differential equations and stochastic PDEs encountered in probabilistic modeling:

  • European Option Pricing: The operator $\Gamma^E$ maps terminal payoff functions to price surfaces via the solution to the linear parabolic PDE (or equivalently, a BSDE). $\Gamma^E$ satisfies the desired global Lipschitz property with constant $L_{\Gamma^E}=4 e^{2 \overline{r} T}$.
  • American (Reflected) Option Pricing: The operator $\Gamma^A$ is defined through a reflected BSDE or an obstacle problem for a parabolic PDE. The operator remains Lipschitz with a (slightly weaker) estimate in the $S^2$-norm.

Numerical experiments confirm that neural operators, trained on a basket of American payoffs with randomly varying strikes, can compute exercise boundaries for new strikes without retraining, achieving RMSE of $\mathcal{O}(10^{-3})$ and boundary location error of $\mathcal{O}(10^{-2})$ (Bayraktar et al., 10 Nov 2025). The model generalizes the operator across a family of payoffs, providing one surrogate for the entire function class.
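
The sketch below shows how training pairs for such a payoff-parametrized surrogate could be assembled in the European case, using the Black–Scholes closed form as the reference solver; the parameter ranges, grids, and use of a closed-form solver are illustrative assumptions (the cited experiments use American payoffs with PDE/FBSDE reference solutions).

```python
import numpy as np
from scipy.stats import norm

def bs_call_price(s, K, r, sigma, tau):
    """Black-Scholes European call price, used here only as a reference solver."""
    tau = np.maximum(tau, 1e-8)                 # avoid division by zero at maturity
    d1 = (np.log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return s * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)

rng = np.random.default_rng(1)
r, sigma, T = 0.03, 0.2, 1.0                    # hypothetical market parameters
x_sensors = np.linspace(0.5, 2.0, 100)          # sensor grid for the payoff g_K
s_grid, t_grid = np.linspace(0.5, 2.0, 32), np.linspace(0.0, T, 16)

strikes = rng.uniform(0.8, 1.2, size=256)       # random strikes defining the payoff family
# Branch inputs: discretized payoffs g_K(x_i) = max(x_i - K, 0).
branch_inputs = np.maximum(x_sensors[None, :] - strikes[:, None], 0.0)
# Labels: reference price surfaces u_K(s, t) on the (space, time) grid.
S, Tt = np.meshgrid(s_grid, t_grid, indexing="ij")
labels = np.stack([bs_call_price(S, K, r, sigma, T - Tt) for K in strikes])
print(branch_inputs.shape, labels.shape)        # (256, 100) and (256, 32, 16)
```

A surrogate trained on such pairs can then be queried at strikes outside the training set without regenerating reference solutions, which is the generalization behavior reported above.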

4. Computational Workflow and Optimization

Learning a probabilistic deep neural operator proceeds via empirical risk minimization:

$$\mathcal{L}(\theta) = \frac{1}{M N_t N_x}\sum_{i=1}^M\sum_{n=0}^{N_t-1}\sum_{j=0}^{N_x-1} \left|\Gamma_\theta g_{K_i}(y_j, t_n) - u_{i,j,n}\right|^2$$

with the Adam optimizer (learning rate $10^{-3}$, batch size 16) over many epochs (e.g., 2000). ReLU networks are constructed with depth, width, and number of nonzero weights chosen according to the desired $\varepsilon$ accuracy. The architecture admits efficient approximation of coordinate-wise products, grid-based partitions of unity, and functional evaluations via explicit modular neural components.
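
A compact training loop implementing this empirical risk with Adam is sketched below on synthetic stand-in data; the tensor shapes, network widths, and data are placeholders, while the learning rate, batch size, and epoch count follow the values stated above.

```python
import torch
import torch.nn as nn

# Synthetic stand-ins: M discretized payoffs (branch inputs), query points (y_j, t_n)
# (trunk inputs), and reference solutions u_{i,j,n} (labels). Shapes are illustrative.
M, n_sensors, n_queries = 256, 100, 512
g_samples = torch.randn(M, n_sensors)
queries = torch.rand(M, n_queries, 2)           # (y, t) query pairs per sample
targets = torch.randn(M, n_queries, 1)          # reference values u_{i,j,n}

# Minimal branch-trunk surrogate (same structure as the Section 2 sketch).
n_basis = 64
branch = nn.Sequential(nn.Linear(n_sensors, 128), nn.ReLU(), nn.Linear(128, n_basis))
trunk = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, n_basis))

opt = torch.optim.Adam(list(branch.parameters()) + list(trunk.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()                          # mean squared error = the empirical risk above

for epoch in range(2000):                       # epoch count as stated in the text
    perm = torch.randperm(M)
    for start in range(0, M, 16):               # batch size 16 as stated in the text
        idx = perm[start:start + 16]
        a = branch(g_samples[idx])              # (b, n_basis)
        q = trunk(queries[idx])                 # (b, n_queries, n_basis)
        pred = torch.einsum("bk,bqk->bq", a, q).unsqueeze(-1)
        loss = loss_fn(pred, targets[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
```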

5. Theoretical and Empirical Limitations

Curse of Dimensionality

  • The explicit network size grows exponentially in the input/output dimensions $(d_1, d_2)$ due to the fundamental scaling of universal operator approximation. In practice, high-dimensional cases benefit from specialized operator architectures (e.g., spectral neural operators, randomized projections, convolutional/trunk nets).
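
To make this scaling concrete, the short calculation below evaluates the theorem's size bounds $N_1 \sim \varepsilon^{-d_2}$ and $N_2 \sim \varepsilon^{-(d_1 d_2 + d_1)}$ for a few illustrative tolerances and dimensions (absolute constants omitted).

```python
# Illustrative evaluation of the branch/trunk size bounds from the theorem,
# with absolute constants dropped: N1 ~ eps**(-d2), N2 ~ eps**(-(d1*d2 + d1)).
for eps in (1e-1, 1e-2):
    for d1, d2 in ((1, 1), (2, 2), (3, 3)):
        n1 = eps ** (-d2)
        n2 = eps ** (-(d1 * d2 + d1))
        print(f"eps={eps:.0e}, d1={d1}, d2={d2}: N1 ~ {n1:.1e}, N2 ~ {n2:.1e}")
```

Even at moderate dimensions the required sensor count grows rapidly, which is the curse-of-dimensionality effect motivating the specialized architectures mentioned above.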

Model Assumptions

  • The assumed class of stochastic processes excludes heavy-tailed distributions and rough/fractional SDEs; extension to these regimes remains open.
  • Obstacle problems (reflected PDEs) require reflection terms; extensions to more general variational inequalities (including game options) are under current investigation.

Generalization and Surrogate Modeling

  • The learned neural-operator surrogates can generalize across parametric families (e.g., a continuum of strike prices) without retraining, contingent on sufficient training diversity.
  • Mesh-agnostic or unstructured input representations, as well as further reductions in the number of required reference solutions via PDE-informed loss functions, are noted as extensions.

6. Integration within the Broader Probabilistic Deep Learning Landscape

Compared to other probabilistic deep learning frameworks, the neural-operator methodology is characterized by:

  • Explicit mapping between function spaces under random input and stochastic process constraints.
  • Rigorous balance between approximation theory (quantitative universal operator approximation) and deep network expressiveness.
  • Compatibility with advances in probabilistic surrogate modeling, e.g., for scientific computing, real-time simulation, and uncertainty quantification.

This operator-centric view complements probabilistic programming, Bayesian neural networks, and deep probabilistic graphical models by focusing on the accurate and globally generalizable learning of stochastic maps between infinite-dimensional function spaces—a feature crucial for scientific, financial, and engineering systems governed by PDEs and SDEs.

