GPI-PINN 1: Neural Solver for Jump-Diffusions

Updated 19 February 2026
  • The paper demonstrates that GPI-PINN 1 enforces the HJB equation as a hard residual constraint during training and achieves sub-percent errors on LQR benchmarks without jumps, in dimensions up to 50.
  • This approach leverages residual-based loss functions with Monte Carlo sampling to train DGM block networks without discretizing time.
  • Empirical evaluations reveal its limitations for jump-diffusions, suggesting expectation-free formulations may be preferable in high-dimensional scenarios.

The GPI-PINN 1 method is a residual-based physics-informed neural network approach for solving high-dimensional, finite-horizon, continuous-time stochastic control problems characterized by jump-diffusion dynamics. In this framework, neural networks are trained to approximate both the value function and the optimal control policy with no requirement for time discretization of the underlying stochastic differential equations, leveraging the Hamilton-Jacobi-Bellman (HJB) equation as a hard constraint throughout training. This technique enables scalable and accurate policy/value learning, with empirical evidence illustrating its limitations and comparative performance relative to a related expectation-free formulation in the presence of jumps (Cheridito et al., 21 May 2025).

1. Mathematical Structure: Stochastic Control with Jumps

Continuous-time stochastic control with jumps considers a state process $X^\alpha$ on a finite time horizon $T > 0$, controlled by a feedback control $\alpha_t = \alpha(t, X_{t-}^\alpha) \in A \subset \mathbb{R}^m$. The controlled system evolves according to jump-diffusion dynamics

$$dX_t^\alpha = \beta(t, X_t^\alpha, \alpha_t)\,dt + \sigma(t, X_t^\alpha, \alpha_t)\,dW_t + \int_E \gamma(t, X_{t-}^\alpha, z, \alpha_t)\, N^\alpha(dz, dt),$$

where $W$ is a $k$-dimensional Brownian motion and $N^\alpha$ is a controlled Poisson random measure with state-dependent intensity. The control objective is to maximize the expected cumulative reward

$$V(0,x) = \sup_\alpha \mathbb{E}\left[ \int_0^T f(t, X_t^\alpha, \alpha_t)\,dt + F(X_T^\alpha) \right]$$

or, conditionally,

$$V(t,x) = \sup_\alpha \mathbb{E}\left[ \int_t^T f(s, X_s^\alpha, \alpha_s)\,ds + F(X_T^\alpha) \,\Big|\, X_t^\alpha = x \right].$$

This setting captures both diffusive and discontinuous risk sources and encompasses numerous financial, engineering, and resource allocation problems.
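Although GPI-PINN 1 itself requires no time discretization, a discretized Monte Carlo simulation of the controlled dynamics helps build intuition for the state process. The sketch below is illustrative only: the Euler-style scheme, the coefficient callables, and the standard-normal jump marks are assumptions, not part of the method.

```python
import numpy as np

def simulate_jump_diffusion(x0, policy, beta, sigma, gamma, lam,
                            T=1.0, n_steps=100, rng=None):
    """Euler-style simulation of dX = beta dt + sigma dW + jump terms.

    `policy(t, x)` is the feedback control, `lam(t, x, a)` the jump
    intensity; jump marks Z are drawn standard normal here purely for
    illustration.
    """
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    for i in range(n_steps):
        t = i * dt
        a = policy(t, x)
        dW = rng.normal(scale=np.sqrt(dt), size=x.shape)
        x = x + beta(t, x, a) * dt + sigma(t, x, a) * dW
        # Poisson jumps: number of jumps in [t, t + dt) at the current intensity
        n_jumps = rng.poisson(lam(t, x, a) * dt)
        for _ in range(n_jumps):
            z = rng.normal()
            x = x + gamma(t, x, z, a)
    return x
```

A mean-reverting drift with small multiplicative noise and rare additive jumps, for instance, keeps the trajectory near the origin over the horizon.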

2. Dynamic Programming and the HJB Equation in Jump-Diffusions

Under appropriate regularity conditions, dynamic programming leads to the following Hamilton-Jacobi-Bellman integro-PDE

$$\partial_t V(t,x) + \sup_{a \in A} H(t, x, V, a) = 0, \qquad V(T, x) = F(x),$$

where the Hamiltonian is

$$H(t, x, V, a) = f(t, x, a) + \beta^\top \nabla_x V + \frac{1}{2} \operatorname{Tr}\!\left[\sigma \sigma^\top \nabla_x^2 V\right] + \lambda(t, x, a)\, \mathbb{E}_{Z \sim \mathcal{Z}}\!\left[V(t, x + \gamma(t, x, Z, a)) - V(t, x)\right].$$

The integro-differential structure arises from the controlled jump component, embedding both continuous and impulsive state transitions.
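For a concrete reading of the Hamiltonian, the jump expectation can be replaced by a Monte Carlo average over sampled marks. The sketch below assumes the derivatives of $V$ are available as callables (in practice they would come from automatic differentiation); all function names are illustrative.

```python
import numpy as np

def hamiltonian(t, x, a, V, grad_V, hess_V, f, beta, sigma, gamma, lam, z_samples):
    """Evaluate H(t,x,V,a) = f + beta . grad V + 0.5 Tr[sigma sigma^T Hess V]
    + lam * E_Z[V(t, x + gamma(t,x,Z,a)) - V(t,x)], with the jump
    expectation replaced by a Monte Carlo average over `z_samples`."""
    s = sigma(t, x, a)                                   # (d, k) diffusion matrix
    drift = beta(t, x, a) @ grad_V(t, x)                 # first-order term
    diff = 0.5 * np.trace(s @ s.T @ hess_V(t, x))        # second-order term
    jumps = np.mean([V(t, x + gamma(t, x, z, a)) - V(t, x) for z in z_samples])
    return f(t, x, a) + drift + diff + lam(t, x, a) * jumps
```

With $V(t,x) = \|x\|^2$, drift $\beta = -x$, unit diffusion, and no jumps, the first- and second-order terms cancel at $x = (1, 0)$, which gives a quick sanity check.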

3. Residual-Based PINN Training Objectives

GPI-PINN 1 approximates the value function by a neural network $V_\theta(t,x)$ and the feedback control by a neural network $\alpha_\phi(t,x)$. To enforce the HJB equation and policy optimality, the following loss functions are used:

  • Value Loss: Enforces the square of the HJB residual and the terminal condition,

$$\mathscr{L}_1(\theta, \phi) = \mathbb{E}_{(t,x) \sim \mu}\!\left[ \mathcal{H}(t, x, \theta, \phi)^2 \right] + \mathbb{E}_{x \sim \nu}\!\left[ V_\theta(T, x) - F(x) \right]^2,$$

where $\mathcal{H}(t, x, \theta, \phi) = \partial_t V_\theta + H(t, x, V_\theta, \alpha_\phi(t, x))$.

  • Policy Loss: Maximizes the Hamiltonian under the current value estimate,

$$\mathscr{L}_2(\theta, \phi) = -\mathbb{E}_{(t, x) \sim \mu}\!\left[\mathcal{H}(t, x, \theta, \phi)\right].$$

Training alternates between minimizing $\mathscr{L}_1$ (value update) and $\mathscr{L}_2$ (policy update). Monte Carlo sampling is utilized for the Poisson random measure, and all required gradients are computed by backpropagation.
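Once the HJB residuals and terminal mismatches have been evaluated on a mini-batch, the two losses reduce to simple sample averages. A minimal sketch, where the array names and the placement of the square in the terminal term are assumptions:

```python
import numpy as np

def value_loss(hjb_residuals, terminal_errors):
    """L1: mean squared HJB residual plus mean squared terminal mismatch.

    `hjb_residuals[i]` is dV/dt + H at sample (t_i, x_i); `terminal_errors[j]`
    is V(T, x_j) - F(x_j). Both are assumed precomputed via autodiff."""
    return np.mean(hjb_residuals**2) + np.mean(terminal_errors**2)

def policy_loss(hjb_residuals):
    """L2: minimizing this over the policy parameters maximizes the Hamiltonian."""
    return -np.mean(hjb_residuals)
```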

4. Neural Architecture: DGM Block Networks

Both $V_\theta$ and $\alpha_\phi$ employ the DGM (Deep Galerkin Method) network architecture, as introduced by Sirignano & Spiliopoulos (2018). Key architectural features:

  • Input: $(t, x) \in \mathbb{R}^{1+d}$
  • Layers: One DGM input layer, followed by $L$ DGM blocks with $N$ neurons each. Each block computes gated updates:

$$Z^\ell = \sigma(\cdots), \quad G^\ell = \sigma(\cdots), \quad R^\ell, H^\ell \text{ similarly},$$

and update via

$$S^{\ell+1} = (1 - G^\ell) \odot H^\ell + Z^\ell \odot S^\ell.$$

  • Output: $V_\theta(t, x) = \sigma_{\mathrm{out}}(W S^{L+1} + b)$, using Softplus or linear activations as appropriate.
  • Policy network output: Activations chosen to match the control set $A$ (e.g., sigmoid for box constraints; $\tanh$ or linear for unbounded controls).

The policy network matches the value network in structure, differing only in output dimension and activation.
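A single DGM block's forward pass can be sketched as follows, assuming one weight triple per gate; the exact wiring of the reset gate varies across implementations and is an assumption here.

```python
import numpy as np

def dgm_block(S, X, params, act=np.tanh):
    """One DGM block: gated update S' = (1 - G) * H + Z * S (elementwise).

    `X` is the raw input (t, x); `params` maps each gate name to a weight
    triple (U, W, b), loosely following Sirignano & Spiliopoulos (2018)."""
    def gate(name, state):
        U, W, b = params[name]
        return act(X @ U + state @ W + b)
    Z = gate("Z", S)
    G = gate("G", S)
    R = gate("R", S)
    H = gate("H", S * R)          # H reads the reset-modulated state
    return (1.0 - G) * H + Z * S
```

Stacking $L$ such blocks and applying an output layer $\sigma_{\mathrm{out}}(W S^{L+1} + b)$ yields the full value or policy network.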

5. Full Training Protocol for GPI-PINN 1

Training is conducted by iteratively alternating between value and policy updates. The process is as follows:

  1. Sample mini-batches $(t_i, x_i) \sim \mu$, $x_j \sim \nu$, and $z_i \sim \mathcal{Z}$.
  2. Value Step: Minimize $\mathscr{L}_1(\theta, \phi)$ over $\theta$, keeping $\phi$ fixed.
  3. Policy Step: Minimize $\mathscr{L}_2(\theta, \phi)$ (equivalently, maximize the Hamiltonian) over $\phi$, keeping $\theta$ fixed.
  4. Repeat until convergence.

Both value and policy losses are computed using Monte Carlo approximations, and all gradient calculations leverage automatic differentiation. No time discretization is imposed on the underlying SDE—the method relies exclusively on enforcing the PDE residual.
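The alternating protocol reduces to a short loop once the two optimization steps are abstracted. In this sketch, `value_step` and `policy_step` stand in for Adam updates driven by autodiff gradients of $\mathscr{L}_1$ and $\mathscr{L}_2$; both callables are hypothetical placeholders, not the paper's implementation.

```python
def train_gpi_pinn(theta, phi, value_step, policy_step, sample_batch, n_epochs=10):
    """Actor-critic alternation: value update on theta with phi fixed,
    then policy update on phi with theta fixed. Sampling and gradient
    computation (autodiff in practice) are hidden behind the callables."""
    for _ in range(n_epochs):
        batch = sample_batch()
        theta = value_step(theta, phi, batch)   # minimize L1 over theta
        phi = policy_step(theta, phi, batch)    # minimize L2 over phi
    return theta, phi
```

With toy contraction steps in place of the optimizers, both "parameters" shrink toward their fixed point over a few epochs, which mirrors the intended convergence behavior.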

6. Theoretical Properties and Regularity

Key theoretical requirements:

  • If $v \in C^{1,2}$ solves the HJB equation and an argmax policy $\hat{\alpha}$ exists, then $v = V$ and $\hat{\alpha}$ is optimal (Bouchard–Touzi Theorem 2.2.4).
  • Proposition 3.1 describes a technique that avoids assembling the full Hessian by employing a univariate second-derivative "trick."
  • Proposition 3.2 demonstrates that $V^\alpha$ minimizes a conditional mean-squared error with respect to the surrogate $G$ functional.
  • A complete convergence proof is not available, but residual-based schemes (per Baird 1995) are typically more stable, and the actor-critic iteration admits local linearization arguments.

Assumptions include sufficient regularity ($V \in C^{1,2}$) and integrability/growth conditions to ensure finiteness of PDE residuals and expectations.
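The univariate second-derivative idea behind Proposition 3.1 exploits the identity $\operatorname{Tr}[\sigma\sigma^\top \nabla_x^2 V] = \sum_j \sigma_j^\top \nabla_x^2 V\, \sigma_j$ over the columns $\sigma_j$ of $\sigma$, so only directional second derivatives are needed and the full $d \times d$ Hessian is never formed. A finite-difference sketch of this estimator (the paper would use a second autodiff pass; this numerical form is an assumption):

```python
import numpy as np

def trace_term(V, x, sigma_cols, eps=1e-4):
    """Estimate Tr[sigma sigma^T Hess V] = sum_j sigma_j^T (Hess V) sigma_j
    via central second differences along each diffusion column sigma_j."""
    total = 0.0
    for s in sigma_cols:
        total += (V(x + eps * s) - 2.0 * V(x) + V(x - eps * s)) / eps**2
    return total
```

For $V(x) = \|x\|^2$ the Hessian is $2I$, so with unit diffusion columns in $d = 2$ the estimator should return approximately 4.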

7. Empirical Evaluation and Performance

The architecture employs three DGM blocks ($L = 3$) with $N = 50$ neurons each, optimized using Adam with learning rates $\eta_1 = \eta_2 = 10^{-3}$, batch sizes $M_1 = M_2 = 2^{16}$, and 512 steps for each update per epoch. Training and sampling use TensorFlow/Keras on an NVIDIA RTX 4090. The main empirical findings:

  • Linear–Quadratic Regulator (LQR) with jumps: GPI-PINN 1 and its expectation-free variant (GPI-PINN 2) achieve sub-percent errors for $d = 10, 50$ without jumps. With jumps, GPI-PINN 1 becomes computationally infeasible, while GPI-PINN 2 remains tractable even for $d = 50$ and runs in less than an hour. Mean absolute errors in value and policy ($\mathrm{MAE}_V$, $\mathrm{MAE}_\alpha$) are on the order of $10^{-3}$–$10^{-2}$.
  • Optimal consumption–investment: For problems with constant and stochastic coefficients in dimensions $d$ up to 52, involving a 26-dimensional control, the method exhibits small residuals $\mathscr{L}_1, \mathscr{L}_2 \to 0$ and produces stable, monotonic policy functions. No closed-form solution is available in these cases.

A plausible implication is that while GPI-PINN 1 is accurate in moderate dimensions and for diffusive systems, its computational cost becomes prohibitive for high-dimensional jump-diffusion problems, at which point GPI-PINN 2 is preferred (Cheridito et al., 21 May 2025).
