GPI-PINN 1: Neural Solver for Jump-Diffusions

Updated 19 February 2026
  • The paper demonstrates that GPI-PINN 1 enforces the HJB equation as a hard residual constraint during training and achieves sub-percent errors on LQR benchmarks without jumps, in dimensions up to 50.
  • This approach leverages residual-based loss functions with Monte Carlo sampling to train DGM block networks without discretizing time.
  • Empirical evaluations reveal its limitations for jump-diffusions, suggesting expectation-free formulations may be preferable in high-dimensional scenarios.

The GPI-PINN 1 method is a residual-based physics-informed neural network approach for solving high-dimensional, finite-horizon, continuous-time stochastic control problems characterized by jump-diffusion dynamics. In this framework, neural networks are trained to approximate both the value function and the optimal control policy with no requirement for time discretization of the underlying stochastic differential equations, leveraging the Hamilton-Jacobi-Bellman (HJB) equation as a hard constraint throughout training. This technique enables scalable and accurate policy/value learning, with empirical evidence illustrating its limitations and comparative performance relative to a related expectation-free formulation in the presence of jumps (Cheridito et al., 21 May 2025).

1. Mathematical Structure: Stochastic Control with Jumps

Continuous-time stochastic control with jumps considers a state process $X^\alpha$ on a finite time horizon $T > 0$, controlled by a feedback control $\alpha_t = \alpha(t, X_{t-}^\alpha) \in A \subset \mathbb{R}^m$. The controlled system evolves according to jump-diffusion dynamics

$$dX_t^\alpha = \beta(t, X_t^\alpha, \alpha_t)\,dt + \sigma(t, X_t^\alpha, \alpha_t)\,dW_t + \int_E \gamma(t, X_{t-}^\alpha, z, \alpha_t)\, N^\alpha(dz, dt),$$

where $W$ is a $k$-dimensional Brownian motion and $N^\alpha$ is a controlled Poisson random measure with state-dependent intensity. The control objective is to maximize the expected cumulative reward

$$V(0,x) = \sup_\alpha \mathbb{E}\left[ \int_0^T f(t, X_t^\alpha, \alpha_t)\,dt + F(X_T^\alpha) \right]$$

or, conditionally,

$$V(t,x) = \sup_\alpha \mathbb{E}\left[ \int_t^T f(s, X_s^\alpha, \alpha_s)\,ds + F(X_T^\alpha) \,\Big|\, X_t^\alpha = x \right].$$

This setting captures both diffusive and discontinuous risk sources and encompasses numerous financial, engineering, and resource allocation problems.
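Although GPI-PINN 1 itself requires no time discretization, a discretized Monte Carlo simulation of the controlled dynamics helps build intuition for the state process. The sketch below is illustrative only: the Euler-style scheme, the coefficient callables, and the standard-normal jump marks are assumptions, not part of the method.

```python
import numpy as np

def simulate_jump_diffusion(x0, policy, beta, sigma, gamma, lam,
                            T=1.0, n_steps=100, rng=None):
    """Euler-style simulation of dX = beta dt + sigma dW + jump terms.

    `policy(t, x)` is the feedback control, `lam(t, x, a)` the jump
    intensity; jump marks Z are drawn standard normal here purely for
    illustration.
    """
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    for i in range(n_steps):
        t = i * dt
        a = policy(t, x)
        dW = rng.normal(scale=np.sqrt(dt), size=x.shape)
        x = x + beta(t, x, a) * dt + sigma(t, x, a) * dW
        # Poisson jumps: number of jumps in [t, t + dt) at the current intensity
        n_jumps = rng.poisson(lam(t, x, a) * dt)
        for _ in range(n_jumps):
            z = rng.normal()
            x = x + gamma(t, x, z, a)
    return x
```

A mean-reverting drift with small multiplicative noise and rare additive jumps, for instance, keeps the trajectory near the origin over the horizon.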

2. Dynamic Programming and the HJB Equation in Jump-Diffusions

Under appropriate regularity conditions, dynamic programming leads to the following Hamilton-Jacobi-Bellman integro-PDE

$$\partial_t V(t,x) + \sup_{a \in A} H(t, x, V, a) = 0, \qquad V(T, x) = F(x),$$

where the Hamiltonian is

$$H(t, x, V, a) = f(t, x, a) + \beta^\top \nabla_x V + \frac{1}{2} \operatorname{Tr}\!\left[\sigma \sigma^\top \nabla_x^2 V\right] + \lambda(t, x, a)\, \mathbb{E}_{Z \sim \mathcal{Z}}\!\left[V(t, x + \gamma(t, x, Z, a)) - V(t, x)\right].$$

The integro-differential structure arises from the controlled jump component, embedding both continuous and impulsive state transitions.
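For a concrete reading of the Hamiltonian, the jump expectation can be replaced by a Monte Carlo average over sampled marks. The sketch below assumes the derivatives of $V$ are available as callables (in practice they would come from automatic differentiation); all function names are illustrative.

```python
import numpy as np

def hamiltonian(t, x, a, V, grad_V, hess_V, f, beta, sigma, gamma, lam, z_samples):
    """Evaluate H(t,x,V,a) = f + beta . grad V + 0.5 Tr[sigma sigma^T Hess V]
    + lam * E_Z[V(t, x + gamma(t,x,Z,a)) - V(t,x)], with the jump
    expectation replaced by a Monte Carlo average over `z_samples`."""
    s = sigma(t, x, a)                                   # (d, k) diffusion matrix
    drift = beta(t, x, a) @ grad_V(t, x)                 # first-order term
    diff = 0.5 * np.trace(s @ s.T @ hess_V(t, x))        # second-order term
    jumps = np.mean([V(t, x + gamma(t, x, z, a)) - V(t, x) for z in z_samples])
    return f(t, x, a) + drift + diff + lam(t, x, a) * jumps
```

With $V(t,x) = \|x\|^2$, drift $\beta = -x$, unit diffusion, and no jumps, the first- and second-order terms cancel at $x = (1, 0)$, which gives a quick sanity check.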

3. Residual-Based PINN Training Objectives

GPI-PINN 1 approximates the value function by a neural network $V_\theta(t,x)$ and the feedback control by a neural network $\alpha_\phi(t,x)$. To enforce the HJB equation and policy optimality, the following loss functions are used:

  • Value Loss: Enforces the square of the HJB residual and the terminal condition,

$$\mathscr{L}_1(\theta, \phi) = \mathbb{E}_{(t,x) \sim \mu}\!\left[ \mathcal{H}(t, x, \theta, \phi)^2 \right] + \mathbb{E}_{x \sim \nu}\!\left[ V_\theta(T, x) - F(x) \right]^2,$$

where $\mathcal{H}(t, x, \theta, \phi) = \partial_t V_\theta + H(t, x, V_\theta, \alpha_\phi(t, x))$.

  • Policy Loss: Maximizes the Hamiltonian under the current value estimate,

$$\mathscr{L}_2(\theta, \phi) = -\mathbb{E}_{(t, x) \sim \mu}\!\left[\mathcal{H}(t, x, \theta, \phi)\right].$$

Training alternates between minimizing $\mathscr{L}_1$ (value update) and $\mathscr{L}_2$ (policy update). Monte Carlo sampling is utilized for the Poisson random measure, and all required gradients are computed by backpropagation.
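Once the HJB residuals and terminal mismatches have been evaluated on a mini-batch, the two losses reduce to simple sample averages. A minimal sketch, where the array names and the placement of the square in the terminal term are assumptions:

```python
import numpy as np

def value_loss(hjb_residuals, terminal_errors):
    """L1: mean squared HJB residual plus mean squared terminal mismatch.

    `hjb_residuals[i]` is dV/dt + H at sample (t_i, x_i); `terminal_errors[j]`
    is V(T, x_j) - F(x_j). Both are assumed precomputed via autodiff."""
    return np.mean(hjb_residuals**2) + np.mean(terminal_errors**2)

def policy_loss(hjb_residuals):
    """L2: minimizing this over the policy parameters maximizes the Hamiltonian."""
    return -np.mean(hjb_residuals)
```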

4. Neural Architecture: DGM Block Networks

Both $V_\theta$ and $\alpha_\phi$ employ the DGM (Deep Galerkin Method) network architecture, as introduced by Sirignano & Spiliopoulos (2018). Key architectural features:

  • Input: $(t, x) \in \mathbb{R}^{1+d}$
  • Layers: One DGM input layer, followed by $L$ DGM blocks with $N$ neurons each. Each block computes gated updates:

$$Z^\ell = \sigma(\cdots), \quad G^\ell = \sigma(\cdots), \quad R^\ell, H^\ell \text{ similarly},$$

and update via

$$S^{\ell+1} = (1 - G^\ell) \odot H^\ell + Z^\ell \odot S^\ell.$$

  • Output: $V_\theta(t, x) = \sigma_{\mathrm{out}}(W S^{L+1} + b)$, using Softplus or linear activations as appropriate.
  • Policy network output: Activations chosen to match the control set $A$ (e.g., sigmoid for box constraints; $\tanh$ or linear for unbounded controls).

The policy network matches the value network in structure, differing only in output dimension and activation.
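A single DGM block's forward pass can be sketched as follows, assuming one weight triple per gate; the exact wiring of the reset gate varies across implementations and is an assumption here.

```python
import numpy as np

def dgm_block(S, X, params, act=np.tanh):
    """One DGM block: gated update S' = (1 - G) * H + Z * S (elementwise).

    `X` is the raw input (t, x); `params` maps each gate name to a weight
    triple (U, W, b), loosely following Sirignano & Spiliopoulos (2018)."""
    def gate(name, state):
        U, W, b = params[name]
        return act(X @ U + state @ W + b)
    Z = gate("Z", S)
    G = gate("G", S)
    R = gate("R", S)
    H = gate("H", S * R)          # H reads the reset-modulated state
    return (1.0 - G) * H + Z * S
```

Stacking $L$ such blocks and applying an output layer $\sigma_{\mathrm{out}}(W S^{L+1} + b)$ yields the full value or policy network.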

5. Full Training Protocol for GPI-PINN 1

Training is conducted by iteratively alternating between value and policy updates. The process is as follows:

  1. Sample mini-batches $(t_i, x_i) \sim \mu$, $x_j \sim \nu$, and $z_i \sim \mathcal{Z}$.
  2. Value Step: Minimize $\mathscr{L}_1(\theta, \phi)$ over $\theta$, keeping $\phi$ fixed.
  3. Policy Step: Minimize $\mathscr{L}_2(\theta, \phi)$ (equivalently, maximize the Hamiltonian) over $\phi$, keeping $\theta$ fixed.
  4. Repeat until convergence.

Both value and policy losses are computed using Monte Carlo approximations, and all gradient calculations leverage automatic differentiation. No time discretization is imposed on the underlying SDE—the method relies exclusively on enforcing the PDE residual.
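The alternating protocol reduces to a short loop once the two optimization steps are abstracted. In this sketch, `value_step` and `policy_step` stand in for Adam updates driven by autodiff gradients of $\mathscr{L}_1$ and $\mathscr{L}_2$; both callables are hypothetical placeholders, not the paper's implementation.

```python
def train_gpi_pinn(theta, phi, value_step, policy_step, sample_batch, n_epochs=10):
    """Actor-critic alternation: value update on theta with phi fixed,
    then policy update on phi with theta fixed. Sampling and gradient
    computation (autodiff in practice) are hidden behind the callables."""
    for _ in range(n_epochs):
        batch = sample_batch()
        theta = value_step(theta, phi, batch)   # minimize L1 over theta
        phi = policy_step(theta, phi, batch)    # minimize L2 over phi
    return theta, phi
```

With toy contraction steps in place of the optimizers, both "parameters" shrink toward their fixed point over a few epochs, which mirrors the intended convergence behavior.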

6. Theoretical Properties and Regularity

Key theoretical requirements:

  • If $v \in C^{1,2}$ solves the HJB equation and an argmax policy $\hat{\alpha}$ exists, then $v = V$ and $\hat{\alpha}$ is optimal (Bouchard–Touzi Theorem 2.2.4).
  • Proposition 3.1 describes a technique that avoids assembling the full Hessian by employing a univariate second-derivative "trick."
  • Proposition 3.2 demonstrates that $V^\alpha$ minimizes a conditional mean-squared error with respect to the surrogate $G$ functional.
  • A complete convergence proof is not available, but residual-based schemes (per Baird 1995) are typically more stable, and the actor-critic iteration admits local linearization arguments.

Assumptions include sufficient regularity ($V \in C^{1,2}$) and integrability/growth conditions to ensure finiteness of PDE residuals and expectations.
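The univariate second-derivative idea behind Proposition 3.1 exploits the identity $\operatorname{Tr}[\sigma\sigma^\top \nabla_x^2 V] = \sum_j \sigma_j^\top \nabla_x^2 V\, \sigma_j$ over the columns $\sigma_j$ of $\sigma$, so only directional second derivatives are needed and the full $d \times d$ Hessian is never formed. A finite-difference sketch of this estimator (the paper would use a second autodiff pass; this numerical form is an assumption):

```python
import numpy as np

def trace_term(V, x, sigma_cols, eps=1e-4):
    """Estimate Tr[sigma sigma^T Hess V] = sum_j sigma_j^T (Hess V) sigma_j
    via central second differences along each diffusion column sigma_j."""
    total = 0.0
    for s in sigma_cols:
        total += (V(x + eps * s) - 2.0 * V(x) + V(x - eps * s)) / eps**2
    return total
```

For $V(x) = \|x\|^2$ the Hessian is $2I$, so with unit diffusion columns in $d = 2$ the estimator should return approximately 4.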

7. Empirical Evaluation and Performance

The architecture employs three DGM blocks ($L = 3$) with $N = 50$ neurons each, optimized using Adam with learning rates $\eta_1 = \eta_2 = 10^{-3}$, batch sizes $M_1 = M_2 = 2^{16}$, and 512 steps for each update per epoch. Training and sampling use TensorFlow/Keras on an NVIDIA RTX 4090. The main empirical findings:

  • Linear–Quadratic Regulator (LQR) with jumps: GPI-PINN 1 and its expectation-free variant (GPI-PINN 2) achieve sub-percent errors for $d = 10, 50$ without jumps. With jumps, GPI-PINN 1 becomes computationally infeasible, while GPI-PINN 2 remains tractable even for $d = 50$ and runs in less than an hour. Mean absolute errors in value and policy ($\mathrm{MAE}_V$, $\mathrm{MAE}_\alpha$) are on the order of $10^{-3}$–$10^{-2}$.
  • Optimal consumption–investment: For problems with constant and stochastic coefficients in dimensions $d$ up to 52, involving a 26-dimensional control, the method exhibits small residuals $\mathscr{L}_1, \mathscr{L}_2 \to 0$ and produces stable, monotonic policy functions. No closed-form solution is available in these cases.

A plausible implication is that while GPI-PINN 1 is accurate in moderate dimensions and for diffusive systems, its computational cost becomes prohibitive for high-dimensional jump-diffusion problems, at which point GPI-PINN 2 is preferred (Cheridito et al., 21 May 2025).
