GPI-PINN 1: Neural Solver for Jump-Diffusions
- The paper demonstrates that GPI-PINN 1 enforces the HJB equation as a training constraint in jump-diffusion settings and achieves sub-percent errors on jump-free LQR benchmarks.
- This approach leverages residual-based loss functions with Monte Carlo sampling to train DGM block networks without discretizing time.
- Empirical evaluations reveal its limitations for jump-diffusions, suggesting expectation-free formulations may be preferable in high-dimensional scenarios.
The GPI-PINN 1 method is a residual-based physics-informed neural network approach for solving high-dimensional, finite-horizon, continuous-time stochastic control problems with jump-diffusion dynamics. In this framework, neural networks are trained to approximate both the value function and the optimal control policy, with no time discretization of the underlying stochastic differential equations; instead, the Hamilton-Jacobi-Bellman (HJB) equation is enforced as a constraint throughout training. This enables scalable and accurate policy/value learning, with empirical evidence illustrating both its accuracy in diffusive settings and its limitations relative to a related expectation-free formulation in the presence of jumps (Cheridito et al., 21 May 2025).
1. Mathematical Structure: Stochastic Control with Jumps
Continuous-time stochastic control with jumps considers a state process $X = (X_t)_{t \in [0,T]}$ in $\mathbb{R}^d$ on a finite time horizon $[0,T]$, controlled by a feedback control $\alpha : [0,T] \times \mathbb{R}^d \to A$. The controlled system evolves according to jump-diffusion dynamics
$$dX_t = b(t, X_t, \alpha(t, X_t))\,dt + \sigma(t, X_t, \alpha(t, X_t))\,dW_t + \int_Z \gamma(t, X_{t-}, \alpha(t, X_{t-}), z)\,N(dt, dz),$$
where $W$ is a $d_W$-dimensional Brownian motion and $N$ is a controlled Poisson random measure with state-dependent intensity. The control objective is to maximize the expected cumulative reward
$$J(\alpha) = \mathbb{E}\left[\int_0^T f(t, X_t, \alpha(t, X_t))\,dt + g(X_T)\right],$$
or, conditionally, the value function
$$V(t, x) = \sup_{\alpha} \mathbb{E}\left[\int_t^T f(s, X_s, \alpha(s, X_s))\,ds + g(X_T) \,\Big|\, X_t = x\right].$$
This setting captures both diffusive and discontinuous risk sources and encompasses numerous financial, engineering, and resource allocation problems.
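For intuition only, the following toy sketch simulates a one-dimensional path of such dynamics via Euler-Maruyama with Poisson thinning for the jumps. The coefficients (mean-reverting drift, constant volatility, Gaussian jump sizes) are illustrative placeholders; note that GPI-PINN 1 itself imposes no such time grid.

```python
import numpy as np

def simulate_jump_diffusion(x0, b, sigma, lam, jump_sampler,
                            T=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama path of dX = b(X)dt + sigma(X)dW + dJ, where J is a
    compound Poisson process with intensity lam. For visualization only:
    GPI-PINN 1 never discretizes time."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        # Poisson thinning: probability ~ lam*dt of one jump in this step
        jump = jump_sampler(rng) if rng.random() < lam * dt else 0.0
        x[k + 1] = x[k] + b(x[k]) * dt + sigma(x[k]) * dW + jump
    return x

# Illustrative coefficients: mean reversion, constant vol, Gaussian jumps
path = simulate_jump_diffusion(
    x0=1.0,
    b=lambda x: -0.5 * x,
    sigma=lambda x: 0.2,
    lam=2.0,
    jump_sampler=lambda rng: rng.normal(0.0, 0.5),
)
```

The diffusive increment scales with $\sqrt{dt}$ while jumps arrive at discrete Poisson times, which is exactly the mix of continuous and impulsive risk the HJB equation below must capture.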
2. Dynamic Programming and the HJB Equation in Jump-Diffusions
Under appropriate regularity conditions, dynamic programming leads to the following Hamilton-Jacobi-Bellman integro-PDE:
$$\partial_t V(t, x) + \sup_{a \in A} H(t, x, a, V) = 0, \qquad V(T, x) = g(x),$$
where the Hamiltonian is
$$H(t, x, a, V) = b(t, x, a) \cdot \nabla_x V(t, x) + \tfrac{1}{2}\,\mathrm{tr}\!\left(\sigma\sigma^\top(t, x, a)\,\nabla_x^2 V(t, x)\right) + f(t, x, a) + \lambda(t, x, a) \int_Z \left[V(t, x + \gamma(t, x, a, z)) - V(t, x)\right] \nu(dz).$$
The integro-differential structure arises from the controlled jump component, embedding both continuous and impulsive state transitions.
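The nonlocal term can be checked numerically in a toy case: for a quadratic $V(x) = x^2$ and Gaussian jump sizes $Z \sim \mathcal{N}(\mu, s^2)$, the term $\lambda\,\mathbb{E}[V(x+Z) - V(x)]$ has the closed form $\lambda(2x\mu + \mu^2 + s^2)$, against which a Monte Carlo estimate can be compared. All parameter values below are illustrative.

```python
import numpy as np

# Check the jump (nonlocal) term of the Hamiltonian for V(x) = x^2 and
# Gaussian jumps Z ~ N(mu, s^2):
#   lam * E[V(x+Z) - V(x)] = lam * (2*x*mu + mu^2 + s^2).
rng = np.random.default_rng(1)
V = lambda x: x**2
x, lam, mu, s = 1.5, 2.0, 0.3, 0.5

Z = rng.normal(mu, s, size=2_000_000)
mc = lam * np.mean(V(x + Z) - V(x))          # Monte Carlo estimate
exact = lam * (2 * x * mu + mu**2 + s**2)    # closed form
assert abs(mc - exact) < 2e-2
```

This is the same Monte Carlo treatment of the Poisson measure that GPI-PINN 1 applies inside its loss functions, where $V$ is a network rather than a known quadratic.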
3. Residual-Based PINN Training Objectives
GPI-PINN 1 approximates the value function by a neural network $V_\theta$ and the feedback control by a neural network $\alpha_\phi$. To enforce the HJB equation and policy optimality, the following loss functions are used:
- Value Loss: Enforces the square of the HJB residual and the terminal condition,
$$L_V(\theta) = \mathbb{E}\Big[\big(R[V_\theta, \alpha_\phi](t, X)\big)^2\Big] + \mathbb{E}\Big[\big(V_\theta(T, X_T) - g(X_T)\big)^2\Big],$$
where $R[V, \alpha](t, x) = \partial_t V(t, x) + H\big(t, x, \alpha(t, x), V\big)$ is the HJB residual under the current policy.
- Policy Loss: Maximizes the Hamiltonian under the current value estimate,
$$L_\pi(\phi) = -\,\mathbb{E}\big[H\big(t, X, \alpha_\phi(t, X), V_\theta\big)\big].$$
Training alternates between minimizing the value loss (value update) and the policy loss (policy update). Monte Carlo sampling is used for the expectation over the Poisson random measure, and all required gradients are computed by backpropagation.
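As a sanity check of the residual-based value loss, the sketch below evaluates the Monte Carlo loss at a known exact solution of a jump-free, uncontrolled special case, where it should vanish up to floating-point error. The finite-difference derivatives stand in for the automatic differentiation used in practice; the specific dynamics are illustrative.

```python
import numpy as np

# For the uncontrolled diffusion dX = sigma dW with terminal reward
# g(x) = x^2, the value V(t,x) = x^2 + sigma^2*(T - t) solves the linear HJB
#   dV/dt + 0.5*sigma^2 * d2V/dx2 = 0,   V(T, x) = g(x),
# so the Monte Carlo residual-plus-terminal loss evaluated at it is ~0.
rng = np.random.default_rng(0)
sigma, T, h = 0.4, 1.0, 1e-4

V = lambda t, x: x**2 + sigma**2 * (T - t)
g = lambda x: x**2

t = rng.uniform(0, T, 256)      # sampled interior times
x = rng.normal(0, 1, 256)       # sampled states

dVdt = (V(t + h, x) - V(t - h, x)) / (2 * h)                # central diff in t
d2Vdx2 = (V(t, x + h) - 2 * V(t, x) + V(t, x - h)) / h**2   # central 2nd diff
residual = dVdt + 0.5 * sigma**2 * d2Vdx2

loss = np.mean(residual**2) + np.mean((V(T, x) - g(x))**2)
assert loss < 1e-6
```

Any other candidate $V$ would yield a strictly positive loss, which is what drives the value network toward the PDE solution.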
4. Neural Architecture: DGM Block Networks
Both $V_\theta$ and $\alpha_\phi$ employ the DGM (Deep Galerkin Method) network architecture, as introduced by Sirignano & Spiliopoulos (2018). Key architectural features:
- Input: $(t, x) \in [0, T] \times \mathbb{R}^d$.
- Layers: One DGM input layer mapping $(t, x)$ to an initial hidden state $S^1$, followed by $L$ DGM blocks with $M$ neurons each. Each block computes gated updates
$$Z^k = \varphi(U_z x + W_z S^k + b_z), \quad G^k = \varphi(U_g x + W_g S^k + b_g), \quad R^k = \varphi(U_r x + W_r S^k + b_r), \quad H^k = \varphi(U_h x + W_h (S^k \odot R^k) + b_h),$$
where $\varphi$ is the elementwise activation and $\odot$ the Hadamard product, and updates the hidden state via
$$S^{k+1} = (1 - G^k) \odot H^k + Z^k \odot S^k.$$
- Output: a scalar $V_\theta(t, x)$, using Softplus or linear output activations as appropriate.
- Policy network output: Activations chosen to match control set (e.g., sigmoid for box constraints; or linear for unbounded controls).
The policy network matches the value network in structure, differing only in output dimension and activation.
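A minimal numpy sketch of one DGM block's forward pass, assuming the standard gated structure of Sirignano & Spiliopoulos (2018); the weight names, widths, and initialization below are illustrative, not the paper's configuration.

```python
import numpy as np

def dgm_block(S, x, params, act=np.tanh):
    """One DGM block: gated update of hidden state S given the raw input x.
    Simplified sketch of the Sirignano & Spiliopoulos (2018) block;
    parameter names (Uz, Wz, ...) are illustrative."""
    Uz, Wz, Ug, Wg, Ur, Wr, Uh, Wh = (
        params[k] for k in ("Uz", "Wz", "Ug", "Wg", "Ur", "Wr", "Uh", "Wh"))
    Z = act(x @ Uz + S @ Wz)           # update gate
    G = act(x @ Ug + S @ Wg)           # gate weighting the candidate state
    R = act(x @ Ur + S @ Wr)           # reset gate
    H = act(x @ Uh + (S * R) @ Wh)     # candidate hidden state
    return (1.0 - G) * H + Z * S       # gated combination S^{k+1}

rng = np.random.default_rng(0)
d_in, width, batch = 3, 16, 8          # e.g. input (t, x) with d = 2
params = {k: 0.1 * rng.normal(size=(d_in if k.startswith("U") else width,
                                    width))
          for k in ("Uz", "Wz", "Ug", "Wg", "Ur", "Wr", "Uh", "Wh")}
S0 = np.tanh(rng.normal(size=(batch, width)))
x = rng.normal(size=(batch, d_in))
S1 = dgm_block(S0, x, params)
```

The key design point is that every block sees the raw input $(t, x)$ again through the $U$ matrices, which helps the network represent sharp features of PDE solutions.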
5. Full Training Protocol for GPI-PINN 1
Training is conducted by iteratively alternating between value and policy updates. The process is as follows:
- Sample mini-batches of interior points $(t, x)$, terminal states $x$, and jump marks $z$ for the Monte Carlo approximation of the nonlocal term.
- Value Step: Minimize the value loss over the value parameters $\theta$, keeping the policy parameters $\phi$ fixed.
- Policy Step: Minimize the policy loss (equivalently, maximize the Hamiltonian) over $\phi$, keeping $\theta$ fixed.
- Repeat until convergence.
Both value and policy losses are computed using Monte Carlo approximations, and all gradient calculations leverage automatic differentiation. No time discretization is imposed on the underlying SDE—the method relies exclusively on enforcing the PDE residual.
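The alternating loop can be illustrated on a toy deterministic LQR (not the paper's jump setting), where the HJB fixed point is known in closed form: for dynamics $\dot{x} = a$ and cost $x^2 + a^2$, the HJB gives $V(x) = x^2$ and $a^*(x) = -x$. The parameterizations $V_p(x) = p\,x^2$, $a_k(x) = k\,x$ and the learning rates below are illustrative.

```python
import numpy as np

# Toy actor-critic alternation mirroring the GPI-PINN 1 loop:
# HJB residual for V_p, a_k:  x^2 + (k x)^2 + V_p'(x) * (k x)
#                           = x^2 * (1 + k^2 + 2*p*k),
# with fixed point p = 1 (V = x^2), k = -1 (a* = -x).
rng = np.random.default_rng(0)
x = rng.normal(size=1000)        # sampled states (mini-batch)
p, k = 0.5, 0.0                  # initial value / policy parameters

for _ in range(500):
    # Policy step: descend the Hamiltonian x^2 + a^2 + V' a in k (theta fixed)
    k -= 0.2 * np.mean(x**2 * (2 * k + 2 * p))
    # Value step: descend the squared HJB residual in p (phi fixed)
    res = x**2 * (1 + k**2 + 2 * p * k)
    p -= 0.02 * np.mean(2 * res * (2 * k * x**2))

assert abs(p - 1.0) < 0.05 and abs(k + 1.0) < 0.05
```

Both losses are plain Monte Carlo averages over sampled states, so the same loop structure scales to the neural parameterizations, where the analytic gradients above are replaced by automatic differentiation.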
6. Theoretical Properties and Regularity
Key theoretical requirements:
- If a sufficiently regular function solves the HJB equation and an argmax policy exists, then that function coincides with the value function and the argmax policy is optimal (Bouchard–Touzi, Theorem 2.2.4).
- Proposition 3.1 describes a technique to avoid Hessian terms by employing a univariate second-derivative "trick."
- Proposition 3.2 demonstrates that the minimizer of the value loss minimizes a conditional mean-squared error with respect to the surrogate functional.
- A complete convergence proof is not available, but residual-based schemes (per Baird 1995) are typically more stable, and the actor-critic iteration admits local linearization arguments.
Assumptions include sufficient regularity (e.g., $V \in C^{1,2}$) and integrability/growth conditions to ensure finiteness of PDE residuals and expectations.
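One standard reading of the univariate second-derivative trick in Proposition 3.1 is that the diffusion term $\mathrm{tr}(\sigma\sigma^\top \nabla^2 V)$ can be assembled from one-dimensional second derivatives of $V$ along the columns of $\sigma$, since $s^\top (\nabla^2 V)\, s = \frac{d^2}{d\varepsilon^2} V(x + \varepsilon s)\big|_{\varepsilon=0}$, avoiding the full Hessian. A finite-difference sketch of this identity (illustrative, not the paper's implementation):

```python
import numpy as np

# Verify tr(sigma sigma^T Hess V) = sum over columns s of sigma of the
# univariate second derivative of V along s, for a quadratic V(x) = x^T A x
# (whose Hessian is 2A).
rng = np.random.default_rng(0)
d, m, h = 4, 2, 1e-4
A = rng.normal(size=(d, d)); A = A + A.T   # symmetric quadratic form
sigma = rng.normal(size=(d, m))
x = rng.normal(size=d)
V = lambda y: y @ A @ y

# Sum of directional second derivatives along each column of sigma
trick = sum((V(x + h * s) - 2 * V(x) + V(x - h * s)) / h**2
            for s in sigma.T)
exact = np.trace(sigma @ sigma.T @ (2 * A))   # full-Hessian computation
assert abs(trick - exact) < 1e-4
```

For a network $V_\theta$, each directional term costs one scalar second derivative rather than a $d \times d$ Hessian, which is what makes the trick attractive in high dimensions.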
7. Empirical Evaluation and Performance
The architecture employs three DGM blocks ($L = 3$), trained with the Adam optimizer using $512$ gradient steps for each update per epoch. Training and sampling use TensorFlow/Keras on an NVIDIA RTX 4090. The main empirical findings:
- Linear–Quadratic Regulator (LQR) with jumps: GPI-PINN 1 and its expectation-free variant (GPI-PINN 2) achieve sub-percent errors in the jump-free case. With jumps, GPI-PINN 1 becomes computationally infeasible, while GPI-PINN 2 remains tractable in high dimensions and runs in less than an hour. Mean absolute errors in value and policy relative to the closed-form solution remain small.
- Optimal consumption-investment: For problems with constant and stochastic coefficients in dimensions up to $52$, involving a $26$-dimensional control, the method demonstrates small residuals and produces stable, monotonic policy functions. No closed-form solution is available in these cases.
A plausible implication is that while GPI-PINN 1 is accurate in moderate dimensions and for diffusive systems, its computational cost becomes prohibitive for high-dimensional jump-diffusion problems, at which point GPI-PINN 2 is preferred (Cheridito et al., 21 May 2025).