
Physics-Informed Controller Distillation

Updated 6 March 2026
  • Physics-Informed Controller Distillation is a method that embeds the Hamilton–Jacobi–Bellman equation into neural network training to distill optimal feedback controllers for stochastic systems.
  • It employs both decoupled and coupled neural architectures to approximate value and control functions, using physics-informed loss functions computed over simulated Brownian trajectories.
  • The approach robustly generalizes across varying initial conditions and noise intensities, demonstrating computational efficiency and improved performance in high-dimensional control benchmark problems.

Physics-Informed Controller Distillation is an approach for synthesizing optimal feedback controllers for high-dimensional stochastic control problems by leveraging physics-informed neural networks (PINNs) trained via the Hamilton–Jacobi–Bellman (HJB) partial differential equation. The method integrates system dynamics and optimality principles directly into the architecture and loss function of deep neural networks, thereby distilling both the physical laws and optimal control objectives into the network parameters. This framework bypasses the requirement for supervised control labels, relying solely on the terminal cost and a simulator for state dynamics to learn controllers that robustly generalize across initial conditions and stochastic trajectories (Jiao et al., 2024).

1. Hamilton–Jacobi–Bellman Formulation of Stochastic Optimal Control

The foundation of Physics-Informed Controller Distillation is the stochastic HJB equation. For a controlled SDE specified by drift $b$, diffusion $\sigma$, running cost $\phi$, and terminal cost $\psi$, the value function $V(t,x)$ is

$$V(t,x) \;=\; \min_{u(\cdot)} \mathbb{E}\left[\int_t^T \phi(s, x_s, u(s))\,ds + \psi(x_T) \;\middle|\; x_t = x\right]$$

which satisfies the backward parabolic HJB PDE:

$$\begin{cases} \partial_t V(t,x) + \min_{u \in U} \left\{ b(t,x,u)\cdot\nabla_x V + \frac12 \operatorname{tr}\!\big(\sigma\sigma^T(t,x,u)\,\nabla_x^2 V\big) + \phi(t,x,u) \right\} = 0, \\ V(T,x) = \psi(x). \end{cases}$$

This equation encapsulates Bellman's optimality principle and encodes the physics and control objectives governing the system.
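For orientation, a standard case where the inner minimization can be carried out in closed form is the linear-quadratic problem; this worked example is supplementary textbook material, not taken from the paper. With dynamics $dx_s = (Ax_s + Bu_s)\,ds + \sigma\,dW_s$, running cost $\phi = x^T Q x + u^T R u$, and terminal cost $\psi(x) = x^T S x$, first-order optimality in $u$ gives

$$2Ru + B^T \nabla_x V = 0 \quad\Longrightarrow\quad u^* = -\tfrac12 R^{-1} B^T \nabla_x V,$$

and the quadratic ansatz $V(t,x) = x^T P(t)\,x + c(t)$ reduces the HJB PDE to a matrix Riccati ODE,

$$\dot P + A^T P + P A - P B R^{-1} B^T P + Q = 0, \qquad P(T) = S,$$

together with $\dot c + \operatorname{tr}(\sigma\sigma^T P) = 0$, $c(T) = 0$, so the optimal feedback is linear in the state: $u^*(t,x) = -R^{-1} B^T P(t)\, x$. Closed-form expressions of this type are what the coupled architecture of Section 2 exploits.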

2. Neural Parameterization of Value and Control Functions

The PINN framework introduces two strategies for neural approximation of $V$ and the optimal control $u$:

  • Decoupled (Two-Network) Architecture: Independent networks parameterize $V^{\mathrm{NN}}(t_n, x_{t_n}; \theta_V) \approx V(t_n, x_{t_n})$ and $u^{\mathrm{NN}}(t_n, x_{t_n}; \theta_u) \approx u(t_n)$, typically via fully connected (FC) nets (4 layers, 64 units, $\tanh$ activation) or LSTM networks (3 layers, 50 units, $\tanh$ gates). Both receive $(t_n, x_{t_n})$ as input; the value net outputs a scalar, the control net an $m$-dimensional vector.
  • Coupled (One-Network) Architecture: A single network parameterizes $V^{\mathrm{NN}}$, and the control is recovered by the closed-form expression $u = -\widetilde{R}^{-1} G^T \nabla_x V^{\mathrm{NN}}$ when such an explicit formula exists. Here, automatic differentiation provides $\nabla_x V^{\mathrm{NN}}$.

In either architecture, training minimizes the same physics-informed loss, which enforces the HJB equation along sampled trajectories; in the decoupled case the two networks are trained jointly.
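As a supplementary illustration (not code from the paper), the forward pass of the decoupled FC architecture can be sketched in plain NumPy; the layer sizes follow the description above (read here as four hidden layers of 64 units), the Xavier-uniform initialization is as described in Section 4, and all names are illustrative. A real implementation would use an autodiff framework so that derivatives of the value network are available for the loss.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Xavier-uniform initialization for a fully connected net."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = np.sqrt(6.0 / (n_in + n_out))
        W = rng.uniform(-bound, bound, size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def mlp_forward(params, z):
    """tanh hidden layers, linear output layer."""
    for W, b in params[:-1]:
        z = np.tanh(z @ W + b)
    W, b = params[-1]
    return z @ W + b

rng = np.random.default_rng(0)
d, m = 2, 1                                            # state and control dims (illustrative)
value_net   = init_mlp([1 + d, 64, 64, 64, 64, 1], rng)  # scalar value output
control_net = init_mlp([1 + d, 64, 64, 64, 64, m], rng)  # m-dimensional control output

t, x = 0.5, np.array([0.1, -0.3])
tx = np.concatenate([[t], x])        # networks receive (t_n, x_{t_n}) as input
V_hat = mlp_forward(value_net, tx)   # value estimate, shape (1,)
u_hat = mlp_forward(control_net, tx) # control estimate, shape (m,)
```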

3. Physics-Informed Loss and Pathwise Distillation

The loss function is constructed by evaluating the pathwise HJB residual

$$\mathcal{H}(V, x, u) = \partial_t V + b(t, x, u)\cdot\nabla_x V + \frac12 \operatorname{tr}\!\big(\sigma \sigma^T(t, x, u)\,\nabla_x^2 V\big) + \phi(t, x, u)$$

over mini-batches of simulated SDE trajectories $\{x_{t_n}^{(i)}\}_{n=0}^N$. The training objective is

$$L(\theta_V, \theta_u) = \frac{1}{M}\sum_{i=1}^M \left\{ \sum_{n=0}^{N-1} \left|\mathcal{H}\big(V^{\mathrm{NN}}(t_n, x_{t_n}^{(i)}; \theta_V),\, x_{t_n}^{(i)},\, u^{\mathrm{NN}}(t_n, x_{t_n}^{(i)}; \theta_u)\big)\right|^2 + \left|V^{\mathrm{NN}}(T, x_T^{(i)}; \theta_V) - \psi(x_T^{(i)})\right|^2 \right\}$$

where the first term enforces the HJB PDE along the trajectory, and the second enforces the terminal condition. In the one-net case, the explicit control formula replaces the control network outputs.

No supervised labels for value or control are used; the loss alone encodes both the system’s physics and the control optimality condition. This penalization “distills” the required structure into the neural weights.
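The loss above can be sketched for a one-dimensional state as follows. This is an illustrative stand-in, not the paper's implementation: the derivatives of $V$ are approximated here by central finite differences in place of the automatic differentiation the method actually uses, and all function names are hypothetical.

```python
import numpy as np

def hjb_residual_loss(V, u, paths, ts, b, sigma, phi, psi, h=1e-4):
    """Pathwise HJB residual plus terminal penalty, 1D state.

    V, u: callables (t, x) -> float approximating value and control.
    b, sigma, phi: callables (t, x, u) -> float (drift, diffusion, running cost).
    psi: terminal cost; paths: array (M, N+1); ts: time grid (N+1,)."""
    M = paths.shape[0]
    loss = 0.0
    for i in range(M):
        for n in range(len(ts) - 1):
            t, x = ts[n], paths[i, n]
            uc = u(t, x)
            # finite-difference stand-in for autodiff derivatives of V
            Vt  = (V(t + h, x) - V(t - h, x)) / (2 * h)
            Vx  = (V(t, x + h) - V(t, x - h)) / (2 * h)
            Vxx = (V(t, x + h) - 2 * V(t, x) + V(t, x - h)) / h**2
            # pathwise HJB residual H(V, x, u)
            H = Vt + b(t, x, uc) * Vx + 0.5 * sigma(t, x, uc)**2 * Vxx + phi(t, x, uc)
            loss += H**2
        # terminal-condition penalty |V(T, x_T) - psi(x_T)|^2
        loss += (V(ts[-1], paths[i, -1]) - psi(paths[i, -1]))**2
    return loss / M
```

A sanity check of the construction: if $V$ solves the HJB equation exactly along the supplied controls, both terms vanish and the loss is (numerically) zero.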

4. Stochastic Trajectory Sampling and Training Procedure

Training proceeds via stochastic gradient descent over mini-batches of discretized Brownian trajectories. Each iteration involves:

  • Sampling $M$ independent Brownian-motion paths $\{W_{t_n}^{(i)}\}$
  • Propagating states via Euler–Maruyama:

$$x_{t_{n+1}}^{(i)} = x_{t_n}^{(i)} + b\big(t_n, x_{t_n}^{(i)}, u^{(i)}(t_n)\big)\,\Delta t + \sigma\big(t_n, x_{t_n}^{(i)}, u^{(i)}(t_n)\big)\,\Delta W_n^{(i)}$$

  • Evaluating the physics-informed residual and terminal condition loss along each path
  • Aggregating losses and applying Adam or SGD with learning rate typically $\eta = 10^{-3}$, for $K \approx 2\times 10^4$ iterations, batch size $M = 50$, and time steps $N \in \{20, 40, 50\}$.

Automatic differentiation is used to compute all required derivatives of $V^{\mathrm{NN}}$ for the loss, and Xavier uniform initialization is used for neural weights.
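The Euler–Maruyama propagation step can be sketched in NumPy for a one-dimensional state; this is a minimal illustration under assumed dynamics (an Ornstein–Uhlenbeck process with zero control), not code from the paper.

```python
import numpy as np

def simulate_paths(x0, b, sigma, u, T, N, M, rng):
    """Euler-Maruyama discretization of a controlled 1D SDE.

    x0: initial state; b, sigma: callables (t, x, u); u: feedback law (t, x).
    Returns the time grid ts (N+1,) and sample paths (M, N+1)."""
    dt = T / N
    ts = np.linspace(0.0, T, N + 1)
    paths = np.full((M, N + 1), float(x0))
    for n in range(N):
        dW = rng.normal(0.0, np.sqrt(dt), size=M)  # Brownian increments, var = dt
        x = paths[:, n]
        uc = u(ts[n], x)
        paths[:, n + 1] = x + b(ts[n], x, uc) * dt + sigma(ts[n], x, uc) * dW
    return ts, paths

# Illustrative dynamics: mean-reverting drift, constant diffusion, zero control
rng = np.random.default_rng(0)
ts, paths = simulate_paths(
    x0=1.0,
    b=lambda t, x, u: -x,
    sigma=lambda t, x, u: 0.2,
    u=lambda t, x: 0.0,
    T=1.0, N=50, M=50, rng=rng)
```

With $M = 50$ paths and $N = 50$ steps this matches the batch sizes quoted above; in training, the feedback law `u` would be the current control network (or the closed-form expression in the one-net case), so the sampled trajectories change as the controller improves.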

5. Numerical Performance and Benchmark Problems

The method has been empirically validated on five representative stochastic optimal control problems:

| Problem | Dimensionality | Successful Architectures | Key Observations |
| --- | --- | --- | --- |
| Nonlinear 1D SDE | 1 | FC-2Net, LSTM-2Net | Both architectures converge, low state variance |
| Linear-Quadratic ($n = 2, 4, 30$) | up to 30 | FC, LSTM | LSTM robust up to $n = 30$; FC fails for $n = 30$ |
| Stochastic Pendulum | 2 | FC-1Net, LSTM-1Net | Both achieve swing-up, low outcome variance |
| Nonlinear Cart–Pole | 4 | LSTM-1Net | FC-1Net fails; LSTM-1Net achieves swing-up |
| Planar Quadcopter | 6 | LSTM-1Net | Only LSTM-1Net converges at this dimension |

Metrics reported include mean terminal costs, tracking errors, control profiles, and value functions. LSTM shows $>40\%$ lower training time for low-dimensional problems and is uniquely effective in higher dimensions.

6. Generalization, Theoretical Guarantees, and "Physics Distillation"

The PINN-based controller, once trained, generalizes to unseen initial states and new Brownian trajectories without re-training. Empirical evaluation demonstrates robust stability under moderate variations in noise intensity and time horizon. Theorem 4.1 establishes that the total error is bounded by contributions from (i) the time-discretization step $\Delta t$, (ii) the universal-approximation gap of the neural network, and (iii) the SGD optimization error.

The penalization of the HJB operator at randomly sampled time-state points forces the network to encode both the drift and diffusion physics and Bellman’s principle within its weights. The resulting "distilled" controller has no explicit dependency on ground-truth VV or uu labels, requiring only the system simulator and terminal cost specification. This approach demonstrates that PINN frameworks can effectively distill the essential structure of stochastic control problems into finite-dimensional neural parameters (Jiao et al., 2024).

References (1)
