
Physics-Informed Controller Distillation

Updated 6 March 2026
  • Physics-Informed Controller Distillation is a method that embeds the Hamilton–Jacobi–Bellman equation into neural network training to distill optimal feedback controllers for stochastic systems.
  • It employs both decoupled and coupled neural architectures to approximate value and control functions, using physics-informed loss functions computed over simulated Brownian trajectories.
  • The approach robustly generalizes across varying initial conditions and noise intensities, demonstrating computational efficiency and improved performance in high-dimensional control benchmark problems.

Physics-Informed Controller Distillation is an approach for synthesizing optimal feedback controllers for high-dimensional stochastic control problems by leveraging physics-informed neural networks (PINNs) trained via the Hamilton–Jacobi–Bellman (HJB) partial differential equation. The method integrates system dynamics and optimality principles directly into the architecture and loss function of deep neural networks, thereby distilling both the physical laws and optimal control objectives into the network parameters. This framework bypasses the requirement for supervised control labels, relying solely on the terminal cost and a simulator for state dynamics to learn controllers that robustly generalize across initial conditions and stochastic trajectories (Jiao et al., 2024).

1. Hamilton–Jacobi–Bellman Formulation of Stochastic Optimal Control

The foundation of Physics-Informed Controller Distillation is the stochastic HJB equation. For a controlled SDE specified by drift $b$, diffusion $\sigma$, running cost $\phi$, and terminal cost $\psi$, the value function $V(t,x)$ is

$$V(t,x) \;=\; \min_{u(\cdot)} \mathbb{E}\left[\int_t^T \phi(s, x_s, u(s))\,ds + \psi(x_T) \;\middle|\; x_t = x\right]$$

which satisfies the backward parabolic HJB PDE:

$$\begin{cases} \partial_t V(t,x) + \min_{u \in U} \left\{ b(t,x,u)\cdot\nabla_x V + \frac12 \operatorname{tr}\!\big(\sigma\sigma^T(t,x,u)\,\nabla_x^2 V\big) + \phi(t,x,u) \right\} = 0, \\ V(T,x) = \psi(x). \end{cases}$$

This equation encapsulates Bellman's optimality principle and encodes the physics and control objectives governing the system.
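For orientation, a standard case where the inner minimization can be carried out in closed form is the linear-quadratic problem; this worked example is supplementary textbook material, not taken from the paper. With dynamics $dx_s = (Ax_s + Bu_s)\,ds + \sigma\,dW_s$, running cost $\phi = x^T Q x + u^T R u$, and terminal cost $\psi(x) = x^T S x$, first-order optimality in $u$ gives

$$2Ru + B^T \nabla_x V = 0 \quad\Longrightarrow\quad u^* = -\tfrac12 R^{-1} B^T \nabla_x V,$$

and the quadratic ansatz $V(t,x) = x^T P(t)\,x + c(t)$ reduces the HJB PDE to a matrix Riccati ODE,

$$\dot P + A^T P + P A - P B R^{-1} B^T P + Q = 0, \qquad P(T) = S,$$

together with $\dot c + \operatorname{tr}(\sigma\sigma^T P) = 0$, $c(T) = 0$, so the optimal feedback is linear in the state: $u^*(t,x) = -R^{-1} B^T P(t)\, x$. Closed-form expressions of this type are what the coupled architecture of Section 2 exploits.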

2. Neural Parameterization of Value and Control Functions

The PINN framework introduces two strategies for neural approximation of $V$ and the optimal control $u$:

  • Decoupled (Two-Network) Architecture: Independent networks parameterize $V^{\mathrm{NN}}(t_n, x_{t_n}; \theta_V) \approx V(t_n, x_{t_n})$ and $u^{\mathrm{NN}}(t_n, x_{t_n}; \theta_u) \approx u(t_n)$, typically via fully connected (FC) nets (4 layers, 64 units, $\tanh$ activation) or LSTM networks (3 layers, 50 units, $\tanh$ gates). Both receive $(t_n, x_{t_n})$ as input; the value net outputs a scalar, the control net an $m$-dimensional vector.
  • Coupled (One-Network) Architecture: A single network parameterizes $V^{\mathrm{NN}}$, and the control is recovered by the closed-form expression $u = -\widetilde{R}^{-1} G^T \nabla_x V^{\mathrm{NN}}$ when such an explicit formula exists. Here, automatic differentiation provides $\nabla_x V^{\mathrm{NN}}$.

In either architecture, training minimizes the same physics-informed loss, which enforces the HJB equation along sampled trajectories; in the decoupled case the two networks are trained jointly.
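As a supplementary illustration (not code from the paper), the forward pass of the decoupled FC architecture can be sketched in plain NumPy; the layer sizes follow the description above (read here as four hidden layers of 64 units), the Xavier-uniform initialization is as described in Section 4, and all names are illustrative. A real implementation would use an autodiff framework so that derivatives of the value network are available for the loss.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Xavier-uniform initialization for a fully connected net."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        bound = np.sqrt(6.0 / (n_in + n_out))
        W = rng.uniform(-bound, bound, size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def mlp_forward(params, z):
    """tanh hidden layers, linear output layer."""
    for W, b in params[:-1]:
        z = np.tanh(z @ W + b)
    W, b = params[-1]
    return z @ W + b

rng = np.random.default_rng(0)
d, m = 2, 1                                            # state and control dims (illustrative)
value_net   = init_mlp([1 + d, 64, 64, 64, 64, 1], rng)  # scalar value output
control_net = init_mlp([1 + d, 64, 64, 64, 64, m], rng)  # m-dimensional control output

t, x = 0.5, np.array([0.1, -0.3])
tx = np.concatenate([[t], x])        # networks receive (t_n, x_{t_n}) as input
V_hat = mlp_forward(value_net, tx)   # value estimate, shape (1,)
u_hat = mlp_forward(control_net, tx) # control estimate, shape (m,)
```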

3. Physics-Informed Loss and Pathwise Distillation

The loss function is constructed by evaluating the pathwise HJB residual

$$\mathcal{H}(V, x, u) = \partial_t V + b(t, x, u)\cdot\nabla_x V + \frac12 \operatorname{tr}\!\big(\sigma \sigma^T(t, x, u)\,\nabla_x^2 V\big) + \phi(t, x, u)$$

over mini-batches of simulated SDE trajectories $\{x_{t_n}^{(i)}\}_{n=0}^N$. The training objective is

$$L(\theta_V, \theta_u) = \frac{1}{M}\sum_{i=1}^M \left\{ \sum_{n=0}^{N-1} \left|\mathcal{H}\big(V^{\mathrm{NN}}(t_n, x_{t_n}^{(i)}; \theta_V),\, x_{t_n}^{(i)},\, u^{\mathrm{NN}}(t_n, x_{t_n}^{(i)}; \theta_u)\big)\right|^2 + \left|V^{\mathrm{NN}}(T, x_T^{(i)}; \theta_V) - \psi(x_T^{(i)})\right|^2 \right\}$$

where the first term enforces the HJB PDE along the trajectory, and the second enforces the terminal condition. In the one-net case, the explicit control formula replaces the control network outputs.

No supervised labels for value or control are used; the loss alone encodes both the system’s physics and the control optimality condition. This penalization “distills” the required structure into the neural weights.
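The loss above can be sketched for a one-dimensional state as follows. This is an illustrative stand-in, not the paper's implementation: the derivatives of $V$ are approximated here by central finite differences in place of the automatic differentiation the method actually uses, and all function names are hypothetical.

```python
import numpy as np

def hjb_residual_loss(V, u, paths, ts, b, sigma, phi, psi, h=1e-4):
    """Pathwise HJB residual plus terminal penalty, 1D state.

    V, u: callables (t, x) -> float approximating value and control.
    b, sigma, phi: callables (t, x, u) -> float (drift, diffusion, running cost).
    psi: terminal cost; paths: array (M, N+1); ts: time grid (N+1,)."""
    M = paths.shape[0]
    loss = 0.0
    for i in range(M):
        for n in range(len(ts) - 1):
            t, x = ts[n], paths[i, n]
            uc = u(t, x)
            # finite-difference stand-in for autodiff derivatives of V
            Vt  = (V(t + h, x) - V(t - h, x)) / (2 * h)
            Vx  = (V(t, x + h) - V(t, x - h)) / (2 * h)
            Vxx = (V(t, x + h) - 2 * V(t, x) + V(t, x - h)) / h**2
            # pathwise HJB residual H(V, x, u)
            H = Vt + b(t, x, uc) * Vx + 0.5 * sigma(t, x, uc)**2 * Vxx + phi(t, x, uc)
            loss += H**2
        # terminal-condition penalty |V(T, x_T) - psi(x_T)|^2
        loss += (V(ts[-1], paths[i, -1]) - psi(paths[i, -1]))**2
    return loss / M
```

A sanity check of the construction: if $V$ solves the HJB equation exactly along the supplied controls, both terms vanish and the loss is (numerically) zero.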

4. Stochastic Trajectory Sampling and Training Procedure

Training proceeds via stochastic gradient descent over mini-batches of discretized Brownian trajectories. Each iteration involves:

  • Sampling $M$ independent Brownian-motion paths $\{W_{t_n}^{(i)}\}$
  • Propagating states via Euler–Maruyama:

$$x_{t_{n+1}}^{(i)} = x_{t_n}^{(i)} + b\big(t_n, x_{t_n}^{(i)}, u^{(i)}(t_n)\big)\,\Delta t + \sigma\big(t_n, x_{t_n}^{(i)}, u^{(i)}(t_n)\big)\,\Delta W_n^{(i)}$$

  • Evaluating the physics-informed residual and terminal condition loss along each path
  • Aggregating losses and applying Adam or SGD with learning rate typically $\eta = 10^{-3}$, for $K \approx 2\times 10^4$ iterations, batch size $M = 50$, and time steps $N \in \{20, 40, 50\}$.

Automatic differentiation is used to compute all required derivatives of $V^{\mathrm{NN}}$ for the loss, and Xavier uniform initialization is used for neural weights.
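The Euler–Maruyama propagation step can be sketched in NumPy for a one-dimensional state; this is a minimal illustration under assumed dynamics (an Ornstein–Uhlenbeck process with zero control), not code from the paper.

```python
import numpy as np

def simulate_paths(x0, b, sigma, u, T, N, M, rng):
    """Euler-Maruyama discretization of a controlled 1D SDE.

    x0: initial state; b, sigma: callables (t, x, u); u: feedback law (t, x).
    Returns the time grid ts (N+1,) and sample paths (M, N+1)."""
    dt = T / N
    ts = np.linspace(0.0, T, N + 1)
    paths = np.full((M, N + 1), float(x0))
    for n in range(N):
        dW = rng.normal(0.0, np.sqrt(dt), size=M)  # Brownian increments, var = dt
        x = paths[:, n]
        uc = u(ts[n], x)
        paths[:, n + 1] = x + b(ts[n], x, uc) * dt + sigma(ts[n], x, uc) * dW
    return ts, paths

# Illustrative dynamics: mean-reverting drift, constant diffusion, zero control
rng = np.random.default_rng(0)
ts, paths = simulate_paths(
    x0=1.0,
    b=lambda t, x, u: -x,
    sigma=lambda t, x, u: 0.2,
    u=lambda t, x: 0.0,
    T=1.0, N=50, M=50, rng=rng)
```

With $M = 50$ paths and $N = 50$ steps this matches the batch sizes quoted above; in training, the feedback law `u` would be the current control network (or the closed-form expression in the one-net case), so the sampled trajectories change as the controller improves.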

5. Numerical Performance and Benchmark Problems

The method has been empirically validated on five representative stochastic optimal control problems:

| Problem | Dimensionality | Successful Architectures | Key Observations |
| --- | --- | --- | --- |
| Nonlinear 1D SDE | 1 | FC-2Net, LSTM-2Net | Both architectures converge, low state variance |
| Linear-Quadratic ($n = 2, 4, 30$) | up to 30 | FC, LSTM | LSTM robust up to $n = 30$; FC fails for $n = 30$ |
| Stochastic Pendulum | 2 | FC-1Net, LSTM-1Net | Both achieve swing-up, low outcome variance |
| Nonlinear Cart–Pole | 4 | LSTM-1Net | FC-1Net fails; LSTM-1Net achieves swing-up |
| Planar Quadcopter | 6 | LSTM-1Net | Only LSTM-1Net converges at this dimension |

Metrics reported include mean terminal costs, tracking errors, control profiles, and value functions. LSTM shows $>40\%$ lower training time for low-dimensional problems and is uniquely effective in higher dimensions.

6. Generalization, Theoretical Guarantees, and "Physics Distillation"

The PINN-based controller, once trained, generalizes to unseen initial states and new Brownian trajectories without re-training. Empirical evaluation demonstrates robust stability under moderate variations in noise intensity and time horizon. Theorem 4.1 establishes that the total error is bounded by contributions from (i) the time-discretization step $\Delta t$, (ii) the universal-approximation gap of the neural network, and (iii) the SGD optimization error.

The penalization of the HJB operator at randomly sampled time-state points forces the network to encode both the drift and diffusion physics and Bellman’s principle within its weights. The resulting "distilled" controller has no explicit dependency on ground-truth VV or uu labels, requiring only the system simulator and terminal cost specification. This approach demonstrates that PINN frameworks can effectively distill the essential structure of stochastic control problems into finite-dimensional neural parameters (Jiao et al., 2024).

References (1)
