Physics-Informed Controller Distillation
- Physics-Informed Controller Distillation is a method that embeds the Hamilton–Jacobi–Bellman equation into neural network training to distill optimal feedback controllers for stochastic systems.
- It employs both decoupled and coupled neural architectures to approximate value and control functions, using physics-informed loss functions computed over simulated Brownian trajectories.
- The approach robustly generalizes across varying initial conditions and noise intensities, demonstrating computational efficiency and improved performance in high-dimensional control benchmark problems.
Physics-Informed Controller Distillation is an approach for synthesizing optimal feedback controllers for high-dimensional stochastic control problems by leveraging physics-informed neural networks (PINNs) trained via the Hamilton–Jacobi–Bellman (HJB) partial differential equation. The method integrates system dynamics and optimality principles directly into the architecture and loss function of deep neural networks, thereby distilling both the physical laws and optimal control objectives into the network parameters. This framework bypasses the requirement for supervised control labels, relying solely on the terminal cost and a simulator for state dynamics to learn controllers that robustly generalize across initial conditions and stochastic trajectories (Jiao et al., 2024).
1. Hamilton–Jacobi–Bellman Formulation of Stochastic Optimal Control
The foundation of Physics-Informed Controller Distillation is the stochastic HJB equation. For a controlled SDE

$$dX_t = b(X_t, u_t)\,dt + \sigma(X_t, u_t)\,dW_t,$$

specified by drift $b$, diffusion $\sigma$, running cost $r(x, u)$, and terminal cost $g(x)$, the value function is

$$V(t, x) = \inf_{u} \, \mathbb{E}\left[\int_t^T r(X_s, u_s)\,ds + g(X_T) \,\middle|\, X_t = x\right],$$

which satisfies the backward parabolic HJB PDE

$$\partial_t V + \inf_{u}\left\{ b(x, u) \cdot \nabla_x V + \tfrac{1}{2}\,\mathrm{Tr}\!\left(\sigma\sigma^\top \nabla_x^2 V\right) + r(x, u) \right\} = 0, \qquad V(T, x) = g(x).$$

This equation encapsulates Bellman's optimality principle and encodes the physics and control objectives governing the system.
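As a concrete instance (a standard scalar linear–quadratic textbook example, not one of the paper's benchmarks), the inner minimization in the HJB equation can be carried out in closed form, reducing the PDE to a Riccati ODE:

```latex
% Scalar LQ example: dX_t = u_t\,dt + \sigma\,dW_t,
% running cost r(x,u) = x^2 + u^2, terminal cost g(x) = c x^2.
\[
\partial_t V + \min_u \Big\{ u\, \partial_x V + \tfrac{\sigma^2}{2}\, \partial_x^2 V + x^2 + u^2 \Big\} = 0 .
\]
% The minimizer is explicit, u^* = -\tfrac{1}{2} \partial_x V, giving
\[
\partial_t V - \tfrac{1}{4} \big(\partial_x V\big)^2 + \tfrac{\sigma^2}{2}\, \partial_x^2 V + x^2 = 0 .
\]
% The quadratic ansatz V(t,x) = p(t) x^2 + q(t) then yields
\[
\dot p = p^2 - 1, \quad p(T) = c, \qquad \dot q = -\sigma^2 p, \quad q(T) = 0 .
\]
```

Problems of this type serve as sanity checks, since the PINN-trained value function can be compared against the Riccati solution.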
2. Neural Parameterization of Value and Control Functions
The PINN framework introduces two strategies for neural approximation of the value function $V(t, x)$ and the optimal control $u^*(t, x)$:
- Decoupled (Two-Network) Architecture: Independent networks parameterize $V$ and $u$, typically via fully connected (FC) nets (4 layers, 64 units) or LSTM networks (3 layers, 50 units). Both receive $(t, x)$ as input; the value net outputs a scalar, the control net a vector of the control dimension.
- Coupled (One-Network) Architecture: A single network parameterizes $V$, and the control is recovered from the closed-form minimizer of the Hamiltonian, $u^*(t, x) = \arg\min_u \left\{ b(x, u) \cdot \nabla_x V + r(x, u) \right\}$, when such an explicit formula exists. Here, automatic differentiation provides $\nabla_x V$.
Both architectures are trained jointly with a shared loss enforcing the HJB equation along sampled trajectories.
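The decoupled architecture can be sketched in a few lines; the following numpy-only sketch uses illustrative layer sizes, a tanh activation, and Xavier uniform initialization (the activation choice is our assumption, not stated in the source):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes, rng):
    """Xavier-uniform initialized fully connected network."""
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        bound = np.sqrt(6.0 / (fan_in + fan_out))
        params.append((rng.uniform(-bound, bound, (fan_in, fan_out)),
                       np.zeros(fan_out)))
    return params

def mlp(params, x):
    """Forward pass: tanh hidden layers, linear output."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b

state_dim, control_dim = 2, 1
# Value network: input (t, x) -> scalar V(t, x)
value_net = init_mlp([1 + state_dim, 64, 64, 1], rng)
# Control network: input (t, x) -> control vector u(t, x)
control_net = init_mlp([1 + state_dim, 64, 64, control_dim], rng)

tx = np.concatenate([[0.5], np.zeros(state_dim)])  # a sample (t, x) input
V = mlp(value_net, tx)    # shape (1,): scalar value
u = mlp(control_net, tx)  # shape (control_dim,): control vector
```

In the coupled variant, `control_net` is dropped and `u` is computed from the gradient of `V` via the closed-form Hamiltonian minimizer.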
3. Physics-Informed Loss and Pathwise Distillation
The loss function is constructed by evaluating the pathwise HJB residual

$$R_\theta(t, x) = \partial_t V_\theta + b(x, u_\phi) \cdot \nabla_x V_\theta + \tfrac{1}{2}\,\mathrm{Tr}\!\left(\sigma\sigma^\top \nabla_x^2 V_\theta\right) + r(x, u_\phi)$$

over mini-batches of simulated SDE trajectories $\{X_{t_k}^{(i)}\}$. The training objective is

$$\mathcal{L}(\theta, \phi) = \frac{1}{N} \sum_{i=1}^{N} \left[ \sum_{k} \left| R_\theta\big(t_k, X_{t_k}^{(i)}\big) \right|^2 \Delta t + \left| V_\theta\big(T, X_T^{(i)}\big) - g\big(X_T^{(i)}\big) \right|^2 \right],$$

where the first term enforces the HJB PDE along the trajectory and the second enforces the terminal condition. In the one-net case, the explicit control formula replaces the control network outputs.
No supervised labels for value or control are used; the loss alone encodes both the system’s physics and the control optimality condition. This penalization “distills” the required structure into the neural weights.
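The residual-plus-terminal structure of the loss can be illustrated on a 1D problem; the sketch below uses central finite differences in place of automatic differentiation, and the dynamics, costs, and candidate value/control functions are illustrative stand-ins, not the paper's benchmarks:

```python
import numpy as np

# Illustrative 1D problem data (assumptions for this sketch):
b = lambda x, u: -x + u         # drift
sigma = lambda x, u: 0.2        # diffusion (constant)
r = lambda x, u: x**2 + u**2    # running cost
g = lambda x: x**2              # terminal cost
T = 1.0

# Candidate value/control functions standing in for the networks.
V = lambda t, x: (1.0 + (T - t)) * x**2
u = lambda t, x: -x

def hjb_residual(t, x, h=1e-4):
    """Pathwise HJB residual; derivatives via central finite differences."""
    Vt = (V(t + h, x) - V(t - h, x)) / (2 * h)
    Vx = (V(t, x + h) - V(t, x - h)) / (2 * h)
    Vxx = (V(t, x + h) - 2 * V(t, x) + V(t, x - h)) / h**2
    uu = u(t, x)
    return Vt + b(x, uu) * Vx + 0.5 * sigma(x, uu)**2 * Vxx + r(x, uu)

def pinn_loss(traj_t, traj_x, dt):
    """Squared residual along a trajectory plus terminal-condition penalty."""
    res = sum(hjb_residual(t, x)**2 * dt for t, x in zip(traj_t, traj_x))
    term = (V(T, traj_x[-1]) - g(traj_x[-1]))**2
    return res + term

ts = np.linspace(0.0, T, 21)
xs = np.ones_like(ts)  # a dummy trajectory for demonstration
loss = pinn_loss(ts, xs, ts[1] - ts[0])
```

Training would decrease `loss` by adjusting the parameters of `V` and `u`; here the candidate functions are fixed, so the nonzero residual simply shows what the penalty measures.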
4. Stochastic Trajectory Sampling and Training Procedure
Training proceeds via stochastic gradient descent over mini-batches of discretized Brownian trajectories. Each iteration involves:
- Sampling independent Brownian-motion paths $\{W_t^{(i)}\}$
- Propagating states via the Euler–Maruyama scheme: $X_{t_{k+1}} = X_{t_k} + b(X_{t_k}, u_{t_k})\,\Delta t + \sigma(X_{t_k}, u_{t_k})\,\Delta W_k$, with $\Delta W_k \sim \mathcal{N}(0, \Delta t \, I)$
- Evaluating the physics-informed residual and terminal condition loss along each path
- Aggregating losses and applying Adam or SGD over many iterations, with learning rate, batch size, and number of time steps chosen per problem.
Automatic differentiation is used to compute all required derivatives of $V_\theta$ for the loss, and Xavier uniform initialization is used for the neural weights.
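The trajectory-sampling step can be sketched as follows; this is a minimal numpy implementation of batched Euler–Maruyama rollouts for a 1D controlled SDE, with illustrative drift, diffusion, and policy (all assumptions of the sketch):

```python
import numpy as np

def simulate_batch(x0, policy, b, sigma, T=1.0, n_steps=50, batch=64, seed=0):
    """Euler-Maruyama rollout of a batch of controlled 1D trajectories:
    X_{k+1} = X_k + b(X_k, u_k) dt + sigma(X_k, u_k) dW_k."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = 0.0
    X = np.full(batch, x0, dtype=float)
    path = [X.copy()]
    for _ in range(n_steps):
        u = policy(t, X)
        dW = rng.normal(0.0, np.sqrt(dt), size=batch)  # Brownian increments
        X = X + b(X, u) * dt + sigma(X, u) * dW
        t += dt
        path.append(X.copy())
    return np.stack(path)  # shape (n_steps + 1, batch)

# Illustrative dynamics and feedback policy:
paths = simulate_batch(
    x0=1.0,
    policy=lambda t, x: -x,                     # stand-in feedback controller
    b=lambda x, u: -x + u,                      # drift
    sigma=lambda x, u: 0.2 * np.ones_like(x),   # diffusion
)
```

Each training iteration would evaluate the physics-informed loss along `paths` and take a gradient step; fresh Brownian increments are drawn every iteration.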
5. Numerical Performance and Benchmark Problems
The method has been empirically validated on five representative stochastic optimal control problems:
| Problem | Dimensionality | Successful Architectures | Key Observations |
|---|---|---|---|
| Nonlinear 1D SDE | 1 | FC-2Net, LSTM-2Net | Both architectures converge, low state variance |
| Linear-Quadratic (n=2,4,30) | up to 30 | FC, LSTM | LSTM robust up to n=30; FC fails at n=30 |
| Stochastic Pendulum | 2 | FC-1Net, LSTM-1Net | Both achieve swing-up, low outcome variance |
| Nonlinear Cart–Pole | 4 | LSTM-1Net | FC-1Net fails; LSTM-1Net achieves swing-up |
| Planar Quadcopter | 6 | LSTM-1Net | Only LSTM-1Net converges at this dimension |
Metrics reported include mean terminal costs, tracking errors, control profiles, and value functions. LSTM shows lower training time for low-dimensional problems and is uniquely effective in higher dimensions.
6. Generalization, Theoretical Guarantees, and "Physics Distillation"
The PINN-based controller, once trained, generalizes to unseen initial states and new Brownian trajectories without re-training. Empirical evaluation demonstrates robust stability under moderate variations in noise intensity and time horizon. Theorem 4.1 establishes that the total error is bounded by contributions from (i) the time-discretization error of the Euler–Maruyama scheme, (ii) the universal-approximation gap of the neural network, and (iii) the SGD optimization error.
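Schematically (our notation, sketching only the structure of the bound, not its constants or rates), the error decomposition reads:

```latex
\[
\text{total error} \;\lesssim\;
\underbrace{\varepsilon_{\mathrm{disc}}(\Delta t)}_{\text{time discretization}}
\;+\;
\underbrace{\varepsilon_{\mathrm{approx}}}_{\text{network capacity}}
\;+\;
\underbrace{\varepsilon_{\mathrm{opt}}}_{\text{SGD error}} .
\]
```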
The penalization of the HJB operator at randomly sampled time-state points forces the network to encode both the drift and diffusion physics and Bellman's principle within its weights. The resulting "distilled" controller has no explicit dependency on ground-truth value or control labels, requiring only the system simulator and the terminal cost specification. This approach demonstrates that PINN frameworks can effectively distill the essential structure of stochastic control problems into finite-dimensional neural parameters (Jiao et al., 2024).