Deep Koopman-Based EMPC
- Deep Koopman-EMPC is a control paradigm that lifts nonlinear dynamics via learned Koopman operators into a linear latent space for efficient optimization.
- It replaces complex nonlinear models with deep learning-based linear approximations, enabling robust economic MPC through convex quadratic programming.
- The approach employs reinforcement learning refinement and differentiable optimization to reduce constraint violations and operational costs in industrial benchmarks.
Deep Koopman-Based Economic Model Predictive Control (EMPC) is an advanced control paradigm that synthesizes deep learning-based approximations of the Koopman operator with convex economic model predictive control, often refined through reinforcement learning. The methodology replaces nonlinear process models with lifted linear surrogates that retain the expressive power necessary to capture complex system behaviors while enabling tractable optimization via real-time quadratic programming. This framework is designed to provide high-fidelity economic optimization with strong closed-loop constraint satisfaction, even for highly nonlinear, high-dimensional, or partially observable processes.
1. Koopman Operator Theory and Deep Lifting for Control
The Koopman operator provides a linear (though infinite-dimensional) reformulation of nonlinear dynamical systems by describing the evolution of observables rather than states. In practice, deep Koopman-based approaches construct a finite-dimensional linear approximation by learning a nonlinear encoder $\phi_\theta$ that lifts system states $x_k \in \mathbb{R}^{n_x}$ into a higher-dimensional latent space, $z_k = \phi_\theta(x_k) \in \mathbb{R}^{n_z}$ with $n_z > n_x$. The system dynamics in the latent (Koopman) space are modeled as

$$z_{k+1} = A z_k + B u_k,$$

where $A \in \mathbb{R}^{n_z \times n_z}$ and $B \in \mathbb{R}^{n_z \times n_u}$ are trainable Koopman matrices and $u_k \in \mathbb{R}^{n_u}$ is the control input. For practical output prediction, a decoder $\psi_\theta$ reconstructs system states via

$$\hat{x}_k = \psi_\theta(z_k) \approx x_k.$$
Both encoder and decoder are typically parameterized as multi-layer perceptrons (MLPs) with tanh or ELU activations; the latent dimension $n_z$ is selected to maximize model expressiveness while maintaining real-time feasibility, with reported values ranging from under ten for CSTRs (Mayfrank et al., 21 Mar 2024, Mayfrank et al., 24 Mar 2025) to several tens for large industrial systems (Han et al., 21 May 2024, Mayfrank et al., 6 Nov 2025, Valábek et al., 6 Nov 2025).
The full deep Koopman surrogate identification loss balances reconstruction, lifting, and state-prediction terms,

$$\mathcal{L} = \lambda_{\mathrm{rec}}\,\mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{lift}}\,\mathcal{L}_{\mathrm{lift}} + \lambda_{\mathrm{pred}}\,\mathcal{L}_{\mathrm{pred}},$$

where

$$\mathcal{L}_{\mathrm{rec}} = \big\| x_k - \psi_\theta(\phi_\theta(x_k)) \big\|^2, \qquad \mathcal{L}_{\mathrm{lift}} = \big\| \phi_\theta(x_{k+1}) - \big(A\phi_\theta(x_k) + B u_k\big) \big\|^2, \qquad \mathcal{L}_{\mathrm{pred}} = \big\| x_{k+1} - \psi_\theta\big(A\phi_\theta(x_k) + B u_k\big) \big\|^2.$$
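A minimal PyTorch sketch of this surrogate and its identification loss is given below; the layer sizes, activation choices, and loss weights are illustrative defaults, not values taken from the cited papers.

```python
import torch.nn as nn

class DeepKoopman(nn.Module):
    def __init__(self, n_x: int, n_u: int, n_z: int, hidden: int = 128):
        super().__init__()
        # Encoder phi_theta: x -> z and decoder psi_theta: z -> x_hat (tanh MLPs).
        self.encoder = nn.Sequential(
            nn.Linear(n_x, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_z),
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_z, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_x),
        )
        # Trainable Koopman matrices A (n_z x n_z) and B (n_z x n_u), bias-free.
        self.A = nn.Linear(n_z, n_z, bias=False)
        self.B = nn.Linear(n_u, n_z, bias=False)

    def forward(self, x, u):
        z = self.encoder(x)
        z_next_pred = self.A(z) + self.B(u)       # linear latent dynamics
        return z, z_next_pred, self.decoder(z), self.decoder(z_next_pred)

def koopman_loss(model, x, u, x_next, w=(1.0, 1.0, 1.0)):
    """Weighted sum of reconstruction, lifting-linearity, and prediction terms."""
    _, z_next_pred, x_rec, x_next_pred = model(x, u)
    z_next = model.encoder(x_next)                # lifted target for L_lift
    mse = nn.functional.mse_loss
    return (w[0] * mse(x_rec, x)                  # L_rec
            + w[1] * mse(z_next_pred, z_next)     # L_lift
            + w[2] * mse(x_next_pred, x_next))    # L_pred
```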
Advanced variants incorporate input-output encoders, time-varying operator parameters via deep networks, and history-dependent lifting via LSTMs to address high-dimensional and partially observed systems (Han et al., 9 Apr 2025).
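For the history-dependent lifting mentioned above, a schematic LSTM encoder might look as follows; the architecture and dimensions are assumptions for illustration, not the specific networks of the cited work.

```python
import torch.nn as nn

class LSTMLifting(nn.Module):
    """Lift a window of past outputs into the Koopman latent space."""
    def __init__(self, n_y: int, n_z: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_y, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_z)

    def forward(self, y_hist):            # y_hist: (batch, T, n_y) output history
        h, _ = self.lstm(y_hist)
        return self.head(h[:, -1])        # z_k from the final hidden state
```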
2. Economic Model Predictive Control Formulation in Lifted Space
EMPC seeks to minimize economic costs over a receding horizon, accommodating nonlinear process constraints and real-world operational objectives. In the deep Koopman-based context, the receding-horizon OCP is formulated with lifted dynamics:

$$\min_{u_0,\dots,u_{N-1}} \; \sum_{k=0}^{N-1} \ell_e(z_k, u_k) + V_f(z_N) \quad \text{s.t.} \quad z_{k+1} = A z_k + B u_k, \;\; z_0 = \phi_\theta(x(t)),$$

with $\ell_e$ denoting an economic stage cost (e.g., energy, material loss) and $V_f$ a terminal cost such as a quadratic penalty on deviation from steady state.
For partially observed or output-based systems, the entire EMPC OCP—including input, output, and economic cost quadratic decoders—remains a convex quadratic program in the lifted variables, supporting high-dimensional problems in real time (Han et al., 21 May 2024, Han et al., 9 Apr 2025).
Slack variables $s_k \ge 0$ are systematically incorporated to soften hard constraints and ensure feasibility under disturbances or modeling error:

$$y_{\min} - s_k \le \hat{y}_k \le y_{\max} + s_k,$$

with a penalty term such as $\rho\,\|s_k\|_1$ added to the objective and corresponding bounds on control, output, and slack variables.
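The following CVXPY sketch assembles the lifted-space OCP with softened output constraints as described above; the matrices A, B, C, bounds, and weights are placeholders, not values from the cited studies.

```python
import cvxpy as cp
import numpy as np

n_z, n_u, n_y, N = 20, 2, 2, 24
A = 0.98 * np.eye(n_z)                        # placeholder Koopman matrices
B = 0.05 * np.ones((n_z, n_u))
C = np.ones((n_y, n_z)) / n_z                 # linear output decoder
q = np.ones(n_u)                              # economic stage-cost weights
rho = 1e3                                     # slack penalty weight
z_ss = np.zeros(n_z)                          # steady-state lifted target

z0 = cp.Parameter(n_z)                        # encoded current state
Z = cp.Variable((n_z, N + 1))
U = cp.Variable((n_u, N))
S = cp.Variable((n_y, N), nonneg=True)        # slack variables

cost = cp.sum_squares(Z[:, N] - z_ss)         # quadratic terminal penalty
cons = [Z[:, 0] == z0]
for k in range(N):
    cost += q @ U[:, k] + rho * cp.sum(S[:, k])   # economic cost + slack penalty
    cons += [Z[:, k + 1] == A @ Z[:, k] + B @ U[:, k],
             C @ Z[:, k + 1] <= 1.0 + S[:, k],    # softened output bound
             U[:, k] >= -1.0, U[:, k] <= 1.0]
prob = cp.Problem(cp.Minimize(cost), cons)
# Usage: z0.value = phi(x); prob.solve(solver=cp.OSQP); u0 = U.value[:, 0]
```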
3. End-to-End RL Refinement and Differentiable Optimization
A defining feature of recent deep Koopman-EMPC frameworks is the end-to-end refinement of surrogate and control policy via reinforcement learning algorithms such as Proximal Policy Optimization (PPO) or Short-Horizon Actor-Critic (SHAC) (Dony, 12 May 2025, Mayfrank et al., 21 Mar 2024, Mayfrank et al., 24 Mar 2025, Mayfrank et al., 6 Nov 2025). The controller is reinterpreted as a differentiable policy,

$$\pi_\theta(x) = u_0^\star(x; \theta),$$

where $u_0^\star$ is the first control of the EMPC solution and all model and policy parameters $\theta$ are co-optimized with respect to the closed-loop reward signal (e.g., economic savings minus constraint violation penalties). Differentiable convex programming tools such as cvxpylayers or OptNet enable gradient backpropagation through the EMPC layer, which is critical for joint model-controller learning.
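A minimal sketch of this differentiable-policy construction using cvxpylayers is shown below, reusing the `prob`, `z0`, and `U` objects from the previous snippet; for simplicity only the encoder receives gradients here, whereas full joint training would also expose A and B as CVXPY parameters.

```python
from cvxpylayers.torch import CvxpyLayer

# Wrap the lifted-space EMPC QP as a differentiable layer.
empc_layer = CvxpyLayer(prob, parameters=[z0], variables=[U])

def policy(x_tensor, encoder):
    """pi_theta(x): lift the state, solve the EMPC QP, return u_0*."""
    z = encoder(x_tensor)             # differentiable lifting (torch)
    (U_opt,) = empc_layer(z)          # differentiable QP solve
    return U_opt[:, 0]                # first control of the EMPC solution
```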
PPO updates are based on the clipped surrogate loss,

$$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t\big)\right],$$

where $r_t(\theta) = \pi_\theta(a_t \mid s_t)/\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ is the policy probability ratio, $\hat{A}_t$ the estimated advantage, and $\epsilon$ typically $0.1$–$0.2$.
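In code, the clipped surrogate loss is a few lines; the sketch below is the generic PPO form, not an excerpt from the cited implementations.

```python
import torch

def ppo_clip_loss(log_prob, log_prob_old, advantage, eps=0.2):
    """Clipped surrogate objective (negated for gradient descent)."""
    ratio = torch.exp(log_prob - log_prob_old)           # r_t(theta)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)   # clip(r_t, 1-eps, 1+eps)
    return -torch.mean(torch.min(ratio * advantage, clipped * advantage))
```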
Hybrid model-based RL schemes (Editor’s term) employ batch policy optimization in both real and simulated (physics-informed) environments to further improve sample efficiency and accelerate closed-loop performance (Mayfrank et al., 24 Mar 2025).
4. Benchmark Case Studies and Quantitative Performance
Deep Koopman-EMPC has been benchmarked on a broad class of nonlinear processes including continuous stirred tank reactors (CSTRs) (Dony, 12 May 2025, Mayfrank et al., 21 Mar 2024, Mayfrank et al., 24 Mar 2025), large-scale air separation units (ASU) (Mayfrank et al., 6 Nov 2025), pasteurization units (Valábek et al., 6 Nov 2025), water treatment plants (Han et al., 21 May 2024), and shipboard carbon capture systems (Han et al., 9 Apr 2025). Case studies consistently demonstrate the following:
- Superior constraint handling: RL-refined Koopman controllers eliminate or dramatically reduce constraint violations versus MLP and system-ID-only surrogates. For example, Koopman-RL achieves constraint violation rates below $1\%$, compared to roughly $6$–$36\%$ for neural or system ID policies (Dony, 12 May 2025, Mayfrank et al., 21 Mar 2024, Mayfrank et al., 6 Nov 2025).
- Improved economic cost: EMPC with RL-tuned Koopman surrogates yields lower or comparable operating cost; for instance, reported cost reductions and steady-state energy savings in a pasteurization unit vs. subspace identification (Valábek et al., 6 Nov 2025), and cost improvements of at least $1\%$ for a shipboard PCC process vs. PI and RL policies (Han et al., 9 Apr 2025).
- Sample efficiency and convergence: Physics-informed Koopman-EMPC converges in $200$–$500$ real environment steps, compared to $2000+$ for purely data-driven MLP controllers (Mayfrank et al., 24 Mar 2025).
- Real-time feasibility: With latent dimensions up to $\sim 60$ and convex quadratic programs, solve times are tens to hundreds of milliseconds per step, even for large state spaces (Han et al., 21 May 2024, Han et al., 9 Apr 2025, Mayfrank et al., 6 Nov 2025).
| Controller | Economic cost (rel. to baseline) | Constraint viol. [%] | Avg. compute [s/step] |
|---|---|---|---|
| Deep Koopman-EMPC (RL) | 0.90–0.94× | 0.4 | 0.03–0.7 |
| System ID Koopman-EMPC | 0.90–0.92× | 8.8–36.2 | 0.03–0.7 |
| Black-box MLP (PPO) | 0.88–0.96× | 6–16 | 0.03–0.7 |
Table: Representative closed-loop benchmark results from CSTR, WWTP, ASU, and shipboard PCC studies (Mayfrank et al., 21 Mar 2024, Dony, 12 May 2025, Han et al., 21 May 2024, Mayfrank et al., 24 Mar 2025, Mayfrank et al., 6 Nov 2025, Han et al., 9 Apr 2025).
5. Practical Implementation and Design Guidelines
Model architecture:
- Encoders and decoders: 3–5 layer MLPs, 64–256 units, tanh or ELU.
- Latent dimension $n_z$: from under ten to $\sim 60$ (trade-off: accuracy vs. speed).
- Koopman matrices $A$, $B$: learned directly, sometimes structured for controllability.
Optimization and learning:
- SI pretraining: collect random and OCP-generated data; minimize the combined identification loss $\mathcal{L}$ from Section 1.
- RL refinement: PPO or SHAC; value loss weight $0.5$, entropy coefficient $0.01$.
EMPC hyperparameters:
- Horizon: $N = 12$–$36$ steps (3–9 h typical).
- Cost coefficients: Empirically set based on economics and violation priorities.
- Slack penalties: chosen large relative to the economic cost coefficients to discourage infeasibility.
- QP solvers: OSQP or Gurobi; cvxpylayers when differentiability is required.
Deployment:
- Closed-loop: at each step, encode the current state, solve the MPC QP, and apply the first optimal input $u_0^\star$, as sketched after this list.
- Real-time constraints are satisfied with sub-second computation even for high-dimensional lifted models (Han et al., 9 Apr 2025).
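A schematic closed-loop deployment loop, assuming the encoder (returning a NumPy array), the CVXPY problem from Section 2, and a hypothetical plant interface:

```python
def run_closed_loop(plant, encoder, prob, z0, U, n_steps=1000):
    """plant and encoder are hypothetical stand-ins for the real interfaces."""
    x = plant.reset()                        # hypothetical plant API
    for _ in range(n_steps):
        z0.value = encoder(x)                # lift current state (NumPy array)
        prob.solve(solver="OSQP", warm_start=True)
        u = U.value[:, 0]                    # first optimal input u_0*
        x = plant.step(u)                    # apply input, advance one step
    return x
```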
Robustness:
- Slack variables guarantee feasibility under disturbances.
- Kalman filtering or moving-horizon estimation can be used to address partial observability; a filter sketch follows this list.
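As one concrete option, a steady-gain Kalman filter can be run directly on the lifted linear model; the sketch below assumes tuning covariances Q and R and is not taken from the cited papers.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lifted_kalman_gain(A, C, Q, R):
    """Steady-state Kalman gain for z+ = A z + B u, y = C z."""
    P = solve_discrete_are(A.T, C.T, Q, R)           # filter Riccati (by duality)
    return P @ C.T @ np.linalg.inv(C @ P @ C.T + R)

def kf_step(z_est, u, y, A, B, C, L):
    z_pred = A @ z_est + B @ u                       # predict in lifted space
    return z_pred + L @ (y - C @ z_pred)             # correct with measurement y
```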
6. Extensions, Scalability, and Limitations
Deep Koopman-EMPC is extensible to:
- Large-scale systems and partially observed processes via output-based lifting and history-dependent encoders (Han et al., 21 May 2024, Han et al., 9 Apr 2025, Mayfrank et al., 6 Nov 2025).
- Uncertainty and robustness: Integrating robust or stochastic Koopman-model predictive control.
- Physics-informed modeling: Using PINNs or hybrid models to constrain surrogates and increase data efficiency (Mayfrank et al., 24 Mar 2025).
- Online adaptation: Continual fine-tuning via differentiable solvers.
Limitations include the need for careful model selection to avoid overfitting to simulator dynamics when deploying on real-world plants, especially when measurements are limited or dynamic regimes shift rapidly (Mayfrank et al., 6 Nov 2025). Future directions cited include real-plant validation and model-based RL integration.
7. Comparative Analysis and Synthesis
Across multiple studies, deep Koopman-based EMPC demonstrates a recurring theme: embedding nonlinear system dynamics into a well-structured lifted space, optimized both for predictive accuracy and for the economic control task. This yields convex OCPs amenable to real-time solutions and enables end-to-end RL refinement that outperforms both classical subspace-identification and generic neural approaches in constraint satisfaction and operational cost. Differentiable optimization layers (e.g., cvxpylayers) are now standard for enabling joint training of encoders, Koopman dynamics, cost decoders, and MPC policies.
Practical guidelines stress two-stage architectures (nonlinear decoder for identification, linear decoder for control), high-fidelity step-response training data, explicit economic cost encoding, and slack-augmented feasibility. The resulting controllers have been deployed and benchmarked in challenging domains such as wastewater treatment (Han et al., 21 May 2024), pasteurization (Valábek et al., 6 Nov 2025), air separation (Mayfrank et al., 6 Nov 2025), and energy-intensive chemical reactors (Mayfrank et al., 24 Mar 2025), achieving consistently strong performance metrics in both economic and constraint objectives.