
Neural Network in the Loop with MPC

Updated 28 November 2025
  • Neural Network in the Loop with MPC is a control paradigm where NN models are integrated into the prediction or optimization loop to replace or augment analytical models.
  • It leverages diverse NN architectures such as WNN, GRU, and MLP to effectively capture nonlinear dynamics and adapt to time-varying changes.
  • The approach enhances real-time control by improving computational speed, handling uncertainties, and ensuring closed-loop stability with rigorous safety guarantees.

A neural network in the loop with Model Predictive Control (MPC) refers to an integrated control architecture in which a neural network (NN) is embedded within the prediction, constraint, or optimization loop of an MPC scheme. This paradigm enables data-driven, nonlinear, or uncertainty-aware modeling of the plant dynamics, cost, or constraints, often to augment or replace analytical models, accelerate computations, or increase robustness. Implementations vary widely across plant classes, control objectives, learning strategies, and safety guarantees, yet share a core workflow where the NN actively participates in or replaces a portion of the MPC's online computations.

1. Neural Networks for Forward Model Identification in MPC

One primary role for neural networks is as surrogate models for system dynamics inside the MPC prediction loop. In this architecture, at each control step, the NN predicts future plant outputs over the MPC horizon, replacing or augmenting traditional physics-based or linear models. This allows the controller to handle nonlinearities, unmodeled effects, or system changes directly from data.

A canonical example is wavelet neural networks (WNN) with feedforward layers, used for online system identification in the prediction loop of MPC (Khodabandehlou et al., 2018). The WNN receives as input the current and past control and output signals, computes hidden-layer activations via a Mexican-hat wavelet basis, and produces a next-step output via a weighted linear combination plus a direct feedforward mapping. The WNN weights are updated in real time: the feedforward component via recursive least squares, the wavelet weights via gradient descent. At each time step, the updated network generates predictions for the MPC horizon by closed-loop forward propagation, and the MPC solves a finite-horizon quadratic program (QP) with these predictions. This structure enables real-time adaptation to time-varying, nonlinear plants under network-induced sensor and actuator delays, with provable Lyapunov stability of the closed loop (Khodabandehlou et al., 2018).
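
This online identification loop can be sketched in simplified form. The snippet below is illustrative rather than a reproduction of the cited method: the wavelet centers and scales are fixed (only the linear output weights adapt, via RLS), and a sin(u) map stands in for the plant.

```python
import numpy as np

def mexican_hat(z):
    # Mexican-hat (Ricker) wavelet: (1 - z^2) * exp(-z^2 / 2)
    return (1.0 - z ** 2) * np.exp(-0.5 * z ** 2)

class WaveletIdentifier:
    """One-step-ahead WNN predictor: a Mexican-hat hidden layer plus a
    direct linear feedforward term, with the linear output weights
    adapted online by recursive least squares (RLS)."""

    def __init__(self, n_in, n_hidden, lam=0.995, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = rng.uniform(-2, 2, size=(n_hidden, n_in))  # translations
        self.scales = np.ones((n_hidden, n_in))                   # dilations
        n_feat = n_hidden + n_in
        self.w = np.zeros(n_feat)          # linear output weights
        self.P = np.eye(n_feat) * 1e3      # RLS covariance
        self.lam = lam                     # forgetting factor

    def features(self, x):
        z = (x - self.centers) / self.scales
        hidden = mexican_hat(z).prod(axis=1)    # per-neuron wavelet response
        return np.concatenate([hidden, x])      # append direct feedforward part

    def predict(self, x):
        return self.features(x) @ self.w

    def rls_update(self, x, y):
        phi = self.features(x)
        err = y - phi @ self.w
        k = self.P @ phi / (self.lam + phi @ self.P @ phi)  # RLS gain
        self.w += k * err
        self.P = (self.P - np.outer(k, phi @ self.P)) / self.lam
        return err

# identify y = sin(u) from streaming samples, as a stand-in for plant data
rng = np.random.default_rng(1)
ident = WaveletIdentifier(n_in=1, n_hidden=12)
for u in rng.uniform(-2, 2, 500):
    ident.rls_update(np.array([u]), np.sin(u))
residual = abs(ident.predict(np.array([0.5])) - np.sin(0.5))
```

In the full scheme the wavelet centers and dilations would also be adapted by gradient descent, and the updated model would be rolled forward over the MPC horizon at each step.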

Similar structures appear for recurrent NNs, such as incrementally input-to-state stable Gated Recurrent Units (GRUs) (Bonassi et al., 2021) or single-layer recurrent equilibrium networks (REN) (Ravasio et al., 25 Jun 2025). In these cases, a GRU/REN is trained offline (with incremental stability conditions enforced during training) to capture nonlinear, possibly partially observed system dynamics. At each MPC cycle, the NN’s recurrent state is updated, predictions are rolled out over the horizon, and constraints are imposed on the predicted state-input pairs. Offset-free tracking is supported by integral state augmentation and observer corrections. Exponential closed-loop stability is established under suitable GRU weight and observer gain conditions (Bonassi et al., 2021, Ravasio et al., 25 Jun 2025).
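
The per-cycle horizon rollout can be illustrated with a plain GRU cell in NumPy; the small random weights below merely stand in for a network trained offline with incremental stability conditions enforced.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_cell(h, u, Wz, Uz, Wr, Ur, Wh, Uh):
    # standard GRU update; in the cited works the weights are trained
    # offline with incremental (deltaISS) stability conditions enforced
    z = sigmoid(Wz @ u + Uz @ h)
    r = sigmoid(Wr @ u + Ur @ h)
    h_new = np.tanh(Wh @ u + Uh @ (r * h))
    return (1.0 - z) * h + z * h_new

def rollout(h0, u_seq, params, C):
    """Roll the recurrent state over the MPC horizon and map each hidden
    state to a predicted output; the MPC constrains these predictions."""
    h, outputs = h0.copy(), []
    for u in u_seq:
        h = gru_cell(h, u, *params)
        outputs.append(C @ h)
    return np.array(outputs), h

rng = np.random.default_rng(0)
n_h, n_u, n_y, horizon = 4, 1, 1, 10
params = tuple(rng.normal(scale=0.3, size=s) for s in
               [(n_h, n_u), (n_h, n_h)] * 3)          # Wz,Uz,Wr,Ur,Wh,Uh
C = rng.normal(scale=0.3, size=(n_y, n_h))
y_pred, h_T = rollout(np.zeros(n_h), [np.ones(n_u)] * horizon, params, C)
```

Because each update is a convex combination of the previous state and a tanh output, the hidden state remains in [-1, 1]^n; this is only a simple boundedness property, not the δISS certificate itself, which constrains the weights more tightly.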

For highly complex or high-dimensional systems, fully connected multilayer perceptrons (MLPs) or even deep residual networks (CNNs) can be deployed as residual dynamic models within MPC for embedded high-rate control tasks (e.g., quadrotors) (Salzmann et al., 2022). Here, real-time operation is enabled by local linearization of the NN at each SQP iteration, offloading batch Jacobian computations to a GPU, and constructing an efficiently solvable QP at each control cycle. This approach supports high parametric capacity (up to millions of NN weights) while achieving low-latency, high-bandwidth closed-loop control, with substantial reductions in tracking error compared to classical models.
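
The per-iteration linearization can be sketched as follows; a small random MLP and finite-difference Jacobians stand in for the learned residual model and the batched GPU Jacobians of the cited work (which would use automatic differentiation).

```python
import numpy as np

def mlp_dynamics(x, u, W1, b1, W2, b2):
    # hypothetical learned residual model: x_next = x + MLP([x; u])
    z = np.tanh(W1 @ np.concatenate([x, u]) + b1)
    return x + W2 @ z + b2

def linearize(f, x, u, eps=1e-6):
    """Finite-difference Jacobians A = df/dx, B = df/du at (x, u)."""
    nx, nu = len(x), len(u)
    f0 = f(x, u)
    A = np.zeros((nx, nx))
    B = np.zeros((nx, nu))
    for i in range(nx):
        dx = np.zeros(nx); dx[i] = eps
        A[:, i] = (f(x + dx, u) - f0) / eps
    for j in range(nu):
        du = np.zeros(nu); du[j] = eps
        B[:, j] = (f(x, u + du) - f0) / eps
    return A, B, f0

rng = np.random.default_rng(1)
nx, nu, nh = 3, 2, 16
W1 = rng.normal(size=(nh, nx + nu)) * 0.1
b1 = np.zeros(nh)
W2 = rng.normal(size=(nx, nh)) * 0.1
b2 = np.zeros(nx)
f = lambda x, u: mlp_dynamics(x, u, W1, b1, W2, b2)
A, B, f0 = linearize(f, np.zeros(nx), np.zeros(nu))
```

The resulting (A, B, f0) define the affine prediction model from which the QP at that SQP iterate is assembled.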

2. Neural Networks as Explicit MPC Policy or Trajectory Approximators

Another architecture is to "amortize" the expensive online optimization of traditional MPC by replacing the control computation with a pre-trained neural network policy or trajectory generator. This is usually achieved by imitating offline solutions of the MPC problem over a dense set of sampled states.

A standard feedforward MLP is trained to regress the optimal first control move u_0^*(x) (or the entire optimal trajectory U^*(x)) produced by the MPC for a collection of sampled initial states and reference signals (Kiš et al., 2019, Nubert et al., 2019, Hoffmann et al., 29 Apr 2024, Pal et al., 2023). At run time, the NN simply maps the current state (and scenario parameters) to a control action, which can be saturated to enforce hard input bounds. For instance, in a chemical reactor case, a four-layer, four-neuron MLP achieves state and input trajectories matching exact MPC to within 2% suboptimality in 94.5% of runs at sub-millisecond CPU time (Kiš et al., 2019). In high-dimensional problems (e.g., manipulator tracking), deeper networks are used and explicit error bounds established so that the NN approximation lies within the disturbance margin allowed by tube-based robust MPC, thereby preserving safety and performance guarantees (Nubert et al., 2019).
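
A minimal imitation setup might look as follows; the saturated linear law is only a proxy for sampled offline MPC solutions, and the dataset, network size, and training schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "offline MPC" dataset: for a scalar plant we use the saturated
# linear law u* = clip(-1.5 x, -1, 1) as a proxy for sampled MPC solutions.
X = rng.uniform(-2, 2, size=(512, 1))
Y = np.clip(-1.5 * X, -1.0, 1.0)

# One-hidden-layer MLP regressor trained by full-batch gradient descent.
W1 = rng.normal(size=(1, 16)) * 0.5
b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)) * 0.5
b2 = np.zeros(1)
lr = 0.1
for _ in range(4000):
    H = np.tanh(X @ W1 + b1)
    pred = H @ W2 + b2
    err = pred - Y                      # (512, 1) regression residual
    gW2 = H.T @ err / len(X)
    gb2 = err.mean(axis=0)
    gH = (err @ W2.T) * (1.0 - H ** 2)  # backprop through tanh
    gW1 = X.T @ gH / len(X)
    gb1 = gH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

def nn_policy(x):
    u = (np.tanh(np.array([[x]]) @ W1 + b1) @ W2 + b2).item()
    return float(np.clip(u, -1.0, 1.0))   # saturate to hard input bounds
```

Saturating the network output, as in the final clip, is the standard way such explicit policies enforce hard input bounds exactly even where the regression is imperfect.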

PlanNetX and similar trajectory nets further extend this idea: instead of cloning only the first move, they learn to predict the entire planned trajectory of the MPC (states and controls) as an open-loop plan (Hoffmann et al., 29 Apr 2024). The network is trained with a state-trajectory roll-out loss, mapped through the known plant dynamics, and can incorporate scenario parameters such as speed limits, lead-vehicle predictions, and time-index encodings. At deployment, only the first control from the planned trajectory is applied, and replanning occurs at each sampling instant. This reduces computation by an order of magnitude compared to repeated online OCP solving.

Hybrid control hierarchies, such as Memory-Augmented MPC (MAMPC), blend explicit neural and analytic control. The control authority is allocated dynamically among a learned NN surrogate (trained to imitate MPC), an LQR module (for locally linear behavior), and fallback to full implicit MPC when the NN is outside of a certified region or violates constraints (Wu et al., 2021). This approach accelerates amortized runtime—by 30–65% in benchmarks—while retaining local stability and satisfaction of safety constraints.
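
The mode allocation can be sketched as a simple dispatch; the radii, the stand-in controllers, and the constraint check below are hypothetical placeholders for MAMPC's certified regions and forward verification.

```python
import numpy as np

def select_control(x, nn_policy, lqr_gain, solve_mpc,
                   lqr_radius, nn_radius, check):
    """Hypothetical MAMPC-style dispatch: LQR near the equilibrium, the NN
    surrogate inside its verified region (with a forward constraint check),
    and full implicit MPC as the certified fallback."""
    if np.linalg.norm(x) < lqr_radius:        # locally linear regime
        return -lqr_gain @ x, "lqr"
    if np.linalg.norm(x) < nn_radius:
        u = nn_policy(x)
        if check(x, u):                       # forward verification
            return u, "nn"
    return solve_mpc(x), "mpc"                # fallback preserves guarantees

# toy stand-ins for the three controllers and the constraint check
K = np.array([[1.0, 0.5]])
nn = lambda x: np.array([-0.8 * x[0]])
mpc = lambda x: np.array([0.0])
ok = lambda x, u: abs(float(u[0])) <= 1.0

u1, m1 = select_control(np.array([0.05, 0.0]), nn, K, mpc, 0.1, 1.0, ok)
u2, m2 = select_control(np.array([0.5, 0.0]), nn, K, mpc, 0.1, 1.0, ok)
u3, m3 = select_control(np.array([3.0, 0.0]), nn, K, mpc, 0.1, 1.0, ok)
```

The speed-up comes from the cheap NN and LQR branches handling most queries, while the MPC branch keeps the scheme safe whenever verification fails.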

3. Learning-Based Uncertainty Compensation and Robustness

Neural networks are also deployed to infer or cancel model bias, uncertainty, or disturbances inside the MPC loop. Here, the NN acts as an auxiliary module delivering an additive or multiplicative correction to the baseline dynamics, typically in a matched or unmatched configuration. For instance, in tube-based Learning-Based MPC (LBMPC), a feedforward DNN is trained online to approximate the unmatched uncertainty h(x, u) (Gasparino et al., 2023). A dual-timescale learning scheme is applied: at each sampling instant, only the final layer is adapted via a projection-based update law, while the inner layers are retrained less frequently using a data buffer. The DNN output is inserted directly into the MPC’s predictive model, and robust constraint tightening is used to guarantee recursive feasibility and ISS provided the DNN output remains within prescribed norm bounds (Gasparino et al., 2023).
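
A stylized version of the fast-timescale update, adapting only the output layer over frozen hidden features and projecting the weights onto a norm ball; the random feature layer and the sinusoidal uncertainty are illustrative stand-ins.

```python
import numpy as np

def project_ball(w, radius):
    n = np.linalg.norm(w)
    return w if n <= radius else w * (radius / n)

class LastLayerAdapter:
    """Fast-timescale loop of a dual-timescale scheme: only the output
    layer is adapted each step, and the weights are projected onto a norm
    ball so the learned correction stays within a prescribed bound."""

    def __init__(self, features, n_feat, gain=0.5, radius=2.0):
        self.phi = features      # frozen hidden layers (retrained slowly)
        self.w = np.zeros(n_feat)
        self.gain = gain
        self.radius = radius

    def correction(self, x):
        return self.phi(x) @ self.w

    def update(self, x, residual):
        # normalized gradient step toward the observed model residual
        phi = self.phi(x)
        self.w = self.w + self.gain * residual * phi / (1.0 + phi @ phi)
        self.w = project_ball(self.w, self.radius)

# fixed random hidden layer standing in for the slowly retrained inner layers
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 1))
features = lambda x: np.tanh(W @ np.atleast_1d(x)).ravel()

true_h = lambda x: 0.3 * np.sin(2.0 * x)   # illustrative unmatched uncertainty
adapter = LastLayerAdapter(features, n_feat=8)
for x in rng.uniform(-1, 1, 400):
    adapter.update(x, true_h(x) - adapter.correction(x))
err = abs(adapter.correction(0.5) - true_h(0.5))
```

The projection is what lets tube-based arguments go through: the correction magnitude is bounded by construction, so the tightened constraints can account for it even during learning transients.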

In adaptive tube-MPC frameworks (Mishra et al., 21 Nov 2025), the NN is given bounded "learning authority," outputting an additive control term u_t^a ≈ −h(x_t). The sum u_t = u_t^a + u_t^m is applied, with u_t^m computed by tightened MPC. Authority bounds are estimated from experience data, and online projection is used to ensure safety even during learning transients. Output-layer weights are updated in the main loop, while experience selection and hidden-layer training occur asynchronously. Recursive feasibility and robust stability properties are ensured through classical tube-based MPC theory, provided output bounds are maintained (Mishra et al., 21 Nov 2025).

Dropout MPC utilizes an ensemble of neural-network predictors generated via Monte Carlo dropout, yielding a set of sampled dynamic models (Syntakas et al., 4 Jun 2024). Each ensemble member solves a parallel MPC, with their suggested control actions blended according to weighted voting. The ensemble mean and variance provide a principled measure of model uncertainty, which can be used for cautious control (e.g., decelerating when predictive uncertainty grows) and robustification against under-trained or mis-specified oracle dynamics.
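
A minimal MC-dropout sketch with toy weights; the variance-triggered slow-down at the end is one illustrative instance of cautious control, not the cited voting scheme itself.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(1, 32))
W2 = rng.normal(size=(32, 1)) * 0.2

def dropout_forward(x, p=0.2):
    # one stochastic forward pass with a fresh dropout mask (MC dropout)
    h = np.tanh(np.atleast_2d(x) @ W1)
    mask = rng.random(h.shape) > p
    h = h * mask / (1.0 - p)              # inverted dropout keeps the scale
    return (h @ W2).ravel()

def ensemble_predict(x, n_samples=50):
    """Mean and variance across dropout samples; the variance is the
    model-uncertainty signal used for cautious control."""
    preds = np.stack([dropout_forward(x) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.var(axis=0)

mean, var = ensemble_predict(0.5)
# e.g., scale down commanded speed as predictive uncertainty grows
cautious_speed = 1.0 / (1.0 + 10.0 * float(var[0]))
```

In the full scheme each dropout sample would drive its own parallel MPC, with the suggested actions blended by weighted voting rather than the prediction itself being averaged.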

4. Neural Networks Embedded in the Cost Function, Constraint, or Value Function

An emerging architecture injects neural networks inside the MPC cost function or value function as flexible, learnable surrogates tailored for improved closed-loop optimality in the presence of model mismatch or unknown objectives.

Feedforward neural networks can be used to augment the stage cost ℓ(x, u) within the MPC OCP. The NN learns a mapping y_NN(x) or ℓ_NN(x; θ), which can be tuned online (e.g., via safe Bayesian optimization) to improve closed-loop performance metrics subject to explicit stability constraints (Hirt et al., 16 Sep 2024, Hirt et al., 18 Apr 2024). Probabilistic safety is enforced by treating the optimal value function as a Lyapunov candidate and constraining the learned NN parameters so that the value is positive definite and strictly decreasing along state trajectories.

Alternative approaches use value-function approximation: a neural network is trained to regress the long-term (infinite-horizon) cost-to-go, with supervised data derived from offline MPC solutions augmented via sensitivity analysis (Orrico et al., 23 Jan 2024). This NN is then embedded as a learned terminal cost in a short-horizon (even myopic, N = 1) MPC, enabling a reduction in per-step computation by an order of magnitude while preserving near-optimal closed-loop behavior, provided the NN error is controlled. The method leverages the fact that if V_θ(x) ≈ V*(x) closely, the short-horizon controller with the NN terminal value is nearly globally optimal.
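
This can be checked on a scalar linear-quadratic toy problem, where the exact cost-to-go p·x² (from the scalar Riccati recursion) plays the role of the trained network: a one-step MPC with this terminal cost recovers the infinite-horizon LQR feedback.

```python
import numpy as np

# Scalar toy plant x+ = a x + b u with stage cost q x^2 + r u^2; the exact
# infinite-horizon cost-to-go V*(x) = p x^2 stands in for the trained NN.
a, b, q, r = 1.2, 1.0, 1.0, 0.1
p = q
for _ in range(200):
    # scalar discrete-time Riccati recursion, iterated to its fixed point
    p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)

def v_theta(x):
    return p * x * x

def myopic_mpc(x, grid=np.linspace(-5.0, 5.0, 2001)):
    """N = 1 MPC: minimize stage cost plus learned terminal value by
    brute-force search over a control grid (in place of a QP solver)."""
    cost = q * x * x + r * grid ** 2 + v_theta(a * x + b * grid)
    return float(grid[np.argmin(cost)])

# the resulting first move matches the optimal LQR feedback u = -k x
k = a * b * p / (r + b * b * p)
u0 = myopic_mpc(1.0)
```

With an approximate network in place of the exact p·x², the first move degrades gracefully with the value-function error, which is exactly the error-control condition the cited work emphasizes.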

Sensitivities of the value function to changing parameters (e.g., mass, friction) can also be learned by a secondary NN; at runtime, these sensitivities correct the NN value-function output as model parameters drift, circumventing the need for offline retraining and supporting adaptation/calibration on embedded hardware (Wang et al., 8 Sep 2025). Combined with barrier-function constraints, such as control barrier functions (CBFs) for collision avoidance, these approaches yield real-time acceleration (two orders of magnitude) and rigorous safety margins.

5. Scenario-Based, Bayesian, and Probabilistic Neural Network–MPC Integrations

Probabilistic models, including Bayesian Neural Networks (BNNs) and scenario-based MPC, are employed to quantify and integrate learned uncertainty and safety margins. When an LPV dynamic model is learned from data via a BNN (Bao et al., 2022), the full posterior over model parameters is maintained and used to generate a finite, representative scenario tree. The scenario-based MPC then optimizes actions across all branches, enforcing non-anticipative control and robust constraints. Parameter-dependent terminal sets and controllers are computed via polytopic or LMI methods to ensure recursive feasibility, robust positive invariance, and Lyapunov stability. Explicit confidence margins (safety probabilities ≥ 1 − δ) can be enforced by sampling extreme scenarios from the BNN posterior and encoding appropriate robust or chance constraints within the OCP (Bao et al., 2022).
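
A toy version of the scenario construction, with a diagonal-Gaussian posterior over a single scalar parameter and a one-step grid search in place of the full OCP; all quantities are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scenarios(post_mean, post_std, n_draws):
    """Draw parameter scenarios from a diagonal-Gaussian stand-in for the
    BNN posterior, appending +/- 3-sigma extremes to cover the tails."""
    draws = rng.normal(post_mean, post_std, size=(n_draws, len(post_mean)))
    extremes = np.stack([post_mean - 3 * post_std, post_mean + 3 * post_std])
    return np.vstack([draws, extremes])

def robust_first_move(x, scenarios, grid=np.linspace(-1.0, 1.0, 201)):
    # non-anticipative one-step OCP: a single u must keep |a_i x + u| <= 1
    # for every sampled model a_i (grid search in place of a solver)
    feasible = [u for u in grid
                if np.all(np.abs(scenarios[:, 0] * x + u) <= 1.0)]
    return min(feasible, key=abs) if feasible else None

scenarios = sample_scenarios(np.array([0.9]), np.array([0.05]), n_draws=20)
u = robust_first_move(0.8, scenarios)
```

The full scheme branches the scenarios into a tree over the horizon and shares only the first move across branches, which is what the non-anticipativity constraint encodes.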

Dropout-ensemble methods cited above (Syntakas et al., 4 Jun 2024) and safe Bayesian optimization, as discussed for neural cost tuning (Hirt et al., 16 Sep 2024, Hirt et al., 18 Apr 2024), augment the controller with rigorous probabilistic confidence intervals for key closed-loop properties.

6. Sampling, Imitation, and Handling Real-Time Constraints

Various architectural and algorithmic innovations leverage NNs to reduce or eliminate the online computational bottlenecks of MPC, which is critical for embedded and high-frequency actuation scenarios. Explicit NN policies (or trajectory predictors) achieve one to two orders of magnitude speed-up over QP-based MPC (Kiš et al., 2019, Nubert et al., 2019, Hoffmann et al., 29 Apr 2024). In sampling-based MPC (e.g., MPPI or CEM), 3D spatio-temporal CNNs are trained to predict the optimal mean control sequence, replacing iterative resampling with a single forward pass; sampling diversity is then preserved by drawing candidates around the NN mean (Pal et al., 2023). This structure enables high-speed operation with near-optimal collision avoidance in dynamic environments, with the CNN implicitly absorbing obstacle motion prediction.
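
The single-pass scheme can be sketched as follows; the Gaussian perturbations, temperature, and toy tracking cost are illustrative assumptions, with a zero sequence standing in for the CNN-predicted mean.

```python
import numpy as np

def sample_around_nn_mean(nn_mean_seq, cost_fn, n_samples=256, sigma=0.3,
                          temperature=1.0, seed=0):
    """Single-pass sampling-based MPC step: perturb the NN-predicted mean
    control sequence, score each candidate, and blend by MPPI-style
    exponentiated-cost weights."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(n_samples, len(nn_mean_seq)))
    candidates = nn_mean_seq + noise
    costs = np.array([cost_fn(u) for u in candidates])
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    return weights @ candidates        # cost-weighted control sequence

# toy tracking cost over a 5-step horizon; zeros stand in for the CNN mean
cost = lambda u: float(np.sum((u - 0.7) ** 2))
u_seq = sample_around_nn_mean(np.zeros(5), cost)
```

A well-trained mean predictor places the samples near the optimum, so one weighted blend replaces the several resampling iterations of plain MPPI or CEM.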

Hybridization, as in MAMPC (Wu et al., 2021), allows blending the NN, LQR, and MPC computation according to region-of-attraction certifications, producing both amortized efficiency and fail-safe stability. In all such frameworks, careful authority partitioning (bounding the NN’s contribution and tightening constraints appropriately) is essential to prevent feasibility loss during learning or adaptation (Mishra et al., 21 Nov 2025).

7. Stability, Safety, and Theoretical Guarantees

The embedding of NNs within the MPC loop raises significant challenges in closed-loop safety, stability, and constraint robustness. The state of the art demonstrates that, under tube-MPC with tightened constraints, projection-bounded NN outputs, and Lyapunov-type analysis (often with stability explicitly enforced during training or via MPC value-function constraints), safety and practical stability can be maintained (Khodabandehlou et al., 2018, Nubert et al., 2019, Bonassi et al., 2021, Gasparino et al., 2023, Wu et al., 2021, Mishra et al., 21 Nov 2025, Hirt et al., 16 Sep 2024, Wang et al., 8 Sep 2025, Bao et al., 2022, Ravasio et al., 25 Jun 2025).

Region-of-attraction, robust positive invariance, and Lyapunov decrease are achieved by proper selection of authority bounds, step sizes, and scenario-tree or BNN-based confidence intervals. When explicit error bounds (e.g., via Hoeffding’s inequality) are available, the tube-robustness margins can be enforced to dominate any NN-induced error (Nubert et al., 2019). Adaptive and dual-timescale learning further support safe real-time operation under model uncertainty and learning transients (Gasparino et al., 2023, Mishra et al., 21 Nov 2025). When NN-parameter learning is incorporated, safe Bayesian optimization with GP-based Lyapunov constraints can enforce stability with specified probabilistic guarantees (Hirt et al., 16 Sep 2024, Hirt et al., 18 Apr 2024).

The table below summarizes key NN-in-the-loop MPC variants and their defining features:

| NN Role in MPC | Key Algorithmic Structure | Formal Guarantees |
| --- | --- | --- |
| Dynamics prediction | NN predicts (possibly online-updated) plant outputs for the MPC optimizer; e.g., WNN/GRU/MLP in the prediction loop | Lyapunov/δISS-based stability; tube tightening for robustness |
| Explicit policy | NN imitates offline MPC over sampled states, used as explicit control law or trajectory planner | Statistical error bounds; tube-MPC robustness with error threshold |
| Cost/value surrogate | NN replaces/augments cost or terminal value function, learned via regression or BO | Lyapunov positivity; constrained optimization of parameters; probabilistic stability |
| Uncertainty modeling | NN regresses uncertainty/disturbance; output added (with bounded authority) to system model, adapted online | Input-to-state stability (ISS); robust constraint satisfaction with tight authority partitioning |
| Probabilistic/ensemble | MC dropout or BNN produces ensemble model for parallel MPCs; voting or worst-case used | Robust chance-constraint satisfaction; ensemble-based uncertainty management |
| Hybrid control (MAMPC) | Modes allocate control between LQR, NN, and fallback MPC per forward verification | Local asymptotic stability on feasibility set; anytime safe switching |

References

  • "Networked Model Predictive Control Using a Wavelet Neural Network" (Khodabandehlou et al., 2018)
  • "Neural Network Based Explicit MPC for Chemical Reactor Control" (Kiš et al., 2019)
  • "Nonlinear MPC for Offset-Free Tracking of systems learned by GRU Neural Networks" (Bonassi et al., 2021)
  • "Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms" (Salzmann et al., 2022)
  • "Composing MPC with LQR and Neural Network for Amortized Efficiency and Stable Control" (Wu et al., 2021)
  • "Unmatched uncertainty mitigation through neural network supported model predictive control" (Gasparino et al., 2023)
  • "Safe and Fast Tracking on a Robot Manipulator: Robust MPC and Neural Network Control" (Nubert et al., 2019)
  • "A Learning- and Scenario-based MPC Design for Nonlinear Systems in LPV Framework with Safety and Stability Guarantees" (Bao et al., 2022)
  • "Safe and Stable Closed-Loop Learning for Neural-Network-Supported Model Predictive Control" (Hirt et al., 16 Sep 2024)
  • "Stability-informed Bayesian Optimization for MPC Cost Function Learning" (Hirt et al., 18 Apr 2024)
  • "PlanNetX: Learning an Efficient Neural Network Planner from MPC for Longitudinal Control" (Hoffmann et al., 29 Apr 2024)
  • "Dropout MPC: An Ensemble Neural MPC Approach for Systems with Learned Dynamics" (Syntakas et al., 4 Jun 2024)
  • "On Building Myopic MPC Policies using Supervised Learning" (Orrico et al., 23 Jan 2024)
  • "NeuroSMPC: A Neural Network guided Sampling Based MPC for On-Road Autonomous Driving" (Pal et al., 2023)
  • "Recurrent neural network-based robust control systems with closed-loop regional incremental ISS and application to MPC design" (Ravasio et al., 25 Jun 2025)
  • "Safety Meets Speed: Accelerated Neural MPC with Safety Guarantees and No Retraining" (Wang et al., 8 Sep 2025)
  • "Algorithmic design and implementation considerations of deep MPC" (Mishra et al., 21 Nov 2025)

These references comprehensively support the technical details and design choices presented in this article.
