Real-Time Neural MPC

Updated 27 May 2026

Real-Time Neural MPC is a control strategy that leverages neural networks to replace or accelerate components within MPC for rapid decision making.
It includes both policy-surrogate and model-surrogate approaches, utilizing architectures like Transformers, MLPs, CNNs, and operator networks.
Empirical benchmarks show its effectiveness across robotics, autonomous driving, and industrial control with significant speed-ups in computation.

Real-time neural model predictive control (neural MPC) refers to methods that use neural networks as surrogates for some or all components of the model predictive control pipeline, enabling fast, parallelizable computation of control policies, typically under strict wall-clock time constraints. Neural MPC schemes achieve real-time operation by amortizing or accelerating the computational bottlenecks of standard MPC, such as online optimization, dynamic model evaluation, or policy selection, and they draw on advances in neural architectures, training paradigms, and numerical optimization routines. This article surveys the principal architectures, training methodologies, inference pipelines, theoretical guarantees, and empirical benchmarks that currently characterize state-of-the-art real-time neural MPC.

1. Neural Surrogates for MPC: Categories and Core Architectures

Two principal classes of real-time neural MPC algorithms can be distinguished: (i) “policy surrogate” approaches, where a neural network directly outputs the entire control sequence (or only the first control) in place of online optimization, and (ii) “model surrogate” approaches, in which the neural network replaces all or part of the predictive dynamics model used within the online MPC optimization loop.

Policy-surrogate neural MPC methods—exemplified by encoder-only Transformer architectures (e.g., TransMPC) and explicit MLP-based policies—parameterize $\pi_\theta(x_t, X^R)$ as a function mapping the current state and future reference to a control sequence, enabling entire-horizon inference in a single forward pass. For example, TransMPC realizes an explicit $\pi_\theta$ via a Transformer with bidirectional self-attention over $N+1$ tokens (state and reference sequence), yielding actions in parallel and supporting variable horizon length without retraining (Wu et al., 9 Sep 2025). MLPs and RNNs have also been used as surrogates for explicit MPC, especially for lower-dimensional systems (Kumar, 2022).
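The policy-surrogate interface described above can be illustrated with a minimal sketch. It assumes a tiny, untrained MLP with random weights standing in for a trained $\pi_\theta$, a scalar control, and a fixed horizon; the names `make_mlp`, `forward`, and `policy` are hypothetical and not taken from any cited work.

```python
import math
import random

def make_mlp(sizes, seed=0):
    """Random (untrained) weights for a small MLP; a placeholder for a trained policy."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x):
    """MLP forward pass: tanh on hidden layers, linear output layer."""
    for i, (W, b) in enumerate(params):
        x = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
        if i < len(params) - 1:
            x = [math.tanh(v) for v in x]
    return x

def policy(params, x_t, X_ref):
    """pi_theta(x_t, X^R): a single forward pass returns all N controls at once."""
    features = list(x_t) + [r for ref in X_ref for r in ref]
    return forward(params, features)

N, n_x = 5, 2                                  # horizon and state dimension (illustrative)
params = make_mlp([n_x + N * n_x, 16, N])      # output layer has one scalar control per step
x_t = [0.1, -0.2]                              # current state
X_ref = [[0.0, 0.0] for _ in range(N)]         # future reference states
u_seq = policy(params, x_t, X_ref)             # whole-horizon control sequence
u_applied = u_seq[0]                           # receding horizon: apply only the first control
```

Because the input and output widths of this MLP are tied to `N`, it does not handle variable horizons; that property is specific to sequence models such as the Transformer described above.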

Model-surrogate neural MPC embeds a learned neural network model—often an MLP, convolutional net (CNN), or more recently operator network (DeepONet, MS-DeepONet)—within a standard or robust MPC optimization, replacing the physics-based process model. This surrogate is typically trained on real or simulated process data to minimize a one-step (or multi-step) prediction loss, and is then differentiated, relaxed, or approximated as needed for real-time optimization (Salzmann et al., 2022, Chen et al., 10 Jan 2025, Jong et al., 23 May 2025).
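As a rough illustration of the model-surrogate idea, the sketch below places a learned one-step model inside a receding-horizon optimizer. Two simplifications are assumptions of this example: `surrogate_step` is a hand-written stand-in for a trained network, and plain random shooting replaces the gradient-based solvers (such as RTI/SQP) used in the cited works.

```python
import random

def surrogate_step(x, u):
    """Stand-in for a trained one-step model x_{k+1} = f_theta(x_k, u_k)."""
    return 0.9 * x + 0.5 * u

def rollout_cost(x0, u_seq, x_ref):
    """Roll the surrogate forward and accumulate a quadratic tracking cost."""
    x, cost = x0, 0.0
    for u in u_seq:
        x = surrogate_step(x, u)
        cost += (x - x_ref) ** 2 + 0.01 * u ** 2
    return cost

def shooting_mpc(x0, x_ref, horizon=10, samples=200, u_max=1.0, seed=0):
    """Random-shooting MPC: keep the best sampled sequence (zeros as the baseline)."""
    rng = random.Random(seed)
    best_u = [0.0] * horizon
    best_cost = rollout_cost(x0, best_u, x_ref)
    for _ in range(samples):
        cand = [rng.uniform(-u_max, u_max) for _ in range(horizon)]
        c = rollout_cost(x0, cand, x_ref)
        if c < best_cost:
            best_u, best_cost = cand, c
    return best_u, best_cost

u_seq, cost = shooting_mpc(x0=0.0, x_ref=1.0)
zero_cost = rollout_cost(0.0, [0.0] * 10, 1.0)   # cost of applying no control at all
```

In a real deployment the surrogate would be differentiated to supply Jacobians to the solver; here it is only evaluated, which keeps the example dependency-free.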

Hybrid schemes explicitly combine policy surrogates with fallback mechanisms (e.g., LQR, conventional MPC) for safety and stability, often defining multi-region laws according to regions of attraction or certification tests (Wu et al., 2021). Further, neural MPC can be integrated as a value-function terminal cost approximation (Wang et al., 8 Sep 2025), as adaptively meta-trained residuals (Mei et al., 23 Apr 2025), or as an oracle for unmatched uncertainty in learning-based MPC (LBMPC) (Gasparino et al., 2023).
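The multi-region switching logic of such hybrid schemes can be sketched as follows. The certificate test (a ball of fixed radius), the placeholder neural policy, and the fallback gain are all hypothetical; the cited works derive their regions from region-of-attraction or certification analysis rather than a fixed radius.

```python
def in_certified_region(x, radius=1.0):
    """Hypothetical certificate: a ball around the origin where the NN policy is trusted."""
    return sum(v * v for v in x) <= radius ** 2

def nn_policy(x):
    """Placeholder for a trained neural policy."""
    return -0.8 * x[0] - 0.3 * x[1]

def fallback_policy(x):
    """Placeholder for a model-based fallback (e.g., an LQR-style gain or full MPC)."""
    return -1.0 * x[0] - 0.5 * x[1]

def hybrid_control(x):
    """Use the NN where it is certified, otherwise fall back to the model-based law."""
    if in_certified_region(x):
        return nn_policy(x), "nn"
    return fallback_policy(x), "fallback"

u_near, mode_near = hybrid_control([0.1, 0.1])   # inside the ball
u_far, mode_far = hybrid_control([2.0, 0.0])     # outside the ball
```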

2. Training Paradigms: Direct Policy Optimization and Imitation Learning

Several modes of policy network training arise across the literature:

Direct differentiation of the finite-horizon cost: TransMPC (Wu et al., 9 Sep 2025) and the neural co-state regulator (Lian et al., 16 Jul 2025) optimize the finite-horizon cost $J(\theta) = \mathbb{E}_{x,N}[V(x,X^R,N;\theta)]$ or its unsupervised Pontryagin-Hamiltonian analog by automatic differentiation, backpropagating through the system dynamics and entire control rollout.
Imitation learning and dataset aggregation (DAgger): For explicit policy surrogates, especially MLP-based ones, supervised regression fits the neural policy to an MPC expert over large state-action datasets. DAgger-style aggregation mitigates distributional shift by iteratively labeling on-policy rollouts with expert actions and retraining (Kumar, 2022). Datasets of up to $1.4\times10^6$ state-action pairs have been reported.
Meta-learning for adaptation: Fast adaptive neural MPC integrates meta-learned residual models via model-agnostic meta-learning (MAML), producing policies that rapidly fine-tune to new dynamics with a handful of online gradient steps (Mei et al., 23 Apr 2025).
Oracle-based uncertainty learning: Dual-timescale adaptation, as in neural LBMPC, combines fast online adaptation of a final linear NN layer (output weights) with slower offline retraining of the hidden layers from a replay buffer (Gasparino et al., 2023), maintaining MPC feasibility and provable stability.
Value function learning and sensitivity augmentation: Neural value functions are trained by VF-DAGGER (a variant of dataset aggregation for values), with an auxiliary neural network trained to output value sensitivities for online adaptation to parameter variation (Wang et al., 8 Sep 2025).
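The DAgger-style aggregation loop from the list above can be sketched on a toy problem. Everything specific here is an assumption for illustration: a scalar plant with hypothetical coefficients `A` and `B`, a saturated linear feedback standing in for the MPC expert, and a one-parameter linear learner fit by least squares instead of a neural network.

```python
import random

A, B = 1.1, 0.5   # hypothetical scalar plant x_{k+1} = A x_k + B u_k

def expert(x):
    """Stand-in for the MPC expert: saturated linear feedback."""
    return max(-1.0, min(1.0, -1.5 * x))

def fit_linear(data):
    """Least-squares fit of the learner u = w * x to (state, action) pairs."""
    num = sum(x * u for x, u in data)
    den = sum(x * x for x, _ in data)
    return num / den if den > 0 else 0.0

def rollout(w, x0, steps=20):
    """States visited when the learner (not the expert) drives the plant."""
    xs, x = [], x0
    for _ in range(steps):
        xs.append(x)
        x = A * x + B * (w * x)
    return xs

def dagger(iterations=5, seed=0):
    rng = random.Random(seed)
    # Initial dataset: expert labels on randomly sampled states.
    data = [(x, expert(x)) for x in (rng.uniform(-2, 2) for _ in range(50))]
    w = fit_linear(data)
    for _ in range(iterations):
        for x0 in (rng.uniform(-2, 2) for _ in range(10)):
            # Label the learner's own visited states with expert actions, then aggregate.
            data += [(x, expert(x)) for x in rollout(w, x0)]
        w = fit_linear(data)   # retrain on the aggregated dataset
    return w, len(data)

w, n_samples = dagger()
```

The key step is that new labels come from states the learner itself visits, which is what distinguishes dataset aggregation from one-shot behavioral cloning.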

The loss functions vary according to the surrogate’s role but commonly include horizon-wise MSE (imitation of control sequences or output trajectories), quantile or median losses for robust forecasting (Chen et al., 10 Jan 2025), L1 regularization of terminal co-states (Lian et al., 16 Jul 2025), and constraint-awareness via exact or soft penalty terms (Tabas et al., 2022, Chen et al., 10 Jan 2025).
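For the quantile and median losses mentioned above, a minimal sketch of the standard pinball loss is given below. The function name and the averaging over samples are choices made for this example; the cited work may weight or aggregate the loss differently.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        e = y - p
        # Under-prediction (e > 0) is weighted by q, over-prediction by (1 - q).
        total += max(q * e, (q - 1.0) * e)
    return total / len(y_true)

under = pinball_loss([1.0], [0.0], q=0.9)    # prediction too low
over = pinball_loss([0.0], [1.0], q=0.9)     # prediction too high
median = pinball_loss([2.0], [0.0], q=0.5)   # q = 0.5 gives half the absolute error
```

With `q = 0.5` the loss reduces to half the mean absolute error, which is why the median loss appears as a special case of quantile regression.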

3. Neural Policy and Model Architectures for Real-Time Performance

The architectural landscape supporting real-time neural MPC is broad, with suitability driven by the specific control task, horizon length, and real-time constraints:

Encoder-only Transformers (TransMPC): $L=6$ self-attention layers, $h=4$ heads per layer, $d_\mathrm{emb}=256$, and a linear output projection $D_u$ enable sequence-wide inference, while positional encodings provide variable-horizon handling (Wu et al., 9 Sep 2025). Inference time is under 0.3 ms for $N=20$ on NVIDIA 3060 hardware.
Spatio-temporal CNNs (NeuroSMPC): Modified MobileNet-V2 3D-CNNs process bird’s-eye-view history grids and the future global path, outputting a multi-step mean control sequence and substantially reducing planning time relative to typical sample-based MPC, to about 40 ms per cycle including sampling (Pal et al., 2023).
Feedforward MLPs (Explicit Surrogates, Residuals): Sizes range from compact two-layer architectures for simple policy surrogates (Kumar, 2022) to deeper shrinking-width networks for robot manipulation (Nubert et al., 2019).
Operator networks (MS-DeepONet): DeepONet/MS-DeepONet achieves universal multi-step sequence prediction with a single forward pass, drastically reducing the number of online model evaluations and enabling real-time realization for systems with horizons of up to tens of seconds (evaluations of about 8 ms for Van der Pol/quadruple-tank benchmarks) (Jong et al., 23 May 2025).
Convexity-constrained nets (ICNN/PICNN): Input-convex architectures with weight non-negativity and ReLU activation enforce provable convexity across multi-step rollouts, guaranteeing efficient and correct optimization inside the MPC loop for building control applications (Bünning et al., 2020).
LSTM/FC DNN process models: Real-world NMPC on resource-constrained hardware (e.g., ARM Cortex A72) is realized via DNNs with six fully connected (FC) layers and one LSTM layer, achieving 1.4 ms per optimization cycle for combustion control (Gordon et al., 2023).
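The convexity mechanism behind the input-convex networks in the list above can be checked numerically with a small sketch. It assumes a simplified single-hidden-layer ICNN with random weights: the hidden layer is a ReLU of an affine map, the hidden-to-output weights are forced non-negative, and a linear skip connection from the input is added. The structure and names are illustrative and are not the architecture of the cited work.

```python
import random

def relu(v):
    return v if v > 0 else 0.0

def make_icnn(n_in, n_hidden, seed=0):
    """Random weights; only the hidden-to-output weights Wz must be non-negative."""
    rng = random.Random(seed)
    W0 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b0 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    Wz = [abs(rng.uniform(-1, 1)) for _ in range(n_hidden)]   # non-negativity constraint
    Wx = [rng.uniform(-1, 1) for _ in range(n_in)]            # unconstrained skip weights
    return W0, b0, Wz, Wx

def icnn(params, x):
    """Scalar output: non-negative sum of ReLU(affine) terms plus a linear skip term."""
    W0, b0, Wz, Wx = params
    z = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W0, b0)]
    return sum(wz * zi for wz, zi in zip(Wz, z)) + sum(wx * xi for wx, xi in zip(Wx, x))

params = make_icnn(n_in=2, n_hidden=8)

def midpoint_convex(a, b, tol=1e-9):
    """Check f((a+b)/2) <= (f(a)+f(b))/2, a necessary condition for convexity."""
    m = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    return icnn(params, m) <= 0.5 * (icnn(params, a) + icnn(params, b)) + tol
```

Each ReLU of an affine function is convex, a non-negative combination of convex functions is convex, and adding a linear term preserves convexity, so the midpoint check holds for any pair of inputs.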

4. Real-Time Optimization Pipelines and Hardware Benchmarks

Algorithmic and systems-level strategies for maintaining real-time execution include:

Single-pass inference: Policy surrogates (e.g., TransMPC, MLP explicit surrogates, MS-DeepONet) produce full action sequences in parallel, making compute time nearly independent of horizon length (Wu et al., 9 Sep 2025, Jong et al., 23 May 2025).
Batch parallelization and local quadratic approximations: On-the-fly integration of residual neural models within RTI-based SQP, leveraging GPU-accelerated differentiation (PyTorch/TensorFlow) for Jacobian/Hessian evaluation (Salzmann et al., 2022).
Mixed-integer programming (MIP) and linear relaxation (LR): Exact ReLU NN MPC can be posed as MILP (scaling poorly with neuron count/horizon), while enhanced LR with additional penalties attains near-exact accuracy in tens of ms (Lan, 2024).
Embedded hardware and computation budgets: On NVIDIA Xavier/Orin, ARM Cortex A72, and Jetson Nano, real-time neural MPC is achieved at sampling rates up to 100 Hz, with network evaluation plus RTI taking on the order of 12 ms for fully integrated perception–planning–control pipelines (Jacquet et al., 2024, Gordon et al., 2023, Wang et al., 8 Sep 2025). BAN-MPC demonstrates a 200× speed-up over classic MPC on Jetson Nano (Wang et al., 8 Sep 2025).
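To make the MILP and linear-relaxation item above concrete, the sketch below checks the widely used big-M constraints for a single ReLU unit $y = \max(0, z)$ with known pre-activation bounds, and computes the upper bound that results when the binary variable is relaxed to $[0, 1]$. This is a textbook encoding offered as an assumption; the cited work's exact formulation and its additional penalty terms are not reproduced here.

```python
def relu_bigm_feasible(z, y, d, lo, hi, tol=1e-9):
    """Big-M constraints for y = max(0, z), given lo <= z <= hi with lo < 0 < hi.

    d is the binary indicator: d = 1 means the unit is active (y = z),
    d = 0 means it is inactive (y = 0).
    """
    return (
        y >= z - tol
        and y >= -tol
        and y <= z - lo * (1 - d) + tol
        and y <= hi * d + tol
    )

def relaxed_upper_bound(z, lo, hi):
    """Largest y allowed once d is relaxed to [0, 1]: the 'triangle' upper envelope."""
    return hi * (z - lo) / (hi - lo)

lo, hi = -1.0, 1.0
active_ok = relu_bigm_feasible(z=0.5, y=0.5, d=1, lo=lo, hi=hi)       # y = z, unit active
inactive_ok = relu_bigm_feasible(z=-0.5, y=0.0, d=0, lo=lo, hi=hi)    # y = 0, unit inactive
wrong_mode = relu_bigm_feasible(z=0.5, y=0.0, d=0, lo=lo, hi=hi)      # violates y >= z
gap_at_zero = relaxed_upper_bound(0.0, lo, hi)                         # relaxation slack at z = 0
```

With binary `d` the constraints admit only the exact ReLU value, which is why the MILP is exact but scales poorly: one binary per neuron per horizon step. The relaxed bound shows the slack a linear relaxation has to control.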

The table below summarizes typical per-cycle latency for key workloads:

Scheme	Platform	Timing per Cycle
TransMPC	NVIDIA 3060	0.25–0.27 ms (N=20)
NeuroSMPC	i7+RTX A4000	~40 ms (incl. sampling)
MS-DeepONet	i7 CPU	7.8 ms (Van der Pol), 163 ms (quadruple tank), 3.87 s (cart-pendulum)
Real-time DNN NMPC	ARM Cortex A72	1.4 ms (avg), <2 ms (worst)
Robust AMPC (NN)	i7 CPU	<1.2 ms (eval), 40 ms cycle
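A quick way to read the table is to convert each per-cycle latency into the highest control rate it could sustain. The helper names below are hypothetical, and the calculation is deliberately idealized: it ignores sensing, communication, and timing jitter, so actual achievable rates will be lower.

```python
def max_rate_hz(latency_ms):
    """Upper bound on the control rate if one cycle takes latency_ms milliseconds."""
    return 1000.0 / latency_ms

def fits_budget(latency_ms, rate_hz):
    """True if one cycle fits within the sampling period of a loop running at rate_hz."""
    return latency_ms <= 1000.0 / rate_hz

# Per-cycle latencies taken from the table above (upper values where a range is given).
latencies_ms = {
    "TransMPC": 0.27,
    "NeuroSMPC": 40.0,
    "Real-time DNN NMPC (worst case)": 2.0,
    "Robust AMPC (NN) cycle": 40.0,
}

rates = {name: max_rate_hz(ms) for name, ms in latencies_ms.items()}
ok_at_100hz = {name: fits_budget(ms, 100.0) for name, ms in latencies_ms.items()}
```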

5. Theoretical Guarantees: Stability, Safety, and Constraint Satisfaction

Guarantees for closed-loop performance, stability, and constraint satisfaction fall into several categories:

Explicit constraint encoding: Direct constraint mappings (e.g., gauge mappings) enforce feasible set membership of actions at the output of the policy network, ensuring safety by construction (Tabas et al., 2022).
Robust tube-based MPC: AMPC with robust setpoint tracking absorbs both process and neural approximation errors within a precomputed tube, maintaining constraint satisfaction and practical exponential stability, statistically validated over sampled closed-loop trajectories (Nubert et al., 2019).
Barrier-integrated controllers (CBFs): Neural value function approximations are integrated with CBF-MPC, using a learned value function as a terminal cost and CBF constraints for domain invariance and safety, supported by performance and stability theorems under approximation errors (Wang et al., 8 Sep 2025).
Lyapunov-based region decomposition: Multi-mode hybrid MPC (LQR, NN, full MPC) guarantees local and global stability, with NN used where certification via forward simulation and region-of-attraction reasoning is possible, and MPC as fallback elsewhere (Wu et al., 2021).
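The explicit constraint encoding item above can be illustrated with a sketch of a gauge-style mapping. It assumes the network's raw output `v` lies in the unit infinity-norm ball (for example after a tanh layer) and that the feasible action set is a bounded polytope $Q = \{u : Au \le b\}$ with $b > 0$, so the origin is strictly inside. The rescaling uses the Minkowski gauge of $Q$; whether this matches the cited construction in every detail is an assumption.

```python
def gauge_polytope(A, b, v):
    """Minkowski gauge of Q = {u : A u <= b} with b > 0: max_i (a_i . v) / b_i."""
    return max(sum(a * vi for a, vi in zip(row, v)) / bi for row, bi in zip(A, b))

def gauge_map(A, b, v):
    """Map v from the unit infinity-norm ball to a point inside Q.

    The output is v rescaled so that its gauge with respect to Q equals ||v||_inf,
    which is at most 1 and therefore implies membership in Q.
    """
    norm_inf = max(abs(vi) for vi in v)
    if norm_inf == 0.0:
        return list(v)                      # the origin maps to itself
    scale = norm_inf / gauge_polytope(A, b, v)
    return [scale * vi for vi in v]

def is_feasible(A, b, u, tol=1e-9):
    return all(sum(a * ui for a, ui in zip(row, u)) <= bi + tol for row, bi in zip(A, b))

# Illustrative feasible set: the unit box with one corner cut off by u1 + u2 <= 1.5.
A = [[1, 0], [-1, 0], [0, 1], [0, -1], [1, 1]]
b = [1.0, 1.0, 1.0, 1.0, 1.5]

u_corner = gauge_map(A, b, [1.0, 1.0])      # raw output at the box corner, outside Q
u_inside = gauge_map(A, b, [0.5, 0.0])      # raw output already compatible with Q
```

Because the mapping is applied at the network output, every action is feasible by construction, with no projection or optimization step at run time.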

Limitations: Many data-driven methods (e.g., NeuroSMPC, behavioral cloning surrogates) lack formal safety and stability guarantees, relying instead on extensive offline validation, hard cost penalties, or hybridization with model-based/planning components (Pal et al., 2023, Kumar, 2022).

6. Empirical Benchmarks and Application Domains

The performance of real-time neural MPC has been validated across a diverse array of control systems:

Land and aerial vehicles: Longitudinal/lateral tracking of nonlinear bicycle models, high-speed quadrotor trajectory following (>10 Hz, >10 m/s), and agile obstacle-avoidance for UAVs at 100 Hz (Wu et al., 9 Sep 2025, Salzmann et al., 2022, Jacquet et al., 2024, Pal et al., 2023).
Autonomous driving: Real-time sample-based MPC in dynamic urban scenes, with 3D-CNN surrogates enabling collision avoidance and safe trajectory generation at human-reaction timescales (Pal et al., 2023).
Industrial/robotic manipulators: Tube-based robust and approximate NN-MPC on 7-DOF robot arms, demonstrating constraint satisfaction, tube-invariant tracking, and under-1 ms policy inference (Nubert et al., 2019).
Manufacturing and process control: Multi-step neural surrogates (TiDE, DeepONet) for melt pool control in additive manufacturing, as well as climate control in buildings via ICNN-PICNN, achieve stringent regulation with solver times well below actuation intervals (Chen et al., 10 Jan 2025, Bünning et al., 2020).
Nonlinear oscillator and benchmark systems: Van der Pol, cart-pole, quadruple tank, and pendulum-on-cart benchmarks provide tractable scenarios for ablations on accuracy, solve time, and generalization properties (Mei et al., 23 Apr 2025, Jong et al., 23 May 2025).

Typical application metrics include root mean square error (RMSE) in state/output tracking, input smoothness, domain safety rates, constraint violation counts, per-cycle wall time, and hardware resource utilization.
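Two of these metrics, tracking RMSE and constraint violation counts, are simple enough to sketch directly. The function names and the choice of box constraints on a scalar input are assumptions of this example.

```python
import math

def rmse(reference, actual):
    """Root mean square error between a reference and an actual trajectory."""
    n = len(reference)
    return math.sqrt(sum((r - a) ** 2 for r, a in zip(reference, actual)) / n)

def count_violations(u_seq, u_min, u_max):
    """Number of time steps at which the input leaves the box [u_min, u_max]."""
    return sum(1 for u in u_seq if u < u_min or u > u_max)

tracking_error = rmse([0.0, 0.0], [3.0, 4.0])
violations = count_violations([0.5, 1.2, -1.5], u_min=-1.0, u_max=1.0)
```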

7. Open Problems and Future Directions

Notwithstanding demonstrated gains, several technical challenges and opportunities define the frontier of real-time neural MPC:

Scalability to high-dimensional and safety-critical domains: Current certificate-backed approaches often rely on restrictive model classes, and high-dimensional MPC surrogates remain difficult to guarantee (Wu et al., 9 Sep 2025, Nubert et al., 2019).
Adaptive horizon and architecture selection: Variable-horizon adaptation without retraining (demonstrated in TransMPC) remains an open frontier more broadly, as does hyperparameter auto-tuning for operator-net surrogates (Wu et al., 9 Sep 2025, Jong et al., 23 May 2025).
Online adaptation and robustness to distributional shift: Meta-learning, dual-timescale adaptation, and sensitivity-augmented value functions provide promising directions for robustification; yet, principled and efficient schemes for continual learning in-the-loop remain an ongoing research focus (Mei et al., 23 Apr 2025, Gasparino et al., 2023, Wang et al., 8 Sep 2025).
Uncertainty quantification, exploration, and certification: Integrating uncertainty-aware sampling, scalable adaptive constraint schemes, and formal reachability analysis into neural MPC architectures is needed for high-assurance deployment (Pal et al., 2023).
Integration with onboard perception: End-to-end systems deploying DNN-based perception (e.g., depth, occupancy grids) within the real-time MPC pipeline have demonstrated feasibility, but efficient cross-hardware deployment and robust sim2real transfer need further work (Jacquet et al., 2024, Pal et al., 2023).

In summary, real-time neural MPC is a rapidly maturing paradigm, now enabling sub-millisecond inference, variable-horizon control, and adaptive correction in the loop on a wide range of robotic and industrial platforms, with ongoing developments toward certified safety, generalization, and large-scale deployment (Wu et al., 9 Sep 2025, Pal et al., 2023, Nubert et al., 2019, Wang et al., 8 Sep 2025, Mei et al., 23 Apr 2025, Jacquet et al., 2024).