PACMAN: Real-Time ML Prediction and Control
- PACMAN is a paradigm that integrates machine learning predictions with control theory to enable adaptive, real-time decision-making.
- It employs methodologies like adaptive MPC, data-driven predictive control, and RL-guided approaches, achieving notable computational efficiency and performance gains.
- PACMAN architectures incorporate uncertainty management and safety strategies to ensure robust operations in systems ranging from autonomous vehicles to fusion plasma devices.
Prediction And Control using MAchiNe learning (PACMAN) unifies machine learning and control theory in real-time decision-making systems by integrating data-driven prediction modules with optimization-based or learning-based controllers. PACMAN is both a conceptual paradigm and a suite of concrete architectures that exploit ML-based predictions (e.g., of dynamics, events, costs, or uncertainties) to improve control performance, adaptivity, and safety in systems ranging from autonomous vehicles and fusion plasma devices to neuro-inspired controllers and human-in-the-loop RL agents. The field encompasses model-augmented MPC, hybrid learning–control loops, uncertainty-aware regression models in the loop, and meta-control strategies that adapt the degree of reliance on ML advice according to observed confidence and risk.
1. Fundamental PACMAN Paradigm and Definitions
PACMAN architectures are characterized by closed feedback loops that interleave learned prediction and controller components:
- Predictors are ML models that map measured histories and/or control proposals to future states, events, uncertain model parameters, or surrogate costs. These can be regressors, classifiers, or probabilistic models (e.g., GPs, survival nets).
- Controllers are typically optimization-based (e.g., MPC), RL-based (actor-critic), or classical (PID, FSM). They consume ML-predicted quantities in their cost functions, constraints, or policy selection.
- Integration Mechanism governs how, and to what extent, ML predictions are trusted, combined with priors, or used for fallback. Strategies include adaptive blending, threshold-based selection, or direct coupling in iterative optimizers.
This fundamental loop is realized in diverse domains and at multiple levels of system abstraction, from low-level robot agility to real-time plasma regime control, as exemplified in the DIII-D digital plasma control system (PCS) (Rothstein et al., 11 Nov 2025).
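The loop structure can be summarized in a short sketch. The following Python skeleton is purely illustrative: the `Predictor` and `Controller` classes, the confidence threshold, and the placeholder plant update are assumptions standing in for whatever learned model, optimizer, and integration rule a concrete PACMAN instance uses.

```python
import numpy as np

class Predictor:
    """Hypothetical ML predictor: maps a measured history to a forecast and a confidence."""
    def predict(self, history):
        forecast = history[-1]            # placeholder: persistence forecast
        confidence = 0.9                  # placeholder: e.g. ensemble agreement
        return forecast, confidence

class Controller:
    """Hypothetical controller: consumes the forecast; falls back to a nominal law if confidence is low."""
    def act(self, state, forecast, confidence, threshold=0.5):
        if confidence >= threshold:
            # trust the ML forecast (e.g. use it as a reference or terminal cost)
            return -0.5 * (state - forecast)
        # fallback: nominal action that ignores the learned prediction
        return -0.1 * state

def pacman_loop(steps=50):
    predictor, controller = Predictor(), Controller()
    state, history = np.array([1.0]), []
    for _ in range(steps):
        history.append(state.copy())
        forecast, conf = predictor.predict(history)
        u = controller.act(state, forecast, conf)
        state = state + u                 # placeholder plant update
    return state

if __name__ == "__main__":
    print(pacman_loop())
```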
2. Core Methodologies and Representative Algorithms
Regression-based Adaptive MPC
Fast Adaptive Regression-based Model Predictive Control (ARMPC) (Mostafa et al., 2022) constructs support-vector regression (SVR) models to predict, in real time, the smallest MPC horizon length and sampling count that guarantee near-optimal performance. Offline, an extensive synthetic dataset is generated by enumerating reference trajectories and evaluating, for each, the minimal horizon that keeps cost degradation under tolerance. Features for the SVR include curvature, wavelet transforms, and current tracking error. At runtime, these regressors enable ARMPC to shrink the MPC computational burden by 35–65% with negligible loss of performance, benchmarking favorably against SMPC, PMPC, dual MPC, and NN-based adaptive schemes.
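A minimal sketch of the regression step is shown below, assuming scikit-learn's SVR. The synthetic features, labels, and bounds are placeholders invented for illustration, not the paper's actual dataset or feature pipeline.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic offline dataset: reference-trajectory features -> minimal adequate horizon N*.
# Feature columns (placeholders for the paper's curvature/wavelet/error features):
#   [mean curvature, max curvature, current tracking error]
X = rng.uniform(0.0, 1.0, size=(500, 3))
# Placeholder label: sharper references (higher curvature/error) need longer horizons.
y = np.clip(5 + 30 * X[:, 1] + 10 * X[:, 2] + rng.normal(0, 1, 500), 5, 40)

horizon_regressor = SVR(kernel="rbf", C=10.0, epsilon=0.5).fit(X, y)

def adaptive_horizon(features, n_min=5, n_max=40):
    """Predict the smallest MPC horizon expected to keep cost degradation within tolerance."""
    n_pred = horizon_regressor.predict(np.atleast_2d(features))[0]
    return int(np.clip(np.ceil(n_pred), n_min, n_max))

# At runtime, query the regressor before each MPC solve and shrink the horizon accordingly.
print(adaptive_horizon([0.2, 0.3, 0.1]))
```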
Model-and-Data-Driven Predictive Control (DMPC)
DMPC (Jafarzadeh et al., 2021) fuses a known model-driven MPC cost with a “black-box” ML-predicted component (e.g., rough terrain penalty, event risk) via iterated trajectory optimization. At each receding-horizon step, the MPC solves a short-horizon problem using the known cost, queries the ML predictor for the unknown cost along candidate trajectories, and selects actions to minimize the sum. This iterative loop, “rollout → prediction → terminal-cost update → MPC solve,” guarantees monotonic cost improvement, recursive feasibility, and sample efficiency, often converging in ≪10 outer loops versus thousands for model-free RL or GP-based alternatives.
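The outer loop can be sketched as follows. The functions `solve_mpc` and `ml_cost`, the damping-based "solver", and the stall test are illustrative assumptions standing in for the actual model-driven MPC and black-box predictor, not the authors' implementation.

```python
import numpy as np

def solve_mpc(x0, terminal_cost, horizon=10):
    """Placeholder short-horizon MPC using the known model cost plus a learned terminal cost."""
    traj = [x0 * (0.8 ** k) for k in range(horizon + 1)]   # stand-in for an optimized rollout
    known_cost = sum(float(x @ x) for x in traj)
    return traj, known_cost + terminal_cost(traj[-1])

def ml_cost(traj):
    """Placeholder black-box predictor of the unknown cost along a candidate trajectory."""
    return sum(0.1 * abs(float(x.sum())) for x in traj)

def dmpc_step(x0, outer_iters=10, tol=1e-6):
    terminal = lambda x: 0.0              # initial terminal-cost estimate
    prev_total = np.inf
    for _ in range(outer_iters):          # rollout -> prediction -> terminal-cost update -> MPC solve
        traj, model_cost = solve_mpc(x0, terminal)
        unknown = ml_cost(traj)
        total = model_cost + unknown
        if prev_total - total < tol:      # monotone improvement has stalled
            break
        prev_total = total
        terminal = lambda x, c=unknown: c # fold the ML-predicted cost back into the terminal term
    return traj, total

traj, cost = dmpc_step(np.array([1.0, -0.5]))
print(cost)
```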
Stochastic Hybrid and Residual-Model Predictive Control
Approaches such as (D'Souza et al., 25 Aug 2025) address environments with mode-varying or piecewise residual dynamics using Gaussian Process (GP) regression for each mode. The pipeline learns a soft classifier for the mode distribution and updates it online via a confidence-weighted Bayesian scheme with kernel-density estimation. High-complexity MINLP MPCs are replaced by tractable NLP relaxations (e.g., endogenous/exogenous-residual CNLP), leveraging offline mode mapping and uncertainty-set "shrinking." These designs achieve 4–18% reductions in task cost with up to 250× speedups relative to MINLP formulations, while robustly satisfying chance constraints.
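A simplified sketch of the per-mode residual GPs and a Bayesian-style mode-belief update is given below, using scikit-learn's Gaussian process regressor. The toy residual data and the likelihood-based belief update are assumptions; they stand in for, and are much simpler than, the paper's confidence-weighted KDE scheme.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy residual data for two dynamics modes (placeholder per-mode training sets).
X = rng.uniform(-1, 1, size=(100, 1))
residual_by_mode = {0: np.sin(3 * X).ravel(), 1: 0.5 * X.ravel() ** 2}

gps = {m: GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, r)
       for m, r in residual_by_mode.items()}

belief = np.array([0.5, 0.5])             # soft mode distribution

def update_mode_belief(x_obs, r_obs):
    """Update the mode belief from each mode GP's predictive likelihood of the observed residual."""
    global belief
    liks = []
    for m in (0, 1):
        mu, std = gps[m].predict(np.atleast_2d(x_obs), return_std=True)
        liks.append(norm.pdf(r_obs, loc=mu[0], scale=std[0] + 1e-6))
    belief = belief * np.array(liks)
    belief /= belief.sum()
    return belief

print(update_mode_belief([0.2], np.sin(3 * 0.2)))
```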
RL-Guided Adaptive Control
RL meta-controllers adaptively tune MPC hyperparameters such as the prediction horizon (Bøhn et al., 2021), or serve as actor–critic value and policy oracles that warm-start optimization cycles (Reiter et al., 6 Jun 2024). In such settings, the RL policy outputs action proposals (e.g., horizon length or initial trajectory), while the value network calibrates terminal or long-horizon costs. Parallel MPC architectures such as AC4MPC evaluate both actor-initialized and shift-based solutions, selecting the trajectory with the lowest critic-estimated cost at run time and guaranteeing that the closed-loop cost never exceeds that of the actor alone plus a decaying error term.
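The parallel warm-start selection step can be sketched as follows. The `critic_cost`, `mpc_refine`, `actor_proposal`, and `shift_previous` functions are simplified placeholders for the critic network, NLP refinement, actor rollout, and standard shift initialization, respectively; only the selection logic mirrors the scheme described above.

```python
import numpy as np

def critic_cost(traj):
    """Placeholder critic: estimates long-horizon cost of a candidate trajectory."""
    return sum(float(x @ x) for x in traj)

def mpc_refine(x0, init_traj):
    """Placeholder local MPC refinement warm-started from init_traj."""
    return [0.9 * x for x in init_traj]    # stand-in for an NLP solve

def actor_proposal(x0, horizon):
    """Placeholder actor policy rollout used as one warm start."""
    return [x0 * (0.7 ** k) for k in range(horizon)]

def shift_previous(prev_traj, x0):
    """Standard shift-based warm start: drop the first step, repeat the last."""
    return [x0] + prev_traj[2:] + [prev_traj[-1]]

def ac4mpc_step(x0, prev_traj, horizon=10):
    candidates = [mpc_refine(x0, actor_proposal(x0, horizon)),
                  mpc_refine(x0, shift_previous(prev_traj, x0))]
    # Select the refined trajectory with the lowest critic-estimated cost.
    return min(candidates, key=critic_cost)

x0 = np.array([1.0, 0.5])
prev = [x0 * (0.8 ** k) for k in range(10)]
print(critic_cost(ac4mpc_step(x0, prev)))
```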
Human-in-the-Loop and Symbolic-ML Hybrid PACMAN
Planner–Actor–Critic PACMAN architectures (Lyu et al., 2019, Lyu et al., 2019) integrate symbolic planners encoded in answer-set programming (ASP) with RL learners and human feedback. The symbolic planner leverages a stochastic policy sample to generate feasible, goal-directed plans; the actor–critic module then executes these, updating with environmental rewards or direct human advantage signals. This approach yields rapid jump-start, low-variance convergence, and robustness to misleading or infrequent feedback, all while ensuring logical constraints and goal conditions.
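A heavily simplified sketch of the planner–actor–critic interaction is given below. The `symbolic_plan` stub, the scalar policy parameter, and the human-advantage term are all illustrative assumptions; a real system would use an ASP solver for planning and a full actor–critic learner.

```python
import numpy as np

def symbolic_plan(goal, policy_sample):
    """Stand-in for an ASP planner: returns a feasible action sequence toward the goal,
    biased by a stochastic sample of the current policy (placeholder logic)."""
    return ["move"] * goal if policy_sample > 0.5 else ["move", "wait"] * goal

def execute(plan):
    """Placeholder environment: unit reward per useful action."""
    return sum(1.0 for a in plan if a == "move")

theta = 0.0                                # scalar policy parameter (illustrative)
value = 0.0                                # critic estimate of plan return

for episode in range(20):
    sample = 1.0 / (1.0 + np.exp(-theta))                   # stochastic policy sample
    plan = symbolic_plan(goal=3, policy_sample=sample)
    ret = execute(plan)
    human_advantage = 0.5 if "wait" not in plan else -0.5   # optional human feedback signal
    advantage = (ret - value) + human_advantage
    value += 0.1 * (ret - value)                            # critic update
    theta += 0.05 * advantage                               # actor update toward advantageous plans

print(theta, value)
```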
3. Uncertainty Management and Safety Strategies
A central theme in PACMAN frameworks is the explicit estimation and management of uncertainty in ML predictions. For safety-critical domains (e.g., vehicle guidance), random forest regression ensembles provide per-sample variance estimates (Fogla et al., 2023), triggering a safe fallback (PID or human override) when the predicted confidence dips below a calibrated threshold. This mechanism prevents ML-induced failures and supports robust deployment in out-of-distribution scenarios. In LAC (Li, 19 Jul 2025), a meta-learner adaptively blends ML predictions and nominal fallback actions by optimizing a confidence parameter, with formal competitive-ratio guarantees under adversarial prediction errors.
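A minimal sketch of the ensemble-variance fallback is shown below, assuming scikit-learn's random forest and a per-tree spread as the uncertainty estimate. The toy regression task, the PID stand-in, and the variance threshold are illustrative assumptions rather than the cited system's calibration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

# Toy supervised data: state features -> control command (placeholder for the guidance task).
X = rng.uniform(-1, 1, size=(400, 4))
y = X @ np.array([0.8, -0.3, 0.1, 0.5]) + 0.05 * rng.normal(size=400)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def ml_action_with_variance(x):
    """Per-sample mean and variance across the ensemble's individual trees."""
    preds = np.array([t.predict(np.atleast_2d(x))[0] for t in forest.estimators_])
    return preds.mean(), preds.var()

def pid_fallback(x, kp=0.5):
    """Nominal fallback controller (placeholder proportional term on the first feature)."""
    return -kp * x[0]

def safe_action(x, var_threshold=0.05):
    u_ml, var = ml_action_with_variance(x)
    if var > var_threshold:                # low confidence: fall back to the nominal controller
        return pid_fallback(x)
    return u_ml

print(safe_action(rng.uniform(-1, 1, size=4)))
```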
4. System Architectures and Real-World Deployments
In large-scale, heterogeneous systems (e.g., DIII-D fusion PCS (Rothstein et al., 11 Nov 2025)), PACMAN controllers are structured as modular blocks:
- Input Block: Receives high-frequency diagnostics and system states.
- Model Block: Hosts independent ML predictors, e.g., NNs, reservoir computing, or survival models.
- Controller Block: Implements RL, MPC, FSM, or PID logic, reading from models but not intercommunicating for safety.
- Output Block: Arbitrates actuator requests, enforces safety constraints, and interfaces with hardware.
This design paradigm allows co-existence of RL-based profile controllers, event-predictor FSMs, proportional AE suppressors, and MPC profile trackers, all operating within tight hard-real-time requirements (e.g., 2–50 ms). Engineering guidance includes cycle-time budgeting, hard error-checking, block independence, and rapid fallback policies.
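The block structure and cycle-time discipline can be illustrated with a short skeleton. Everything here, including the 2 ms budget, the placeholder diagnostics, and the zero-request fallback, is an assumed simplification for exposition; it is not the DIII-D PCS code.

```python
import time

CYCLE_BUDGET_S = 0.002                     # e.g. a 2 ms hard-real-time cycle

def input_block():
    return {"diagnostic": 1.0}             # placeholder high-frequency measurements

def model_block(state):
    return {"event_risk": 0.1}             # placeholder independent ML predictor output

def controller_block(state, prediction):
    # Controllers read model outputs but do not communicate with each other.
    return {"actuator_request": -0.5 * state["diagnostic"] * (1 - prediction["event_risk"])}

def output_block(request):
    # Arbitrate and enforce actuator limits before handing off to hardware.
    return max(min(request["actuator_request"], 1.0), -1.0)

def control_cycle():
    t0 = time.perf_counter()
    state = input_block()
    prediction = model_block(state)
    request = controller_block(state, prediction)
    if time.perf_counter() - t0 > CYCLE_BUDGET_S:
        return output_block({"actuator_request": 0.0})   # budget exceeded: safe fallback
    return output_block(request)

print(control_cycle())
```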
5. Empirical Performance and Benchmarking
Empirical evaluations across domains demonstrate PACMAN’s advantages:
- ARMPC yields 35–65% reduction in solve time versus fixed-horizon/time-grid MPCs with parity in closed-loop cost (Mostafa et al., 2022).
- Stochastic hybrid MPCs with GP-mode mapping demonstrate 4–18% mean cost reduction and up to 250× faster computation (D'Souza et al., 25 Aug 2025).
- RL meta-control of horizon and critic-guided MPC yield 4–8% better control performance versus fixed/MPC-only schemes (Bøhn et al., 2021, Reiter et al., 6 Jun 2024).
- Symbolic–RL–human frameworks achieve faster learning, minimal variance, and robustness to adversarial feedback compared to pure RL with reward shaping (Lyu et al., 2019, Lyu et al., 2019).
- In DIII-D, diversified ML control modules realize tracking error within ±5%, >90% event mitigation, and sub-3% RMS profile errors at real-time cycle rates (Rothstein et al., 11 Nov 2025).
6. Limitations, Open Problems, and Future Directions
Challenges include the cost of offline data generation and regressor training for ARMPC (Mostafa et al., 2022), the lack of formal closed-loop stability guarantees in RL–MPC meta-control (Bøhn et al., 2021), and the need for reliable failure detection and conflict resolution in multi-controller environments (Rothstein et al., 11 Nov 2025). Extensions under active study include:
- Deep learning regressors for improved feature extraction and transfer in adaptive MPCs.
- Online adaptation, continual learning for changing target tasks or plant drift.
- Hardware acceleration (GPU/FPGA) to reduce ML inference and optimization latency.
- Compositional approaches enabling multi-objective hierarchical ML–MPC/FSM controllers.
- Standardized arbitration and blending among competing model-generated actuator requests.
PACMAN’s principle—systematically coupling learned prediction with interpretability-, constraint-, or safety-aware control—continues to motivate new tractable designs for decision-making under uncertainty (Mostafa et al., 2022, D'Souza et al., 25 Aug 2025, Rothstein et al., 11 Nov 2025).