Language Model Predictive Control (LMPC)

Updated 25 March 2026

Language Model Predictive Control (LMPC) is a framework that integrates large language models (LLMs) with traditional model predictive control (MPC) to generate and evaluate candidate action plans.
It employs LLMs to propose candidate trajectories and interpret contextual disturbances while using classical cost-function optimization for trajectory selection.
Empirical applications in robotics, code synthesis, and process control demonstrate LMPC’s potential to improve adaptability, performance, and sample efficiency.

Language Model Predictive Control (LMPC) is a class of control and planning frameworks that pairs LLMs with Model Predictive Control (MPC). It uses the generative and contextual reasoning capabilities of LLMs to propose candidate actions or plans, to interpret unstructured context for planning, or to synthesize predictions within otherwise classical MPC loops. In LMPC, LLMs may serve as implicit planners, proposal generators, cost evaluators, or context-to-disturbance translators, while the receding-horizon optimization of MPC is retained. LMPC differs both from standard LLM prompting for reasoning (e.g., Chain-of-Thought) and from traditional MPC in its explicit or implicit use of language-derived, context-aware decisions and predictions (Maher, 5 Jan 2025, Wu et al., 5 Dec 2025, Liang et al., 2024, Rasheed et al., 1 Nov 2025).

1. Foundational Principles and Variants of LMPC

LMPC frameworks fundamentally reinterpret structured prompting, context usage, and plan proposal in LLMs through the lens of receding-horizon optimal control. Key variants identified in the literature include:

LLM-as-Planner (LLMPC): The LLM samples candidate action (control) trajectories over a finite horizon, which are then scored and selected via an explicit cost and (possibly approximate) dynamics model as in standard MPC. The LLM thus acts as a generative proposal mechanism, with language-model-induced bias over the candidate trajectory space. Cost minimization is performed outside the LLM (Maher, 5 Jan 2025).
Context-Aware MPC via LLMs (InstructMPC, IMPC): LLMs receive unstructured, human-provided context (e.g., maintenance instructions, weather events) as input and, via prompt engineering and neural architecture modules, predict future disturbance trajectories for the MPC optimization. The key component is the L2D (Language-to-Distribution) or CDP (Contextual Disturbance Predictor) module, which is fine-tuned in closed loop using realized costs, with performance guarantees for appropriately regularized updates (Wu et al., 5 Dec 2025, Wu et al., 8 Apr 2025).
LMPC for Human-Robot Interaction: LLMs are fine-tuned to model the transition dynamics and feedback in human-in-the-loop task instruction (modeled as a POMDP). LMPC solves the problem of optimizing code or action sequences (e.g., robot code blocks) to minimize the expected number of corrections before a user signals success. The LLM acts as an autoregressive simulator of possible dialogue/action trajectories, evaluated under receding-horizon criteria (Liang et al., 2024).
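The closed-loop fine-tuning described for InstructMPC can be pictured with a small sketch. The class below is hypothetical: it assumes the LLM has already turned an operator instruction into a fixed-length context embedding `phi`, maps that embedding to an `H`-step disturbance forecast through a linear head `W`, and updates `W` by gradient descent on squared forecast error plus an optional ridge penalty `reg`. Squared forecast error against realized disturbances is a simplified stand-in chosen here for illustration; it is not the control-aware, cost-based loss of the cited work.

```python
import numpy as np

class LinearDisturbancePredictor:
    """HYPOTHETICAL linear head: context embedding phi -> H-step forecast W @ phi.

    Trained online on 0.5*||W phi - w||^2 + 0.5*reg*||W||_F^2, a simple
    stand-in for a control-aware loss.
    """

    def __init__(self, dim_phi, H, lr=0.1, reg=0.0):
        self.W = np.zeros((H, dim_phi))
        self.lr = lr
        self.reg = reg

    def predict(self, phi):
        return self.W @ phi

    def update(self, phi, w_realized):
        err = self.predict(phi) - w_realized            # forecast error, shape (H,)
        grad = np.outer(err, phi) + self.reg * self.W   # gradient of the loss w.r.t. W
        self.W -= self.lr * grad
        return 0.5 * float(err @ err)                   # loss before this update
```

Each call to `update` would follow one closed-loop step, once the true disturbance has been observed; the step size must satisfy `lr * ||phi||^2 < 2` for the unregularized update to converge.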

These variants occupy a methodological spectrum that runs from "LLM-in-the-loop planning" through "LLM-augmented prediction" to "learning MPC with LLM-contextualized value functions".

2. Formal Modeling and LMPC Algorithms

The canonical LMPC architecture adapts the standard finite-horizon MPC problem:

$\min_{u_{t:t+H-1}} J(x_t, u_{t:t+H-1}) = \sum_{k=0}^{H-1} c(x_{t+k}, u_{t+k}) + c_T(x_{t+H})$

$\mathrm{s.t.}\ x_{t+k+1} = f(x_{t+k}, u_{t+k}),\quad u_{t+k}\in U, x_{t+k}\in X.$
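As a concrete instance of the objective above, the function below evaluates $J$ for one candidate input sequence. It assumes linear dynamics $f(x,u) = Ax + Bu$, a quadratic stage cost $c(x,u) = x^\top Q x + u^\top R u$, and a quadratic terminal cost $c_T(x) = x^\top P x$; these are illustrative choices, and the constraint sets $U$ and $X$ are not checked.

```python
import numpy as np

def horizon_cost(x_t, u_seq, A, B, Q, R, P):
    """Evaluate J(x_t, u_{t:t+H-1}) for one candidate input sequence.

    Assumes x_{k+1} = A x_k + B u_k, stage cost x'Qx + u'Ru, terminal cost x'Px.
    Constraint sets U and X are not checked in this sketch.
    """
    x = np.asarray(x_t, dtype=float)
    J = 0.0
    for u in u_seq:
        u = np.atleast_1d(np.asarray(u, dtype=float))
        J += float(x @ Q @ x + u @ R @ u)  # stage cost c(x_{t+k}, u_{t+k})
        x = A @ x + B @ u                  # propagate the dynamics model f
    return J + float(x @ P @ x)            # terminal cost c_T(x_{t+H})
```

For example, with `I = np.eye(1)`, `horizon_cost([1.0], [[-1.0]], I, I, I, I, I)` gives a stage cost of 1 + 1 and a terminal cost of 0, for a total of 2.0. Any scheme that selects among candidate sequences, LLM-generated or not, needs exactly this kind of scoring function.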

The LMPC variants instantiate this as follows:

Variant	LLM Role	MPC Component	Integrated Output
LLMPC (Maher, 5 Jan 2025)	Proposal generator (action plans)	Cost/dynamics external to LLM	Best proposal selected
InstructMPC (Wu et al., 5 Dec 2025)	Contextual disturbance forecaster	Standard MPC/regret minimizer	Predicted disturbances in MPC
Human-Robot (Liang et al., 2024)	Transition dynamics model	Action search in context/rollout	Optimized code/action sequences
LLM-for-Control (Rasheed et al., 1 Nov 2025)	Direct action selector or proposal/cost estimator	Receding horizon, sometimes with external cost	Immediate control action or plan

At each time step, a typical LMPC loop involves:

Constructing a prompt or context representation from the system state and history.
Querying the LLM (either to sample candidate action sequences, predict disturbance trajectories, or simulate future rollouts).
Mapping LLM outputs to system actions/plans.
For multi-proposal methods, evaluating each candidate trajectory with the explicit dynamics and cost, then selecting the lowest-cost plan.
Executing the first action in the selected plan and observing the new state.
Updating the context and, if applicable, further fine-tuning or learning.
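The loop above can be sketched end to end for a scalar system. Everything here is an assumption made for illustration: the dynamics $x_{k+1} = a x_k + b u_k$ with `a = b = 1`, the cost weights, and above all `propose_plans`, a hypothetical stand-in that replaces the LLM query with Gaussian perturbations of a simple feedback guess so the example runs offline. In an actual LLMPC system that function would serialize the state into a prompt and parse K sampled completions into action sequences.

```python
import numpy as np

A_COEF, B_COEF = 1.0, 1.0  # assumed scalar dynamics x_{k+1} = a*x_k + b*u_k

def step(x, u):
    return A_COEF * x + B_COEF * u

def propose_plans(x, H, K, rng):
    """HYPOTHETICAL stand-in for the LLM query: K candidate action sequences
    of length H, drawn as noisy copies of a proportional-feedback guess."""
    guess = -0.5 * x * np.ones(H)
    return [guess + rng.normal(scale=0.3, size=H) for _ in range(K)]

def plan_cost(x, plan, q=1.0, r=0.1):
    """Explicit MPC cost, evaluated outside the proposer."""
    J = 0.0
    for u in plan:
        J += q * x**2 + r * u**2
        x = step(x, u)
    return J + q * x**2  # terminal cost

def lmpc_loop(x0, steps=20, H=5, K=8, seed=0):
    rng = np.random.default_rng(seed)
    x, history = x0, [x0]
    for _ in range(steps):
        candidates = propose_plans(x, H, K, rng)               # build context, query, parse
        best = min(candidates, key=lambda p: plan_cost(x, p))  # score each plan, keep cheapest
        x = step(x, best[0])                                   # execute only the first action
        history.append(x)                                      # observe state, update context
    return history
```

Calling `lmpc_loop(5.0)` returns the closed-loop state trajectory. Because only `best[0]` is applied before re-planning, the controller is receding-horizon even though the proposer itself knows nothing about the cost.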

When the LLM proposes actions or plans directly (Maher, 5 Jan 2025), its outputs are scored with a classical MPC cost functional, and the best plan is selected and applied in receding-horizon fashion. In the context-as-prediction setting (InstructMPC), LLM-based modules instead serve as surrogates for exogenous disturbance predictors (Wu et al., 5 Dec 2025, Wu et al., 8 Apr 2025).
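In the context-as-prediction case, the forecast only changes what the optimizer believes the future disturbances will be. The sketch below assumes a scalar system $x_{k+1} = a x_k + b u_k + w_k$, no input or state constraints, and a forecast `w_hat` that some upstream module (for instance an LLM-based context-to-disturbance predictor) has already produced. Under those assumptions the finite-horizon problem is an unconstrained least-squares problem with a closed-form solution. This is a certainty-equivalent illustration, not the InstructMPC formulation itself.

```python
import numpy as np

def disturbance_aware_mpc(x0, w_hat, a=1.0, b=1.0, q=1.0, r=0.1):
    """Optimal inputs u_0..u_{H-1} for x_{k+1} = a x_k + b u_k + w_k,
    stage cost q x_k^2 + r u_k^2 and terminal cost q x_H^2,
    treating the forecast w_hat as exact (certainty equivalence)."""
    w_hat = np.asarray(w_hat, dtype=float)
    H = len(w_hat)
    # Stack x_1..x_H = F*x0 + G u + E w_hat.
    F = np.array([a ** (k + 1) for k in range(H)])
    G = np.zeros((H, H))
    E = np.zeros((H, H))
    for k in range(H):          # row k corresponds to x_{k+1}
        for j in range(k + 1):  # u_j and w_j with j <= k affect x_{k+1}
            G[k, j] = a ** (k - j) * b
            E[k, j] = a ** (k - j)
    free = F * x0 + E @ w_hat   # predicted states if u = 0
    # Minimize q*||free + G u||^2 + r*||u||^2 (the x_0 term is constant).
    lhs = q * G.T @ G + r * np.eye(H)
    rhs = -q * G.T @ free
    return np.linalg.solve(lhs, rhs)
```

With `a = b = 1`, `x0 = 0`, and a forecast of unit disturbances, the solution approaches `u = [-1, -1, -1]` as `r` goes to zero, so each input cancels the predicted disturbance; with a zero forecast the optimal inputs are zero. A better forecast therefore translates directly into a better plan, which is why the quality of the language-to-disturbance module matters.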

3. Theoretical Guarantees and Regularization

LMPC methods can inherit several properties from classic (learning) MPC and, in specialized cases, provide formal regret bounds.

Closed-loop Regret: In InstructMPC (Wu et al., 5 Dec 2025, Wu et al., 8 Apr 2025), with affine or linear LLM-based predictors and quadratic cost, fine-tuning the context-to-disturbance prediction module via a control-aware loss yields a cumulative regret (difference between realized and optimal cost) of $O(\sqrt{T \log T})$ over $T$ steps under stabilizability and boundedness assumptions.
Recursive Feasibility and Stability: Classical learning MPC (Rosolia et al., 2017), which shares the LMPC abbreviation but does not involve language models, is recursively feasible and closed-loop stable when its terminal constraint and terminal cost are built from a library of past successful trajectories (the sampled safe set and its cost-to-go). Similar safe-set arguments can be adapted to systems in which the LLM's outputs are constrained to keep trajectories feasible.
Iterative Performance Improvement: In sample-based learning MPC, the cost is guaranteed to be non-increasing across iterations and often converges to the global or mode-wise optimum, provided there is sufficient exploration and mode reweighting (e.g., via a bandit meta-controller in MM-LMPC) (Hashimoto et al., 1 Oct 2025).
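The sampled safe set of classical learning MPC can be illustrated in a few lines. This sketch simplifies under stated assumptions: states are hashable and matched exactly (as on a grid or in a discrete task), each stored trajectory is a list of `(state, stage_cost)` pairs ending at the goal, and the names `build_safe_set` and `terminal_cost` are hypothetical. The cost-to-go `Q(x)` is the lowest cost observed from `x` to the goal across stored trajectories, and returning infinity outside the set enforces the terminal constraint. Because the most recent trajectory always remains in the set, it stays feasible at the next iteration, which is the intuition behind the non-increasing iteration cost.

```python
import math

def build_safe_set(trajectories):
    """Map every visited state to its lowest observed cost-to-go.

    Each trajectory is a list of (state, stage_cost) pairs ending at the goal.
    """
    Q = {}
    for traj in trajectories:
        cost_to_go = 0.0
        for state, stage_cost in reversed(traj):  # accumulate cost backwards from the goal
            cost_to_go += stage_cost
            key = tuple(state)
            Q[key] = min(Q.get(key, math.inf), cost_to_go)
    return Q

def terminal_cost(x_H, Q):
    """Terminal cost for the MPC: finite only inside the sampled safe set,
    which is how the terminal constraint x_H in SS is enforced here."""
    return Q.get(tuple(x_H), math.inf)
```

Practical learning MPC implementations typically relax exact matching (for example, via convex combinations of stored states for linear systems); exact lookup is used here only to keep the idea visible.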

A practical limitation in LLMPC is the reliance on external cost and dynamics models. If these are inaccurate, closed-loop performance may degrade. In context-aware predictors, the quality and representativeness of the scenario library or LLM pre-training corpus are critical.

4. Empirical Results and Application Domains

LMPC frameworks, in various instantiations, have been empirically validated in numerous domains:

Benchmark Control Tasks: In spring–mass systems, LLMPC achieves trajectory tracking with costs only 10–20% higher than exact MPC, demonstrating feasibility as an approximate planner (Maher, 5 Jan 2025).
Code Generation: Planning-based prompting via LLMPC (look-ahead search) yields higher-quality, feature-rich outputs in code synthesis tasks (e.g., complete HTML/CSS/JavaScript games) compared to one-shot prompt baselines (Maher, 5 Jan 2025).
Human-in-the-Loop Robotics: LMPC outperforms base LLMs and retrieval-augmented generation in robot learning from feedback, with up to 26.9% higher task success rate and reduced required corrections (Liang et al., 2024).
Process Control and Power Grids: InstructMPC demonstrates improved adaptation to human instructions and non-stationary grid events, with empirical cost reduction and adherence to operator context (Wu et al., 5 Dec 2025).
Greenhouse and Building Environments: LLM-driven controllers incorporating explicit simulations or tool-assisted evaluations offer robust performance and adaptability in experimental cyber-physical systems (Rasheed et al., 1 Nov 2025, Wu et al., 8 Apr 2025).

The following table highlights core empirical results for selected domains.

Application	LMPC Variant	Result Relative to Baseline
Spring–Mass Tracking (Maher, 5 Jan 2025)	LLMPC	Tracks MPC with 10–20% cost overhead
Code Generation (Maher, 5 Jan 2025)	LLMPC (planning)	Richer features vs. ReAct baseline
Robot Teaching (Liang et al., 2024)	LMPC-Rollouts	+26.9% task success, –0.5 avg. corrections
Power Grid Control (Wu et al., 5 Dec 2025)	InstructMPC	Lower realized control cost in disturbance events
Greenhouse Control (Rasheed et al., 1 Nov 2025)	LLM-in-the-loop	Comparable to LSTM/QP MPC; <0.3 °C MAE

5. Sample Efficiency, Limitations, and Trade-offs

Sample efficiency is a recurrent theme in LMPC:

Rollout Requirements: LMPC with sampling-based candidate generation requires roughly $K = 5$–$8$ full candidate trajectories per time step, imposing nontrivial computational and latency constraints at inference (Maher, 5 Jan 2025, Liang et al., 2024).
Prompt and Horizon Design: Shorter horizons keep prompts within LLM context limits but may miss long-term planning structure; longer horizons increase search space and decoding time (Maher, 5 Jan 2025, Liang et al., 2024).
Explicit/Implicit Biases: LLM-based policies may encode a "language cost" preference, restricting proposals to regions of high probability under the model, necessitating explicit proposal diversity and cost-based re-ranking mechanisms (Maher, 5 Jan 2025).
Assumptions on Dynamics and Cost: LLMPC assumes access to (or learnability of) a state-transition model and a well-defined cost function; generalization to non-differentiable or ill-specified objectives is an open problem (Maher, 5 Jan 2025, Wu et al., 5 Dec 2025).
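One simple way to act on the diversity and re-ranking requirement is sketched below. It is an illustrative choice, not a mechanism prescribed by the cited work. Candidate action sequences (e.g., the K samples from one LLM query) are treated as vectors, any candidate within Euclidean distance `min_dist` of one already kept is discarded as a near-duplicate, and the survivors are ranked by an externally supplied cost. The function name and both thresholds are hypothetical.

```python
import numpy as np

def diverse_rerank(candidates, cost_fn, min_dist=0.5, top_k=3):
    """Greedy near-duplicate filter followed by cost-based re-ranking."""
    kept = []
    for c in candidates:
        c = np.asarray(c, dtype=float)
        # Keep c only if it is at least min_dist away from every kept candidate.
        if all(np.linalg.norm(c - k) >= min_dist for k in kept):
            kept.append(c)
    kept.sort(key=cost_fn)  # lowest explicit cost first
    return kept[:top_k]
```

The greedy filter is order-dependent, so earlier samples win ties. The number of survivors also shows how many of the K samples were genuinely distinct, which bears directly on the rollout budget discussed above.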

Major limitations include the challenges of black-box tasks (where explicit system models are unavailable), prohibitive sampling costs for large action spaces, sensitivity to prompt engineering, and the reliance on LLM proposals to cover all relevant state-action trajectories.

6. Future Directions and Open Challenges

Several avenues are proposed for advancing LMPC:

Learning Surrogate Models: Integrate learning of both cost-evaluator and transition models directly from data, improving the sample efficiency and robustness of the explicit cost evaluation step (Maher, 5 Jan 2025).
Adaptive Planning and Warm-Starting: Develop adaptive horizon methods and warm-starting of LLM proposals to reduce repeated recomputation (Maher, 5 Jan 2025).
Combinatorial Search Integration: Employ search methods such as beam search or Monte Carlo Tree Search in candidate proposal and evaluation, combining classical and LLM-based planning (Maher, 5 Jan 2025).
Formal Guarantees with Nonlinear/Uncertain Dynamics: Extend closed-loop regret and convergence guarantees beyond linear dynamics and quadratic cost regimes to encompass broader classes of systems and objectives (Wu et al., 5 Dec 2025, Wu et al., 8 Apr 2025).
Real-World Deployment: Migrate LMPC architectures to real-world robotics, dialogue management, multi-agent coordination, and safety-critical cyber-physical infrastructure (Maher, 5 Jan 2025, Liang et al., 2024, Wu et al., 5 Dec 2025).
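Beam search, one of the search methods named above, can be sketched over LLM-proposed actions as follows. `propose_next` is a hypothetical callback standing in for an LLM that suggests a few next actions given the current state and partial plan. `step` and `stage_cost` are the explicit model and cost that LLMPC relies on. At each depth, every partial plan in the beam is extended, scored, and pruned to the `beam_width` cheapest.

```python
import heapq

def beam_search_plan(x0, propose_next, step, stage_cost, H, beam_width=3):
    """Beam search over H-step action sequences scored by an explicit cost.

    Returns (total_cost, final_state, action_sequence) of the cheapest plan found.
    """
    beam = [(0.0, x0, [])]  # (accumulated cost, current state, actions so far)
    for _ in range(H):
        expanded = []
        for cost, x, plan in beam:
            for u in propose_next(x, plan):  # HYPOTHETICAL LLM proposal callback
                expanded.append((cost + stage_cost(x, u), step(x, u), plan + [u]))
        # Keep only the beam_width lowest-cost partial plans.
        beam = heapq.nsmallest(beam_width, expanded, key=lambda item: item[0])
    return min(beam, key=lambda item: item[0])
```

With `beam_width=1` this reduces to greedy one-step selection. Larger widths trade extra proposer calls (one per beam entry per depth) for a better chance of keeping plans that look poor early but pay off later.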

The intersection of LLM-driven generative modeling with MPC opens rich possibilities for intelligent, context-sensitive, sample-efficient planning and control under complex, unstructured, and evolving operational regimes. The ongoing development of LMPC seeks to harmonize the strengths of explicit MPC optimization with the generalization, reasoning, and abstraction capabilities of LLMs, balancing theoretical guarantees with empirical adaptability.