InstructMPC: LLM-Enhanced Predictive Control

Updated 8 December 2025

InstructMPC is a framework that integrates human semantic inputs via large language models with model predictive control for enhanced disturbance forecasting.
It employs a dual feedback loop architecture that fuses contextual data processing with online fine-tuning to optimize control actions.
The system achieves robust performance for linear systems with a theoretical regret bound of O(√(T log T)), ensuring control efficiency in dynamic settings.

InstructMPC is a human–LLM–in-the-loop framework for context-aware Model Predictive Control (MPC), developed to address the limitations of traditional MPC in domains where semantic, operator-supplied, or otherwise unstructured contextual information is crucial for disturbance forecasting. It augments classic MPC architectures by integrating LLMs to interpret high-level natural language instructions and encode them into quantitative disturbance forecasts, which are then optimally fused into the control loop. InstructMPC combines closed-loop learning with rigorous theoretical guarantees, achieving an $O(\sqrt{T\log T})$ regret for linear systems under a tailored, decision-focused loss optimized by methods such as Direct Preference Optimization (DPO) (Wu et al., 8 Apr 2025, Wu et al., 5 Dec 2025).

1. Motivation and Limitations of Traditional MPC

Conventional finite-horizon MPC methods rely on mechanistic or statistical models to generate future disturbance forecasts $\{\hat w_{t|t},\ldots,\hat w_{\mathcal{T}|t}\}$ for decision making. These methods typically do not provide any mechanism for incorporating high-level, semantic, or unstructured operator instructions—such as upcoming scheduled events, emergency interventions, textual descriptions of weather or system state, or expert “gut instincts”. In domains such as building energy management, autonomous robotics, or power system operation, the disturbances often have a strong semantic or context-driven component not captured by conventional time-series models, leading to suboptimal control policies if this information is ignored (Wu et al., 8 Apr 2025, Wu et al., 5 Dec 2025).

The inability to adapt dynamically to changes in environment, context, or operator preference further constrains the utility of standard MPC frameworks. Additionally, direct closed-loop fine-tuning of neural predictors within the control loop is non-trivial; since the control cost depends on realized disturbances observable only after actions are taken, naïve gradient-based adaptations generally fail to align prediction objectives with actual control performance.

2. System Architecture and Data Flow

InstructMPC introduces a dual feedback loop architecture that tightly couples semantic context, disturbance prediction, and control optimization. The primary system modules are:

Contextual Data Collection: At each time step $t$ , the system observes plant state $x_t\in\mathbb{R}^n$ and receives context $c_t$ , which is a free-form natural language instruction (e.g., “start backup generator at 2 pm,” “expect sudden wind surge,” or “maintenance on substation B tonight”).
LLM-Based Module: The context $c_t$ (or a sequence $c_{t:\mathcal{T}|t}$ for multi-step prediction) is processed via a prompting template through a pretrained LLM (e.g., LLaMA-8B or similar backbones). The LLM produces a probability distribution $p(s|c_t)$ over a finite set of disturbance scenarios $\mathcal{S}$ (Wu et al., 8 Apr 2025).
Contextual Disturbances Predictor (CDP) / Language-to-Distribution (L2D) Module: The LLM output is mapped via a neural decoding module $g_\theta$ (affine and differentiable in $\theta$ ) to generate the predicted disturbance sequence $\hat w_{t:\mathcal{T}|t}$ . Specifically, $\hat w_{t:\mathcal{T}|t} = \sum_{s\in\mathcal{S}}p(s|c_t)\,w^s_{t:\mathcal{T}}$ or, in embedding-based implementations (as in (Wu et al., 5 Dec 2025)), LLM-extracted discrete features are mapped to a continuous embedding $d_{t:\mathcal{T}|t}$ , then decoded to disturbance estimates.
MPC Integration: The planner solves a quadratic program using plant dynamics,

$x_{t+1}=A x_t+B u_t+\hat w_{t|t}$

with cost

$\sum_{\tau=t}^{\mathcal{T}} x_\tau^\top Q x_\tau + u_\tau^\top R u_\tau + x_{\mathcal{T}+1}^\top P x_{\mathcal{T}+1}$

and applies only the first control input $u_t$ , receding over time.

Feedback and Learning: Upon observing actual disturbances $w_t$ and the realized control cost, the parameters $\theta_t$ of the L2D/CDP are fine-tuned online via a tailored, control-aware surrogate loss, closing the loop.

3. Mathematical Framework and Optimization

The core predictive control task is formulated for linear systems as follows:

Plant dynamics: $x_{t+1} = A x_t + B u_t + w_t$ , where $(A,B)$ is stabilizable.
At time $t$ , the L2D/CDP provides a $k$ -step ahead forecast $\hat w_{t:\mathcal{T}|t}$ based on context.
The MPC optimization solves

$\min_{u_{t:\mathcal{T}}}\sum_{\tau=t}^{\mathcal{T}} x_\tau^\top Q x_\tau + u_\tau^\top R u_\tau + x_{\mathcal{T}+1}^\top P x_{\mathcal{T}+1}$

subject to $x_{\tau+1} = A x_\tau + B u_\tau + \hat w_{\tau|t}$ .

The control law admits a closed-form in the linear-quadratic setting.

Crucially, the L2D/CDP map $g_\theta$ is parameterized to be affine in $\theta$ and differentiable. This structure enables efficient fine-tuning and theoretical analysis of the online regret.

The fine-tuning procedure exploits a surrogate loss $L_t(\theta) = \hat\psi_t(\theta)^\top H \hat\psi_t(\theta)$ , where

$\hat\psi_t(\theta) = \sum_{\tau=t}^{\mathcal{T}} (F^\top)^{\tau-t} P w_\tau - \sum_{\tau=t}^{\mathcal{T}} (F^\top)^{\tau-t} P g_{\theta}^{(\tau-t+1)}(c_t)$

and $F$ is the closed-loop dynamics matrix. Gradients are delayed by horizon $k$ due to delayed availability of true disturbances.

4. Online Fine-Tuning and Theoretical Regret Analysis

Parameter updates to $g_\theta$ are performed via delayed gradient descent on the surrogate loss, i.e.,

$\theta_{t+1} = \theta_t - \eta_t \nabla_\theta L_{t-k+1}(\theta_{t-k+1})$

with step sizes $\eta_t$ decreasing appropriately with $t$ . Under model assumptions (affine and Lipschitz $g_\theta$ , bounded $L_t$ gradients), the main theoretical result is a regret bound:

$J(\theta_{1:T}) - J(\theta^\star) = O(\sqrt{T\log T})$

where $J$ is the cumulative MPC cost and $\theta^\star$ is the best fixed parameter in hindsight. The surrogate loss is constructed to guarantee that minimization drives down true regret, even in non-stationary settings and in the presence of learning delay.

For fine-tuning, the framework leverages DPO-style updates: observed disturbance sequences are compared to scenario forecasts, and parameters are updated to increase the likelihood of scenarios nearest the ground-truth.

5. Instantiations and Empirical Evaluations

InstructMPC has been implemented and tested across several representative control domains (Wu et al., 5 Dec 2025):

Power-Grid Operation: InstructMPC, equipped with a CDP transformer, leverages operator input such as maintenance schedules, topology changes, or event-driven demand surges to improve disturbance prediction and control, outperforming history-driven or open-loop baselines.
Power-Infrastructure Drone Inspection: For quadrotor path tracking under stochastic wind, the CDP module encodes known reference shifts and linear embeddings of wind readings, with a tailored control-aware loss yielding improved tracking tightness over classical MSE or MAE losses.
Battery SoC Management: In a battery management scenario with schedule-driven loads and photovoltaic generation, LLM-based context extraction (e.g., ChatGPT classification of compute job descriptions) enables mapping shell commands to “effort level” embeddings, then to accurate disturbance forecasts via a linear decoder. InstructMPC achieves lower cost than metadata-driven regression or static forecasts.

The central empirical finding is that LLM-extracted, unstructured semantic context—encoded by InstructMPC into disturbance forecasts and maintained via closed-loop, task-aware learning—can significantly outperform classical predictors in dynamic, event-driven environments.

6. Modeling Assumptions and Practical Considerations

The framework is analyzed under the following modeling assumptions:

Linear plant dynamics with stabilizable $(A,B)$ .
Bounded disturbances $w_t$ , and (optionally) bounded parameter domain $\Theta$ for $\theta$ .
Affine and Lipschitz-continuous CDP decoder $g_\theta$ .
Bounded surrogate loss gradients.
Arbitrary, potentially non-stationary, evolution of context $c_t$ and disturbance $w_t$ ; regret bound holds agnostic to these sequences.

A plausible implication is that while InstructMPC’s theoretical guarantees are currently given for linear settings, the modular design and LLM-agnostic contextualization enable deployment across a range of high-impact applications where unstructured context is essential for optimal control performance.

7. Summary Table: Core Components of InstructMPC

Component	Description	Reference
L2D / CDP Module	Neural net mapping context $c_t$ to disturbance forecast $\hat w_{t:\mathcal{T}\|t}$	(Wu et al., 8 Apr 2025, Wu et al., 5 Dec 2025)
Surrogate Loss	Control-aware quadratic form $L_t(\theta)$ aligning predictions with realized cost	(Wu et al., 8 Apr 2025, Wu et al., 5 Dec 2025)
Fine-Tuning	Online gradient descent (delayed) on $L_t$ , parameter update for $g_\theta$	(Wu et al., 8 Apr 2025, Wu et al., 5 Dec 2025)

The L2D/CDP module bridges human semantic input and optimization, the surrogate loss aligns learning with control objectives, and the closed-loop fine-tuning ensures adaptability and theoretical performance in non-stationary environments.

Markdown Report Issue Upgrade to Chat

References (2)

InstructMPC: A Human-LLM-in-the-Loop Framework for Context-Aware Control (2025)

InstructMPC: A Human-LLM-in-the-Loop Framework for Context-Aware Power Grid Control (2025)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to InstructMPC.