InstructMPC: LLM-Enhanced Predictive Control

Updated 8 December 2025
  • InstructMPC is a framework that integrates human semantic inputs via large language models with model predictive control for enhanced disturbance forecasting.
  • It employs a dual feedback loop architecture that fuses contextual data processing with online fine-tuning to optimize control actions.
  • The system achieves robust performance for linear systems with a theoretical regret bound of O(√(T log T)), ensuring control efficiency in dynamic settings.

InstructMPC is a human–LLM–in-the-loop framework for context-aware Model Predictive Control (MPC), developed to address the limitations of traditional MPC in domains where semantic, operator-supplied, or otherwise unstructured contextual information is crucial for disturbance forecasting. It augments classic MPC architectures with large language models (LLMs) that interpret high-level natural language instructions and encode them into quantitative disturbance forecasts, which are then fused into the control loop. InstructMPC combines closed-loop learning with theoretical guarantees, achieving $O(\sqrt{T\log T})$ regret for linear systems under a tailored, decision-focused loss optimized by methods such as Direct Preference Optimization (DPO) (Wu et al., 8 Apr 2025, Wu et al., 5 Dec 2025).

1. Motivation and Limitations of Traditional MPC

Conventional finite-horizon MPC methods rely on mechanistic or statistical models to generate future disturbance forecasts $\{\hat w_{t|t},\ldots,\hat w_{\mathcal{T}|t}\}$ for decision making. These methods typically provide no mechanism for incorporating high-level, semantic, or unstructured operator instructions, such as upcoming scheduled events, emergency interventions, textual descriptions of weather or system state, or expert “gut instincts”. In domains such as building energy management, autonomous robotics, or power system operation, disturbances often have a strong semantic or context-driven component that conventional time-series models do not capture, and ignoring this information leads to suboptimal control policies (Wu et al., 8 Apr 2025, Wu et al., 5 Dec 2025).

The inability to adapt dynamically to changes in environment, context, or operator preference further constrains the utility of standard MPC frameworks. Additionally, direct closed-loop fine-tuning of neural predictors within the control loop is non-trivial; since the control cost depends on realized disturbances observable only after actions are taken, naïve gradient-based adaptations generally fail to align prediction objectives with actual control performance.

2. System Architecture and Data Flow

InstructMPC introduces a dual feedback loop architecture that tightly couples semantic context, disturbance prediction, and control optimization. The primary system modules are:

  • Contextual Data Collection: At each time step $t$, the system observes the plant state $x_t\in\mathbb{R}^n$ and receives context $c_t$, a free-form natural language instruction (e.g., “start backup generator at 2 pm,” “expect sudden wind surge,” or “maintenance on substation B tonight”).
  • LLM-Based Module: The context $c_t$ (or a sequence $c_{t:\mathcal{T}|t}$ for multi-step prediction) is passed through a prompting template to a pretrained LLM (e.g., LLaMA-8B or similar backbones). The LLM produces a probability distribution $p(s|c_t)$ over a finite set of disturbance scenarios $\mathcal{S}$ (Wu et al., 8 Apr 2025).
  • Contextual Disturbances Predictor (CDP) / Language-to-Distribution (L2D) Module: The LLM output is mapped via a neural decoding module $g_\theta$ (affine and differentiable in $\theta$) to the predicted disturbance sequence $\hat w_{t:\mathcal{T}|t}$. Specifically, $\hat w_{t:\mathcal{T}|t} = \sum_{s\in\mathcal{S}}p(s|c_t)\,w^s_{t:\mathcal{T}}$, where $w^s_{t:\mathcal{T}}$ is the disturbance trajectory associated with scenario $s$. In embedding-based implementations (as in Wu et al., 5 Dec 2025), LLM-extracted discrete features are instead mapped to a continuous embedding $d_{t:\mathcal{T}|t}$ and then decoded to disturbance estimates.
  • MPC Integration: The planner solves a quadratic program using the plant dynamics with the forecast substituted for the unknown disturbance,

$$x_{t+1}=A x_t+B u_t+\hat w_{t|t}$$

with cost

$$\sum_{\tau=t}^{\mathcal{T}} \left( x_\tau^\top Q x_\tau + u_\tau^\top R u_\tau \right) + x_{\mathcal{T}+1}^\top P x_{\mathcal{T}+1}$$

and applies only the first control input $u_t$, then re-solves at the next time step in receding-horizon fashion.

  • Feedback and Learning: Upon observing the actual disturbances $w_t$ and the realized control cost, the parameters $\theta_t$ of the L2D/CDP are fine-tuned online via a tailored, control-aware surrogate loss, closing the loop.
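The scenario-weighted forecast in the CDP/L2D step can be illustrated with a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the function name `mixture_forecast`, the two example scenarios, and the probabilities are all hypothetical, and the probabilities stand in for whatever the LLM would return for a given context.

```python
import numpy as np

def mixture_forecast(probs, scenario_trajectories):
    """Return sum_s p(s|c_t) * w^s as an array of shape (horizon, n).

    probs: length-S probabilities over scenarios (assumed to come from the LLM).
    scenario_trajectories: array of shape (S, horizon, n), one trajectory per scenario.
    """
    probs = np.asarray(probs, dtype=float)
    trajs = np.asarray(scenario_trajectories, dtype=float)
    if probs.ndim != 1 or probs.shape[0] != trajs.shape[0]:
        raise ValueError("one probability per scenario is required")
    if np.any(probs < 0) or not np.isclose(probs.sum(), 1.0):
        raise ValueError("probs must be a probability distribution")
    # Contract the scenario axis: weighted average of the trajectories.
    return np.tensordot(probs, trajs, axes=1)

# Hypothetical example: two scenarios, horizon 3, scalar disturbance.
scenarios = np.array([
    [[0.0], [0.0], [0.0]],   # scenario 0: no event
    [[1.0], [2.0], [1.0]],   # scenario 1: load surge
])
w_hat = mixture_forecast([0.25, 0.75], scenarios)
print(w_hat.ravel())
```

Because the forecast is linear in the scenario probabilities, it stays differentiable with respect to whatever parameters produce those probabilities, which is consistent with the affine structure the text attributes to $g_\theta$.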

3. Mathematical Framework and Optimization

The core predictive control task is formulated for linear systems as follows:

  • Plant dynamics: $x_{t+1} = A x_t + B u_t + w_t$, where $(A,B)$ is stabilizable.
  • At time $t$, the L2D/CDP provides a $k$-step-ahead forecast $\hat w_{t:\mathcal{T}|t}$ based on context.
  • The MPC optimization solves

$$\min_{u_{t:\mathcal{T}}}\ \sum_{\tau=t}^{\mathcal{T}} \left( x_\tau^\top Q x_\tau + u_\tau^\top R u_\tau \right) + x_{\mathcal{T}+1}^\top P x_{\mathcal{T}+1}$$

subject to $x_{\tau+1} = A x_\tau + B u_\tau + \hat w_{\tau|t}$.

  • The control law admits a closed-form solution in the linear-quadratic setting.
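One standard way to obtain such a closed-form plan is a backward Riccati-style recursion with a feedforward term for the forecast. The sketch below is an illustration under that assumption, not the authors' code: `solve_mpc`, `plan_cost`, the double-integrator matrices, and the forecast values are all hypothetical.

```python
import numpy as np

def solve_mpc(A, B, Q, R, P, x0, w_hat):
    """Plan inputs for the finite-horizon LQ problem with a known forecast.

    w_hat has shape (k, n); returns the planned inputs with shape (k, m).
    """
    k = len(w_hat)
    P_next, q_next = P, np.zeros(A.shape[0])
    gains, offsets = [], []
    # Backward pass: cost-to-go is x' P_tau x + 2 q_tau' x + const.
    for tau in reversed(range(k)):
        M = R + B.T @ P_next @ B
        z = P_next @ w_hat[tau] + q_next
        K = np.linalg.solve(M, B.T @ P_next @ A)   # state-feedback gain
        v = np.linalg.solve(M, B.T @ z)            # feedforward from the forecast
        A_cl = A - B @ K
        q_next = A_cl.T @ z
        P_next = Q + A.T @ P_next @ A_cl
        gains.append(K)
        offsets.append(v)
    gains.reverse()
    offsets.reverse()
    # Forward pass: roll the predicted dynamics out under the optimal policy.
    x, u_plan = np.asarray(x0, dtype=float), []
    for tau in range(k):
        u = -gains[tau] @ x - offsets[tau]
        u_plan.append(u)
        x = A @ x + B @ u + w_hat[tau]
    return np.array(u_plan)

def plan_cost(A, B, Q, R, P, x0, w_hat, u_plan):
    """Evaluate the MPC objective for a given input plan."""
    x, cost = np.asarray(x0, dtype=float), 0.0
    for u, w in zip(u_plan, w_hat):
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + w
    return float(cost + x @ P @ x)

# Hypothetical double integrator with a 3-step forecast.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R, P = np.eye(2), np.array([[1.0]]), np.eye(2)
x0 = np.array([1.0, 0.0])
w_hat = np.array([[0.0, 0.5], [0.0, 0.5], [0.0, 0.0]])

u_plan = solve_mpc(A, B, Q, R, P, x0, w_hat)
u_t = u_plan[0]   # only the first input is applied; the rest is re-planned
```

In a receding-horizon loop, `solve_mpc` would be called again at the next step with the new state and a refreshed forecast.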

Crucially, the L2D/CDP map $g_\theta$ is parameterized to be affine in $\theta$ and differentiable. This structure enables efficient fine-tuning and theoretical analysis of the online regret.

The fine-tuning procedure exploits a surrogate loss $L_t(\theta) = \hat\psi_t(\theta)^\top H \hat\psi_t(\theta)$, where

$$\hat\psi_t(\theta) = \sum_{\tau=t}^{\mathcal{T}} (F^\top)^{\tau-t} P\, w_\tau - \sum_{\tau=t}^{\mathcal{T}} (F^\top)^{\tau-t} P\, g_{\theta}^{(\tau-t+1)}(c_t)$$

and $F$ is the closed-loop dynamics matrix. Here $g_{\theta}^{(\tau-t+1)}(c_t)$ denotes the $(\tau-t+1)$-th step of the forecast issued at time $t$. Gradients are delayed by the horizon $k$ because the true disturbances over the forecast window become available only after they are realized.
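The surrogate loss can be evaluated directly from the forecast errors once the true disturbances are known. The following is a minimal sketch: `surrogate_loss` is a hypothetical helper, and the scalar values chosen for $F$, $P$, and $H$ are toy numbers for illustration rather than quantities derived from any particular plant.

```python
import numpy as np

def surrogate_loss(F, P, H, w_true, w_pred):
    """Compute psi' H psi with psi = sum_j (F')^j P (w_true[j] - w_pred[j]).

    w_true, w_pred: arrays of shape (k, n); index j corresponds to tau - t.
    """
    n = F.shape[0]
    psi = np.zeros(n)
    Ft_power = np.eye(n)                     # holds (F')^j
    errors = np.asarray(w_true, dtype=float) - np.asarray(w_pred, dtype=float)
    for err in errors:
        psi += Ft_power @ P @ err
        Ft_power = Ft_power @ F.T
    return float(psi @ H @ psi)

# Toy scalar setting (all values are illustrative assumptions).
F = np.array([[0.5]])
P = np.array([[2.0]])
H = np.array([[0.25]])
w_true = np.array([[1.0], [1.0]])

loss_near = surrogate_loss(F, P, H, w_true, np.array([[0.0], [1.0]]))  # error at step 1
loss_far = surrogate_loss(F, P, H, w_true, np.array([[1.0], [0.0]]))   # error at step 2
print(loss_near, loss_far)
```

With a contractive $F$, as in this toy setting, the same error costs more when it occurs earlier in the forecast window, which is how this loss differs from a plain mean-squared error on the forecast.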

4. Online Fine-Tuning and Theoretical Regret Analysis

Parameter updates to $g_\theta$ are performed via delayed gradient descent on the surrogate loss, i.e.,

$$\theta_{t+1} = \theta_t - \eta_t \nabla_\theta L_{t-k+1}(\theta_{t-k+1})$$

with step sizes $\eta_t$ decreasing appropriately with $t$. Under the model assumptions (affine and Lipschitz $g_\theta$, bounded gradients of $L_t$), the main theoretical result is a regret bound:

$$J(\theta_{1:T}) - J(\theta^\star) = O(\sqrt{T\log T})$$

where $J$ is the cumulative MPC cost and $\theta^\star$ is the best fixed parameter in hindsight. The surrogate loss is constructed to guarantee that minimization drives down true regret, even in non-stationary settings and in the presence of learning delay.
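The delayed update rule can be sketched as a short loop. This is an illustration only: `delayed_gradient_descent` is a hypothetical helper, the $\eta_t = \eta_0/\sqrt{t}$ schedule is an assumed example of a decreasing step size (the text does not specify the schedule), and the quadratic loss is a stand-in for the surrogate loss.

```python
import numpy as np

def delayed_gradient_descent(grad_fn, theta0, T, k, eta0=0.2):
    """Iterate theta_{t+1} = theta_t - eta_t * grad L_{t-k+1}(theta_{t-k+1}).

    grad_fn(s, theta) returns the gradient of L_s at theta.
    Returns [theta_1, ..., theta_T]; no update is made until round k,
    when the first loss L_1 becomes fully observable.
    """
    thetas = [np.asarray(theta0, dtype=float)]    # thetas[t-1] stores theta_t
    for t in range(1, T):
        s = t - k + 1                             # latest round with a known loss
        theta = thetas[-1]
        if s >= 1:
            eta = eta0 / np.sqrt(t)               # assumed step-size schedule
            theta = theta - eta * grad_fn(s, thetas[s - 1])
        thetas.append(theta)
    return thetas

# Stand-in loss L_s(theta) = (theta - 1)^2 for every round s.
grad = lambda s, theta: 2.0 * (theta - 1.0)
thetas = delayed_gradient_descent(grad, theta0=0.0, T=200, k=3)
print(float(thetas[-1]))
```

Note that the gradient is evaluated at the stale parameter $\theta_{t-k+1}$, not at the current one, which is the delay effect the regret analysis has to absorb.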

For fine-tuning, the framework leverages DPO-style updates: observed disturbance sequences are compared to scenario forecasts, and parameters are updated to increase the likelihood of scenarios nearest the ground-truth.
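As a rough illustration of a DPO-style update, the sketch below evaluates the standard DPO objective on one preference pair. The pairing rule is an assumption made for this example (the scenario closest to the observed disturbances is treated as preferred and the farthest as rejected), and `preference_pair`, `dpo_loss`, and the value of `beta` are hypothetical; the source does not specify these details.

```python
import math

def preference_pair(w_true, scenarios):
    """Return (index of closest scenario, index of farthest scenario)."""
    dists = [sum((a - b) ** 2 for a, b in zip(w_true, s)) for s in scenarios]
    return dists.index(min(dists)), dists.index(max(dists))

def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    logp_* are log-probabilities under the model being tuned,
    ref_logp_* under a frozen reference model.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    return math.log1p(math.exp(-margin))   # equals -log(sigmoid(margin))

# Hypothetical observed disturbances and three candidate scenarios.
win, lose = preference_pair([1.0, 2.0], [[0.0, 0.0], [1.0, 2.1], [5.0, 5.0]])

# The loss drops once the tuned model favors the preferred scenario
# more strongly than the reference does.
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
improved = dpo_loss(-0.5, -2.0, -1.0, -1.0)
print(win, lose, baseline, improved)
```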

5. Instantiations and Empirical Evaluations

InstructMPC has been implemented and tested across several representative control domains (Wu et al., 5 Dec 2025):

  • Power-Grid Operation: InstructMPC, equipped with a CDP transformer, leverages operator input such as maintenance schedules, topology changes, or event-driven demand surges to improve disturbance prediction and control, outperforming history-driven or open-loop baselines.
  • Power-Infrastructure Drone Inspection: For quadrotor path tracking under stochastic wind, the CDP module encodes known reference shifts and linear embeddings of wind readings, with a tailored control-aware loss yielding improved tracking tightness over classical MSE or MAE losses.
  • Battery SoC Management: In a battery management scenario with schedule-driven loads and photovoltaic generation, LLM-based context extraction (e.g., ChatGPT classification of compute job descriptions) enables mapping shell commands to “effort level” embeddings, then to accurate disturbance forecasts via a linear decoder. InstructMPC achieves lower cost than metadata-driven regression or static forecasts.

The central empirical finding is that LLM-extracted, unstructured semantic context—encoded by InstructMPC into disturbance forecasts and maintained via closed-loop, task-aware learning—can significantly outperform classical predictors in dynamic, event-driven environments.

6. Modeling Assumptions and Practical Considerations

The framework is analyzed under the following modeling assumptions:

  • Linear plant dynamics with stabilizable $(A,B)$.
  • Bounded disturbances $w_t$, and (optionally) bounded parameter domain $\Theta$ for $\theta$.
  • Affine and Lipschitz-continuous CDP decoder $g_\theta$.
  • Bounded surrogate loss gradients.
  • Arbitrary, potentially non-stationary, evolution of context $c_t$ and disturbance $w_t$; the regret bound holds regardless of these sequences.
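The stabilizability assumption can be checked numerically before applying the framework to a given plant. The sketch below uses the discrete-time PBH rank test: $(A,B)$ is stabilizable if $[\lambda I - A,\ B]$ has full row rank for every eigenvalue $\lambda$ of $A$ with $|\lambda| \ge 1$. The helper name `is_stabilizable`, the tolerance, and the example matrices are hypothetical.

```python
import numpy as np

def is_stabilizable(A, B, tol=1e-9):
    """PBH test for discrete-time systems.

    Only eigenvalues on or outside the unit circle need to be controllable.
    """
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if abs(lam) >= 1.0 - tol:
            pbh = np.hstack([lam * np.eye(n) - A, B])
            if np.linalg.matrix_rank(pbh) < n:
                return False
    return True

# Double integrator: marginally unstable but controllable.
A_di = np.array([[1.0, 1.0], [0.0, 1.0]])
B_di = np.array([[0.0], [1.0]])

# Diagonal plant with one unstable mode (2.0) and one stable mode (0.5).
A_diag = np.diag([2.0, 0.5])
B_reaches_stable_only = np.array([[0.0], [1.0]])
B_reaches_unstable = np.array([[1.0], [0.0]])

print(is_stabilizable(A_di, B_di))
print(is_stabilizable(A_diag, B_reaches_stable_only))
print(is_stabilizable(A_diag, B_reaches_unstable))
```

The last case shows that stabilizability is weaker than controllability: the stable mode is not reachable from the input, but it decays on its own.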

A plausible implication is that while InstructMPC’s theoretical guarantees are currently given for linear settings, the modular design and LLM-agnostic contextualization enable deployment across a range of high-impact applications where unstructured context is essential for optimal control performance.

7. Summary Table: Core Components of InstructMPC

| Component | Description | Reference |
| --- | --- | --- |
| L2D / CDP Module | Neural net mapping context $c_t$ to disturbance forecast $\hat w_{t:\mathcal{T}\mid t}$ | (Wu et al., 8 Apr 2025, Wu et al., 5 Dec 2025) |
| Surrogate Loss | Control-aware quadratic form $L_t(\theta)$ aligning predictions with realized cost | (Wu et al., 8 Apr 2025, Wu et al., 5 Dec 2025) |
| Fine-Tuning | Online gradient descent (delayed) on $L_t$, parameter update for $g_\theta$ | (Wu et al., 8 Apr 2025, Wu et al., 5 Dec 2025) |

The L2D/CDP module bridges human semantic input and optimization, the surrogate loss aligns learning with control objectives, and the closed-loop fine-tuning ensures adaptability and theoretical performance in non-stationary environments.
