Receding-Horizon Control via Drifting Models

Published 6 Apr 2026 in cs.AI | (2604.04528v1)

Abstract: We study the problem of trajectory optimization in settings where the system dynamics are unknown and it is not possible to simulate trajectories through a surrogate model. When an offline dataset of trajectories is available, an agent could directly learn a trajectory generator by distribution matching. However, this approach only recovers the behavior distribution in the dataset, and does not in general produce a model that minimizes a desired cost criterion. In this work, we propose Drifting MPC, an offline trajectory optimization framework that combines drifting generative models with receding-horizon planning under unknown dynamics. The goal of Drifting MPC is to learn, from an offline dataset of trajectories, a conditional distribution over trajectories that is both supported by the data and biased toward optimal plans. We show that the resulting distribution learned by Drifting MPC is the unique solution of an objective that trades off optimality with closeness to the offline prior. Empirically, we show that Drifting MPC can generate near-optimal trajectories while retaining the one-step inference efficiency of drifting models and substantially reducing generation time relative to diffusion-based baselines.


Authors (3)

Summary

The paper presents Drifting MPC, a method that combines mean-shift drift fields with exponential tilting to generate near-optimal trajectories from offline data.
It avoids the compounding errors of learned-dynamics rollouts by generating whole trajectories in a single step, reaching near-oracle cost at far lower generation latency than diffusion-based baselines.
The paper provides a non-asymptotic probabilistic guarantee linking generator calibration to control suboptimality, with the failure probability decaying as more trajectories are sampled per decision step.


Introduction and Problem Formalization

The paper "Receding-Horizon Control via Drifting Models" (2604.04528) addresses receding-horizon trajectory optimization in the regime where (i) system dynamics are unknown, (ii) online system interaction is disallowed, and (iii) only an offline dataset of expert and non-expert trajectories is available. This context commonly appears in safety-critical control and offline RL settings, motivating methods that can synthesize near-optimal control actions by leveraging only fixed trajectory data.

Existing paradigms such as model-based RL and offline imitation learning fit surrogate transition models or sequence models, but their accuracy degrades as the planning horizon grows because model errors compound and the dynamics are identified imperfectly. Direct trajectory generative models, especially diffusion-based approaches, have recently gained traction because they avoid compounding model error, but they incur considerable computational latency, since multi-step denoising is required to generate each trajectory.

This paper proposes an alternative framework, Drifting MPC, which combines (a) the computational efficiency of drifting models—single-step pushforward samplers trained via mean-shift drift fields—and (b) optimality bias via exponential tilting of the offline prior towards cost-optimal behaviors. A variational analysis shows that the induced trajectory generator implements a regularized optimal control law, balancing cost minimization with data support.

Drifting MPC: Methodological Innovations

Core Algorithmic Principle

Drifting MPC learns a one-step generator $G_\theta$ that maps $(\epsilon, x_0, \omega) \mapsto \tau$ by training to match exponential tilts of the empirical offline trajectory distribution. The tilting is performed by weighting samples in the drift field according to their cost, so that the positive mean-shift field systematically pulls generated samples toward lower-cost (and still data-supported) trajectories. This is formally justified via a free energy minimization principle:

$\min_{p\ll p_0(\cdot|x_0)} \mathbb{E}_{p}[J_{x_0}(\tau;\omega)] + \tfrac{1}{\beta} \mathrm{KL}(p\|p_0)$

whose solution is the exponentially tilted distribution $p_\beta(\tau \mid x_0, \omega) \propto \exp(-\beta J_{x_0}(\tau;\omega))\, p_0(\tau \mid x_0)$.
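To make the variational statement concrete, here is a minimal sketch on a finite set of three candidate plans. The prior, the costs, and the function names `tilt` and `free_energy` are illustrative assumptions, not values or code from the paper. The sketch checks numerically that the tilted distribution attains a lower free energy than the untilted prior, and that its value equals the closed form $-\tfrac{1}{\beta}\log \sum_\tau p_0(\tau)\exp(-\beta J(\tau))$.

```python
import math

def tilt(prior, costs, beta):
    """Return p_beta(i) proportional to prior[i] * exp(-beta * costs[i])."""
    # Subtract the minimum cost before exponentiating for numerical stability;
    # the shift cancels in the normalization.
    c_min = min(costs)
    unnorm = [p * math.exp(-beta * (c - c_min)) for p, c in zip(prior, costs)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def free_energy(p, prior, costs, beta):
    """E_p[J] + (1/beta) * KL(p || prior), with the convention 0 * log 0 = 0."""
    expected_cost = sum(pi * c for pi, c in zip(p, costs))
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, prior) if pi > 0)
    return expected_cost + kl / beta

# Hypothetical toy problem: three plans with an offline prior and costs.
prior = [0.5, 0.3, 0.2]
costs = [3.0, 1.0, 2.0]
beta = 2.0

p_beta = tilt(prior, costs, beta)
print("tilted distribution:", p_beta)
print("free energy at p_beta:", free_energy(p_beta, prior, costs, beta))
print("free energy at prior: ", free_energy(prior, prior, costs, beta))
```

Larger $\beta$ concentrates mass on the cheapest plan, while $\beta \to 0$ recovers the prior; in both cases the support never leaves that of $p_0$.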

Training Objective and Implementation

The empirical positive drift field is computed over the $K$ nearest neighbors (in initial state) from the dataset, and those samples are relabeled under the current cost parameter $\omega$. Negative drift comes from model samples. The conditional generator is meta-trained across sampled pairs $(x_0, \omega)$, enabling test-time generalization to arbitrary state and cost queries.
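The retrieval and relabeling step can be sketched as follows. This is a simplified illustration under stated assumptions: states are scalars, the cost is a hypothetical quadratic with $\omega = (q, r)$, and the helper names `quadratic_cost`, `nearest_trajectories`, and `tilted_weights` are invented for this example rather than taken from the paper.

```python
import math

def quadratic_cost(states, actions, omega):
    """Hypothetical parameterized cost: omega = (q, r) weights states and actions."""
    q, r = omega
    return sum(q * x * x for x in states) + sum(r * u * u for u in actions)

def nearest_trajectories(dataset, x0, k):
    """Return the k trajectories whose initial state is closest to x0."""
    return sorted(dataset, key=lambda traj: abs(traj["states"][0] - x0))[:k]

def tilted_weights(trajs, omega, beta):
    """Relabel each trajectory under omega and return normalized exp(-beta * J) weights."""
    costs = [quadratic_cost(t["states"], t["actions"], omega) for t in trajs]
    c_min = min(costs)  # shift for numerical stability; cancels after normalization
    w = [math.exp(-beta * (c - c_min)) for c in costs]
    z = sum(w)
    return [wi / z for wi in w], costs

# Hypothetical offline dataset of short scalar trajectories.
dataset = [
    {"states": [0.0, 0.0], "actions": [0.0]},
    {"states": [1.0, 0.5], "actions": [-0.5]},
    {"states": [1.1, 1.0], "actions": [0.1]},
    {"states": [5.0, 4.0], "actions": [-1.0]},
]

neighbors = nearest_trajectories(dataset, x0=1.0, k=2)
weights, costs = tilted_weights(neighbors, omega=(1.0, 0.1), beta=1.0)
print(list(zip(costs, weights)))
```

Because costs are recomputed from the stored states and actions, the same neighbors can be reweighted for a different $\omega$ without touching the dataset, which is what allows a single generator to be trained across many cost queries.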

Drifting MPC achieves single-step trajectory generation by directly minimizing the fixed-point loss corresponding to the mean-shifted drift, weighted by exponential cost tilts; this yields a generator whose support remains in-distribution but is sharply biased toward cost-optimal rollouts.
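A rough sketch of a cost-tilted drift field is given below. It assumes a Gaussian kernel, a drift defined as the positive mean-shift vector (toward cost-weighted data) minus the negative mean-shift vector (toward current model samples), and flat list-of-floats trajectories. The kernel choice, the bandwidth, and the function names are assumptions for illustration; the paper's exact field and loss may differ. Only the field is evaluated here, with no generator being trained.

```python
import math

def mean_shift(x, points, weights, bandwidth):
    """Kernel-weighted mean-shift vector at x toward `points` (Gaussian kernel)."""
    num = [0.0] * len(x)
    den = 0.0
    for y, w in zip(points, weights):
        sq_dist = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
        k = w * math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        den += k
        for d in range(len(x)):
            num[d] += k * (y[d] - x[d])
    return [n / den for n in num] if den > 0 else [0.0] * len(x)

def drift(x, data, costs, model_samples, beta, bandwidth=1.0):
    """Positive drift toward exp(-beta * cost)-weighted data minus drift toward model samples."""
    c_min = min(costs)
    tilt = [math.exp(-beta * (c - c_min)) for c in costs]
    v_pos = mean_shift(x, data, tilt, bandwidth)
    v_neg = mean_shift(x, model_samples, [1.0] * len(model_samples), bandwidth)
    return [p - n for p, n in zip(v_pos, v_neg)]

# Hypothetical 1-D example: a cheap data point at -1 and an expensive one at +1.
data = [[-1.0], [1.0]]
costs = [0.0, 2.0]
model_samples = [[-0.5], [0.5]]

print(drift([0.0], data, costs, model_samples, beta=2.0))  # points toward -1
print(drift([0.0], data, costs, model_samples, beta=0.0))  # symmetric case: no net drift
```

In a training loop, one would typically regress the generator output toward its own drifted position (treated as a fixed target), so that the field vanishes at the fixed point; that step is omitted here because it requires an automatic-differentiation framework.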

Theoretical Properties and Guarantees

A salient theoretical contribution is a non-asymptotic probabilistic guarantee for receding-horizon control via Best-of-$M$ selection: at every decision step, $M$ trajectories are sampled from the generator conditioned on the current state and cost, and the first action of the lowest-cost sample is executed. If the generator is close to $p_\beta$ in total variation, the probability that all $M$ samples are suboptimal beyond a given tolerance decays exponentially in $M$, provided $p_\beta$ puts nonzero mass on plans that are optimal to within that tolerance. This links generator calibration directly to control suboptimality bounds in receding-horizon deployments.
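The selection step and the flavor of the guarantee can be sketched as below. The bound shown is an elementary one and is not claimed to be the paper's exact statement: if $p_\beta$ assigns mass $q$ to near-optimal plans and the generator is within total variation $\delta$ of $p_\beta$, the generator assigns them at least $q - \delta$, so $M$ independent samples all miss with probability at most $(1 - q + \delta)^M$. The names `best_of_m` and `failure_bound`, and the toy sampler, are hypothetical.

```python
import random

def best_of_m(sample_plan, plan_cost, m):
    """Draw m candidate plans and return the one with the lowest cost."""
    plans = [sample_plan() for _ in range(m)]
    return min(plans, key=plan_cost)

def failure_bound(q_beta, tv_gap, m):
    """Upper bound on P(all m i.i.d. samples miss the near-optimal set).

    q_beta: mass that p_beta puts on near-optimal plans.
    tv_gap: total-variation distance between the generator and p_beta.
    """
    q = max(q_beta - tv_gap, 0.0)  # guaranteed generator mass on the set
    return (1.0 - q) ** m

# Toy usage: plans are short action sequences, cost is the action energy.
rng = random.Random(0)
sample_plan = lambda: [rng.uniform(-1.0, 1.0) for _ in range(3)]
plan_cost = lambda plan: sum(u * u for u in plan)

plan = best_of_m(sample_plan, plan_cost, m=16)
first_action = plan[0]  # receding horizon: execute only the first action, then replan
print(first_action, failure_bound(q_beta=0.3, tv_gap=0.1, m=16))
```

Since each candidate costs a single forward pass of the generator, increasing $M$ is a cheap way to tighten the bound, which is harder to afford when every sample requires a multi-step denoising chain.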

Empirical Evaluation

Drifting MPC, Drifting Prior (no cost-tilting), standard Diffusion, and Classifier-Guided Diffusion (Diffuser-style) are compared on a mass-spring-damper system with parameterized quadratic costs. Data is generated by a mixture of controllers, resulting in both optimal and suboptimal rollouts. Controllers are evaluated on mean and median cost and on generation time across several planning horizons.

Drifting MPC demonstrates the best tradeoff: its mean trajectory costs are consistently close to the oracle, outperforming both the prior-only and the diffusion-based models across all horizons. Notably, drift-based generation is two orders of magnitude faster than diffusion-based sampling and scales efficiently to longer horizons (see the tables in the original paper). In long-horizon tasks, its cost distribution remains concentrated and close to the oracle, while the diffusion models display significant variance and heavy tails, indicative of catastrophic failures.

Two figures from the paper illustrate these points:

Figure 1: Sample rollouts illustrate Drifting MPC's trajectory tracking compared to other approaches.

Figure 2: Scatter plots comparing the cost of 100 rollouts against the oracle at each horizon demonstrate consistently low-cost performance for Drifting MPC.

Discussion and Implications

The results have significant implications:

Offline data constraints: Drifting MPC is robust to early truncation and distribution shift, since the generator is regularized towards offline support. Unlike unregularized model-based planners, it avoids catastrophic compounding due to model mismatch.
Generation latency: Single-step generation enables practical receding-horizon deployment and tight closed-loop planning loops, a limiting factor for diffusion methods requiring iterative refinement.
Scalability: The conditional generator generalizes across cost-parameter queries, supporting meta-control and rapid cost re-weighting with no retraining.
Optimality-Efficiency tradeoff: The variational characterization provides explicit control over the balance between cost minimization and data support, tuned by $\beta$.

Possible theoretical directions include convergence analysis under different dataset compositions and extensions to stochastic or partially observed dynamics. Practically, Drifting MPC could be applied to safety-critical RL, autonomous driving, and high-dimensional robotic trajectory synthesis, where both efficiency and data-anchored optimality are paramount.

Conclusion

Drifting MPC integrates the statistical efficiency of exponentially tilted generative modeling with the computational speed of drifting fields for receding-horizon planning under unknown dynamics and offline data constraints. Experiments on a mass-spring-damper system support its advantage over both pure distribution-matching and diffusion-based control synthesis, yielding near-optimal and highly efficient rollouts across a spectrum of horizons and cost parameters. This framework establishes a foundation for scalable and robust offline data-driven model predictive control without recourse to explicit dynamics simulation or slow multi-step generation mechanisms.
