Model Predictive Path Integral PID Control for Learning-Based Path Following

Published 31 Mar 2026 in eess.SY, cs.LG, cs.RO, and math.OC | (2603.29499v1)

Abstract: Classical proportional--integral--derivative (PID) control is widely employed in industrial applications; however, achieving higher performance often motivates the adoption of model predictive control (MPC). Although gradient-based methods are the standard for real-time optimization, sampling-based approaches have recently gained attention. In particular, model predictive path integral (MPPI) control enables gradient-free optimization and accommodates non-differentiable models and objective functions. However, directly sampling control input sequences may yield discontinuous inputs and increase the optimization dimensionality in proportion to the prediction horizon. This study proposes MPPI--PID control, which applies MPPI to optimize PID gains at each control step, thereby replacing direct high-dimensional input-sequence optimization with low-dimensional gain-space optimization. This formulation enhances sample efficiency and yields smoother inputs via the PID structure. We also provide theoretical insights, including an information-theoretic interpretation that unifies MPPI and MPPI--PID, an analysis of the effect of optimization dimensionality on sample efficiency, and a characterization of input continuity induced by the PID structure. The proposed method is evaluated on the learning-based path following of a mini forklift using a residual-learning dynamics model that integrates a physical model with a neural network. System identification is performed with real driving data. Numerical path-following experiments demonstrate that MPPI--PID improves tracking performance compared with fixed-gain PID and achieves performance comparable to conventional MPPI while significantly reducing input increments. Furthermore, the proposed method maintains favorable performance even with substantially fewer samples, demonstrating its improved sample efficiency.

Abstract PDF Upgrade to Chat

Authors (3)

Summary

The paper demonstrates that optimizing low-dimensional PID gain vectors via MPPI significantly enhances sample efficiency and smoothness in control applications.
It employs a hybrid model that combines a physics-based system with a neural residual to correct velocity and angular rate errors effectively.
Experimental evaluations on forklift data show that MPPI–PID outperforms fixed-gain PID and standard MPPI, especially under low-sample conditions.

Model Predictive Path Integral PID Control for Learning-Based Path Following

Introduction

The paper "Model Predictive Path Integral PID Control for Learning-Based Path Following" (2603.29499) proposes a receding-horizon control architecture that integrates Model Predictive Path Integral (MPPI) optimization with PID control, forming the MPPI–PID approach. The method seeks to address discontinuities and sample inefficiency endemic to pure MPPI-based path following, particularly when deployed in industrial scenarios where the PID structure is standard.

The authors provide a comprehensive framework combining theoretical development—anchored by information-theoretic analysis—with empirical validation on a learning-based residual model identified from real forklift operation data. The core contribution is the demonstration that direct optimization of low-dimensional PID gain vectors via MPPI provides smoother, more sample-efficient control compared to direct input-sequence optimization, while retaining performance parity with standard MPPI.

Methodology

Hybrid Model: Physics-Based and Residual Neural Dynamics

The underlying system dynamic model utilized is a hybrid combining a physics-based backbone with a neural network residual, following Universal Differential Equations (UDEs) principles. This architecture corrects model errors in velocity and angular rate states while biasing toward the physical model in data-sparse regions via a state-dependent weighting.

Key properties:

The mask $M^\text{res}$ ensures residual learning is applied selectively to velocity and rotational states.
The weighting function $w^\text{res}(x)$ attenuates the neural network’s contribution at low speeds.

PID Control and MPPI Formulation

In classical PID, input smoothness is induced by the integral/derivative terms, but fixed gains are insufficient for challenging, time-varying tracking tasks. Standard MPPI addresses nonlinearities and non-differentiability, but direct sampling over input sequences leads to high optimization dimensionality and discontinuous control updates.

MPPI–PID shifts the optimization focus from input sequences to PID gain vectors $\theta$ , significantly reducing dimensionality and leveraging the smoothness properties of PID structure. At each control timestep, MPPI is used to sample and optimize over gain space, rather than input space.

Figure 1: Overview of the proposed MPPI–PID method showing gain-space sampling and dynamics modeling with a hybrid physical/neural architecture.

Algorithm summary:

At each MPC step, PID gains are perturbed/sampled.
The corresponding sequence of inputs and costs is computed using the PID controller with those gains, projected onto feasible sets.
Gains are updated via a weighted averaging rule equivalent to an information-theoretic KL projection.
The updated gains are used to generate the applied control input.

Theoretical Analyses

Information-Theoretic Unification and Sample Efficiency

The update law for optimization variables in both MPPI and MPPI–PID is shown to correspond to an information-theoretic KL projection of an exponentially-reweighted proposal distribution. In the small-noise, differentiable case, this yields standard stochastic gradient descent.

A notable theoretical result is the sample efficiency analysis: for an optimization variable of dimension $n_z$ and $I$ samples, the effective sample size (ESS) decays exponentially with $n_z$ . For typical path-following tasks, the input-sequence space has $n_uN$ dimensions (often large), but the PID gain space has only a handful. Thus, for fixed $I$ , the MPPI–PID approach maintains much higher ESS and thus sample efficiency.

Input Continuity

Temporal correlation of control input perturbations is analyzed. In MPPI–PID, the sequence of inputs generated by a fixed gain vector has correlated perturbations, leading to smooth transitions in time; by contrast, standard MPPI produces uncorrelated input perturbations, resulting in discontinuous commands. Analytical expressions for the temporal covariance structure corroborate this.

Experimental Evaluation

System Identification

The hybrid model was trained on data from a physical forklift platform, with preprocessing (filtering/interpolation), and validated using $R^2$ on recursive trajectory predictions. The residual model yielded the highest mean $R^2$ (0.9503), outperforming both a pure physics model and a standard NN model.

Path Following: High and Low Sample Regimes

Path following experiments compared fixed-gain PID, conventional MPPI, and MPPI–PID, using a cubic Hermite spline as reference and a receding horizon of 60 steps.

Figure 2: Comparison of planar trajectories in high-sample regime ( $w^\text{res}(x)$ 0); the MPPI–PID and conventional MPPI closely follow the reference, while fixed-gain PID exhibits larger deviations.

In the high-sample budget ( $w^\text{res}(x)$ 1), both MPPI–PID and conventional MPPI achieved accurate tracking. Fixed-gain PID lacked adaptability, resulting in cumulative path deviation. Temporal profiles of control inputs further show that MPPI–PID produces substantially smaller and smoother increments than conventional MPPI, as anticipated from the theoretical analysis.

Reducing $w^\text{res}(x)$ 2 to 16 revealed stark differences in robustness. MPPI’s tracking deteriorated significantly due to the curse of dimensionality in sample-based optimization, while MPPI–PID preserved both tracking accuracy and input smoothness.

Figure 3: Comparison of planar trajectories with low sample size ( $w^\text{res}(x)$ 3); only MPPI–PID maintains close adherence to the reference path, evidencing superior sample efficiency.

Implications and Future Directions

These results highlight the principal strengths of the MPPI–PID framework in settings where online adaptation, sample efficiency, and smooth actuation are critical. The method is particularly compelling for industrial systems where PID control remains dominant, non-differentiable objectives are required, and physical model inaccuracies abound.

Practical implications include:

Direct drop-in enhancement for existing PID-based deployments via online gain adaptation.
Applicability to hybrid physical/data-driven models in situations of partial observability or model error.
Substantially reduced computational requirements in high-dimensional or real-time applications due to superior sample efficiency.

From a theoretical perspective, the information-theoretic interpretation connects MPPI updates to KL projections, facilitating rigorous analysis of convergence and sample complexity.

For future work, extensions could include:

Empirical validation on full-scale physical platforms.
Generalization to stochastic and robust control settings with model uncertainty.
Integration of obstacle avoidance or complex task specifications into the cost function, leveraging the inherent flexibility of MPPI.

Conclusion

The MPPI–PID architecture provides an efficient framework for learning-based path following combining model-based gradient-free optimization with the continuity and industrial compatibility of PID control. By optimizing in a low-dimensional gain space, it overcomes the sample inefficiency and discontinuity issues of direct input-sequence sampling, as verified theoretically and empirically in simulated forklift path-following. The approach offers a promising template for control design in data-driven and hybrid modeling contexts where real-time adaptation and smooth actuation are non-negotiable.

Markdown Report Issue