Deterministic Sampling in MPC: dsMPPI/dsCEM
- Deterministic sampling is a method that replaces stochastic sampling with precomputed low-discrepancy samples to uniformly cover the proposal distribution.
- It relies on offline construction of sample sets from Localized Cumulative Distributions (LCDs) and on runtime linear transformations to reduce estimation variance and efficiently map the samples into the evolving control distribution.
- These techniques produce smoother control trajectories and improved sample efficiency, making them viable for real-time and embedded MPC applications.
Deterministic sampling refers to a class of sampling-based optimal control and trajectory optimization methods in which the stochastic exploration step—typically involving Monte Carlo draws from a distribution—is replaced entirely by a fixed set of optimally chosen, low-discrepancy samples. In modern model predictive control (MPC) frameworks such as Model Predictive Path Integral (MPPI) and the Cross-Entropy Method for MPC (CEM-MPC), deterministic sampling gives rise to algorithms such as deterministic-sampling CEM (dsCEM) and deterministic-sampling MPPI (dsMPPI). These approaches achieve variance reduction, improved sample efficiency, and substantially smoother control policies compared to their randomly sampled counterparts, especially in nonlinear, time-correlated, or low-sample regimes (Walker et al., 7 Jan 2026, Walker et al., 7 Oct 2025).
1. Foundations and Motivation
Standard sampling-based MPC approaches such as MPPI and CEM-MPC estimate optimal control actions by simulating trajectories obtained by perturbing a nominal control sequence with i.i.d. Gaussian noise. However, random sampling has several limitations:
- Poor support coverage: Random samples tend to cluster and leave gaps, so a large number of samples $N$ is required for accurate expectation estimates (low discrepancy is only achieved asymptotically).
- Chattering and non-smoothness: Lack of temporal correlation between samples induces jerky control signals, which are undesirable in physical systems.
- High computational burden: Large numbers of rollouts are necessary for reliable optimization, challenging real-time applications or embedded deployments (Walker et al., 7 Oct 2025).
Deterministic sampling addresses these limitations by precomputing a set of quadrature-like “optimal Dirac points”—called LCD samples (from Localized Cumulative Distributions)—which span the proposal distribution in a uniform, low-discrepancy manner. These samples are then systematically transformed at run-time into the current proposal distribution, enabling more reliable estimation and efficient use of computational resources (Walker et al., 7 Oct 2025, Walker et al., 7 Jan 2026).
2. Deterministic Sampling Construction and Transformation
The deterministic sample design process involves:
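- LCD Construction: For a reference density $f(\mathbf{x})$ (typically the standard normal $\mathcal{N}(\mathbf{0}, \mathbf{I})$), the localized cumulative distribution (LCD) is defined by
$$F(\mathbf{m}, b) = \int_{\mathbb{R}^d} f(\mathbf{x})\, K(\mathbf{x} - \mathbf{m}, b)\, d\mathbf{x}, \qquad K(\mathbf{x} - \mathbf{m}, b) = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{m} \rVert^2}{2b^2}\right),$$
with location $\mathbf{m}$ and kernel width $b > 0$. A finite set of $N$ sample points $\hat{\mathbf{x}}_1, \dots, \hat{\mathbf{x}}_N$ is chosen to minimize a Cramér–von Mises–type discrepancy between $f$ and its Dirac mixture approximation $\tilde f(\mathbf{x}) = \frac{1}{N}\sum_{i=1}^{N} \delta(\mathbf{x} - \hat{\mathbf{x}}_i)$, measured through their LCDs $F$ and $\tilde F$.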
- Offline Optimization: The points are found by offline gradient-based minimization of the squared difference between $F$ and $\tilde F$, integrated over locations $\mathbf{m}$ and kernel widths $b$. The result is a deterministic, identity-covariance sample set that covers the reference density more uniformly than i.i.d. draws (Walker et al., 7 Oct 2025).
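- Runtime Linear Transformation: The sample set is mapped on the fly into the current proposal $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ by $\mathbf{u}_i = \boldsymbol{\mu} + \mathbf{L}\hat{\mathbf{x}}_i$, where $\mathbf{L}$ is the Cholesky factor of $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^\top$. This step ensures that the sample points populate the high-probability region of the current search space (Walker et al., 7 Oct 2025, Walker et al., 7 Jan 2026).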
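- Temporal Smoothness: To induce smooth time evolution of the control sequences, a colored-noise prior is embedded through a Toeplitz correlation matrix $\mathbf{C} = \mathbf{L}_C\mathbf{L}_C^\top$ derived from a $1/f^{\beta}$ power spectrum; the sample transformation then becomes $\mathbf{u}_i = \boldsymbol{\mu} + \operatorname{diag}(\boldsymbol{\sigma})\,\mathbf{L}_C\hat{\mathbf{x}}_i$, i.e., the base samples are colored before being scaled by the per-step standard deviations $\boldsymbol{\sigma}$ (Walker et al., 7 Jan 2026).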
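The offline construction and the runtime mapping can be sketched in Python with NumPy. As a simplification, this sketch does not implement the full scale-integrated LCD distance used in the papers: at a single kernel width $b$, the squared LCD difference integrated over $\mathbf{m}$ reduces (up to a constant factor) to a Gaussian-kernel maximum mean discrepancy (MMD) with length scale $\ell^2 = 2b^2$. The hypothetical `optimize_samples` therefore minimizes that single-scale MMD to $\mathcal{N}(\mathbf{0}, \mathbf{I})$ by gradient descent and then whitens the result; the function names, step size, iteration count, and whitening step are all illustrative assumptions.

```python
import numpy as np


def kernel_discrepancy(X, ell=1.0):
    """Single-scale Gaussian-kernel MMD^2 between the equally weighted Dirac
    mixture on the rows of X and N(0, I), minus a constant that is dropped."""
    n, d = X.shape
    s = ell**2 + 1.0
    c = (ell**2 / s) ** (d / 2)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    self_term = np.exp(-sq / (2 * ell**2)).sum() / n**2
    cross_term = 2 * c * np.exp(-np.sum(X**2, axis=1) / (2 * s)).sum() / n
    return self_term - cross_term


def optimize_samples(n, d, ell=1.0, lr=0.05, steps=500, seed=0):
    """Hypothetical offline stand-in for LCD optimization: gradient descent on
    the single-scale discrepancy, then exact whitening."""
    X = np.random.default_rng(seed).standard_normal((n, d))  # initial guess only
    s = ell**2 + 1.0
    c = (ell**2 / s) ** (d / 2)
    for _ in range(steps):
        diff = X[None, :, :] - X[:, None, :]                  # diff[i, j] = x_j - x_i
        k = np.exp(-np.sum(diff**2, axis=-1) / (2 * ell**2))
        grad_self = 2.0 / (n**2 * ell**2) * np.einsum("ij,ijd->id", k, diff)
        g = np.exp(-np.sum(X**2, axis=1) / (2 * s))
        grad_cross = 2.0 * c / (n * s) * g[:, None] * X
        X -= lr * n * (grad_self + grad_cross)                # repel points, pull to origin
    # Enforce zero sample mean and identity sample covariance exactly.
    X -= X.mean(axis=0)
    L = np.linalg.cholesky(X.T @ X / n)
    return np.linalg.solve(L, X.T).T


def transform(X_hat, mu, Sigma):
    """Map the standardized set into N(mu, Sigma): u_i = mu + L x_hat_i."""
    L = np.linalg.cholesky(Sigma)
    return mu + X_hat @ L.T
```

Because the returned set has exactly zero sample mean and identity sample covariance, the affine map reproduces $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ exactly as sample moments of the transformed points.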
3. Detailing dsCEM and dsMPPI Algorithms
Deterministic Cross-Entropy Method (dsCEM)
In dsCEM, the sample replacement occurs within the iterative CEM-MPC loop:
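- Deterministic sample generation: $\mathbf{u}_i = \boldsymbol{\mu} + \mathbf{L}\hat{\mathbf{x}}_i$ for $i = 1, \dots, N$
- Cost evaluation: $J_i = J(\mathbf{u}_i)$, obtained by rolling out the dynamics under each candidate control sequence
- Elite selection and update: the next mean and covariance $\boldsymbol{\mu}'$ and $\boldsymbol{\Sigma}'$ are computed over the $K$ lowest-cost samples (elites). Both cost-weighted and uniform weights over the elites are standard.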
- Momentum/adaptive schemes: Optional momentum averaging and two alternatives for covariance adaptation: fixed temporal correlation with adaptive marginal variances, or full covariance adaptation (Walker et al., 7 Oct 2025).
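One dsCEM iteration with fixed temporal correlation and adaptive marginal standard deviations can be sketched as follows. The Toeplitz matrix $\mathbf{C}$ is built here by inverting a $1/f^{\beta}$ spectrum with an FFT (Wiener–Khinchin theorem), which is one possible construction and not necessarily the one used in the papers. In the toy usage, a quadratic cost replaces real rollouts, a fixed array of standard normal draws stands in for the LCD set, and `alpha` is assumed to weight the new elite statistics; all names are hypothetical.

```python
import numpy as np


def colored_correlation(H, beta):
    """Toeplitz correlation matrix for a 1/f^beta power spectrum (hypothetical
    construction: Wiener-Khinchin theorem on a length-2H FFT grid)."""
    n = 2 * H
    f = np.abs(np.fft.fftfreq(n))
    f[0] = f[1]                                  # avoid dividing by zero at DC
    r = np.real(np.fft.ifft(f ** (-beta)))       # autocorrelation sequence
    r /= r[0]                                    # unit variance at every step
    lags = np.abs(np.subtract.outer(np.arange(H), np.arange(H)))
    return r[lags]                               # C[t, s] = r[|t - s|]


def ds_cem_step(mu, sigma, X_hat, L_C, cost_fn, n_elite, alpha):
    """One dsCEM iteration: fixed temporal correlation C = L_C L_C^T,
    adaptive per-step standard deviations sigma, momentum alpha."""
    U = mu + sigma * (X_hat @ L_C.T)             # deterministic candidate sequences
    J = np.array([cost_fn(u) for u in U])        # stands in for rollout costs
    elites = U[np.argsort(J)[:n_elite]]          # K lowest-cost samples
    mu_new = alpha * elites.mean(axis=0) + (1 - alpha) * mu
    sigma_new = alpha * elites.std(axis=0) + (1 - alpha) * sigma
    return mu_new, sigma_new


# Toy usage: quadratic cost on a scalar control sequence of length H.
H = 10
L_C = np.linalg.cholesky(colored_correlation(H, beta=1.0))
X_hat = np.random.default_rng(0).standard_normal((64, H))  # fixed stand-in set
target = 0.5 * np.ones(H)


def cost(u):
    return float(np.sum((u - target) ** 2))


mu, sigma = np.zeros(H), np.ones(H)
for _ in range(10):
    mu, sigma = ds_cem_step(mu, sigma, X_hat, L_C, cost, n_elite=8, alpha=0.9)
```

Because the same `X_hat` is reused in every iteration, the only thing that changes the candidate set between iterations is the evolving $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$.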
Deterministic Sampling MPPI (dsMPPI)
dsMPPI integrates these deterministic samples into the path integral (MPPI) framework using exponential soft weights rather than the CEM hard-threshold elite set:
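- Sample transformation: $\boldsymbol{\epsilon}_i = \mathbf{L}\hat{\mathbf{x}}_i$, $\mathbf{u}_i = \boldsymbol{\mu} + \boldsymbol{\epsilon}_i$
- Trajectory simulation and cost: each rollout is propagated through the dynamics, yielding the trajectory cost $J_i = J(\mathbf{u}_i)$
- Exponential weighting: $w_i = \dfrac{\exp(-J_i/\lambda)}{\sum_{j=1}^{N} \exp(-J_j/\lambda)}$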
- Soft averaging for parameter updates: The mean and diagonal variance are updated by exponentially weighted averages.
- Momentum smoothing: $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$ are updated via $\alpha$-weighted smoothing between the previous and newly computed estimates to prevent premature collapse and improve the exploration/exploitation trade-off.
- Adaptive temperature: $\lambda$ is adapted based on the effective sample size $N_{\text{eff}} = 1/\sum_{i=1}^{N} w_i^2$ so as to keep the importance weights well-behaved (Walker et al., 7 Jan 2026).
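The weighting, effective-sample-size control, and soft update can be sketched as below. Targeting a fixed $N_{\text{eff}}$ by bisection on $\log\lambda$ is an assumed adaptation rule chosen for illustration (it works because $N_{\text{eff}}$ increases monotonically with $\lambda$); the papers' exact rule may differ, and all function names are hypothetical.

```python
import numpy as np


def softmax_weights(J, lam):
    """Exponential weights w_i ∝ exp(-J_i / lam); shifting by min(J) avoids overflow."""
    z = np.exp(-(J - J.min()) / lam)
    return z / z.sum()


def effective_sample_size(w):
    """N_eff = 1 / sum_i w_i^2 for normalized weights (between 1 and N)."""
    return 1.0 / np.sum(w ** 2)


def adapt_temperature(J, target_ess, lo=1e-6, hi=1e6, iters=60):
    """Bisection on log(lam): N_eff increases monotonically with lam."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)                   # geometric midpoint
        if effective_sample_size(softmax_weights(J, mid)) < target_ess:
            lo = mid                             # weights too peaked: raise lam
        else:
            hi = mid                             # weights too flat: lower lam
    return np.sqrt(lo * hi)


def ds_mppi_update(mu, sigma, U, J, target_ess, alpha):
    """Soft mean / diagonal-std update from weighted rollouts, with momentum."""
    lam = adapt_temperature(J, target_ess)
    w = softmax_weights(J, lam)
    mu_w = w @ U                                 # weighted mean per time step
    sigma_w = np.sqrt(w @ (U - mu_w) ** 2)       # weighted std per time step
    mu_new = alpha * mu_w + (1 - alpha) * mu
    sigma_new = alpha * sigma_w + (1 - alpha) * sigma
    return mu_new, sigma_new, lam
```

Unlike the hard elite cut in dsCEM, every rollout contributes here; $\lambda$ only controls how sharply the weights concentrate on the low-cost samples.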
Both dsCEM and dsMPPI support alternative variation schemes, such as multi-iteration subsetting (drawing a subset from a larger pool across iterations) and coordinate permutation to increase effective exploration without reintroducing randomness (Walker et al., 7 Jan 2026).
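Both variation schemes can be expressed as deterministic index manipulations. The two functions below are hypothetical illustrations (a cyclic coordinate shift and a strided subset of a larger precomputed pool), not the specific schemes of the papers. A useful property of any coordinate permutation is that it preserves zero mean and identity covariance, so the permuted set remains valid input to the linear map $\mathbf{u}_i = \boldsymbol{\mu} + \mathbf{L}\hat{\mathbf{x}}_i$.

```python
import numpy as np


def permute_coordinates(X_hat, k):
    """Hypothetical deterministic variation: cyclically shift the coordinates
    of the base set by k positions (e.g., k = iteration index)."""
    perm = np.roll(np.arange(X_hat.shape[1]), k)
    return X_hat[:, perm]


def strided_subset(pool, k, n_iter):
    """Hypothetical multi-iteration subsetting: iteration k uses every
    n_iter-th row of a larger pool, starting at row k mod n_iter."""
    return pool[k % n_iter::n_iter]
```

A strided subset, unlike a permutation, does not preserve the moments exactly, which is one reason to draw it from a larger pool designed for the purpose.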
4. Theoretical Insights and Computational Complexity
Deterministic sampling for MPC confers distinct theoretical benefits:
- Variance reduction: Deterministic LCD samples uniformly tile the underlying distribution, minimizing quadrature error relative to i.i.d. sampling. For a fixed computational budget, deterministic methods achieve lower estimation variance and hence require fewer trajectories for equivalent accuracy (Walker et al., 7 Oct 2025, Walker et al., 7 Jan 2026).
- Convergence and bias-variance tradeoff: Momentum smoothing ($\alpha$) and adaptive temperature tuning ($\lambda$) regulate the update magnitude and ensure stability. dsCEM provides exact moment matching for quadratic costs and linear dynamics (Walker et al., 7 Jan 2026).
- Complexity: Both dsCEM and dsMPPI require $\mathcal{O}(n_{\text{iter}}\, N\, H)$ total runtime per control step, dominated by batch trajectory simulation. Deterministic sample generation and the linear-algebraic updates are negligible compared with forward simulation (Walker et al., 7 Jan 2026, Walker et al., 7 Oct 2025).
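The variance-reduction argument can be illustrated in one dimension, where a simple deterministic set (midpoint quantiles of the standard normal) plays the role that LCD samples play in higher dimensions; this quantile rule is a stand-in for illustration, not the LCD construction. The sketch estimates $\mathbb{E}[\cos x] = e^{-1/2}$ for $x \sim \mathcal{N}(0, 1)$ with 50 deterministic points and compares it against many independent sets of 50 i.i.d. draws.

```python
import numpy as np
from statistics import NormalDist


def midpoint_quantiles(n):
    """Deterministic 1-D sample set for N(0, 1): x_i = Phi^{-1}((i - 1/2) / n)."""
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf((i - 0.5) / n) for i in range(1, n + 1)])


n, trials = 50, 2000
exact = np.exp(-0.5)                     # E[cos(x)] for x ~ N(0, 1)

# Deterministic set: a single, repeatable estimate (no sampling variance).
det_err = abs(np.cos(midpoint_quantiles(n)).mean() - exact)

# i.i.d. sets of the same size: the error varies from draw to draw.
rng = np.random.default_rng(0)
iid_err = np.abs(np.cos(rng.standard_normal((trials, n))).mean(axis=1) - exact)

print(f"deterministic |error| = {det_err:.1e}, mean i.i.d. |error| = {iid_err.mean():.1e}")
```

The deterministic estimate carries only a small quadrature bias, whereas the i.i.d. error shrinks only as $1/\sqrt{N}$ on average, which is why fewer deterministic samples can reach a given accuracy.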
5. Key Hyperparameters and Tuning Practices
Critical hyperparameters and empirically effective ranges for dsCEM and dsMPPI include (Walker et al., 7 Jan 2026):
| Parameter | Typical Range | Significance |
|---|---|---|
| $N$ | 50–300 | Number of deterministic samples per iteration |
| $H$ | 10–50 | Horizon length (problem dependent) |
| $n_{\text{iter}}$ | 2–5 (usually 3) | CEM/MPPI inner iterations |
| $\lambda$ | Cost-dependent | Temperature of the exponential weights (its inverse $1/\lambda$ sets their sharpness); adapt via $N_{\text{eff}}$ |
| $\alpha$ | 0.9–0.99 | Momentum for mean/covariance updates |
| $\beta$ | 0.5–1.5 | Colored-noise exponent for smoothness |
| $B$ | 1–5 | Buffer size (tracking best rollouts) |
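A frequent guideline is to start from moderate values within these ranges (e.g., $n_{\text{iter}} = 3$) and to tune $\lambda$ so that the exponential weights are neither too diffuse nor collapsed, monitoring the effective sample size $N_{\text{eff}}$. For smoothing, the colored-noise exponent $\beta$ is chosen within its range to obtain the desired input smoothness (Walker et al., 7 Jan 2026).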
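The ranges in the table can be encoded as a small configuration object with a range check. The class and field names are hypothetical, and the defaults are illustrative picks inside the listed ranges rather than values reported in the papers; $\lambda$ is omitted because its scale depends on the cost.

```python
from dataclasses import dataclass, fields

# Typical ranges from the hyperparameter table (inclusive bounds).
TYPICAL_RANGES = {
    "n_samples": (50, 300),     # N
    "horizon": (10, 50),        # H
    "n_iter": (2, 5),           # inner CEM/MPPI iterations
    "momentum": (0.9, 0.99),    # alpha
    "beta": (0.5, 1.5),         # colored-noise exponent
    "buffer_size": (1, 5),      # best-rollout buffer B
}


@dataclass
class DsMpcConfig:
    """Hypothetical dsCEM/dsMPPI settings; defaults are illustrative, not from the papers."""
    n_samples: int = 100
    horizon: int = 20
    n_iter: int = 3
    momentum: float = 0.9
    beta: float = 1.0
    buffer_size: int = 3

    def out_of_range(self):
        """Names of fields whose values fall outside the typical ranges."""
        bad = []
        for f in fields(self):
            lo, hi = TYPICAL_RANGES[f.name]
            if not lo <= getattr(self, f.name) <= hi:
                bad.append(f.name)
        return bad
```

Such a check only flags unusual settings; values outside the typical ranges may still be appropriate for a particular system.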
6. Empirical Performance and Comparison
Empirical evaluation on canonical nonlinear control tasks (e.g., cart-pole swing-up, truck backer-upper) demonstrates:
- Smoother trajectories: on the cart-pole, dsMPPI yields ≈30% smoother controls than dsCEM and ≈60% smoother controls than classic MPPI, as quantified by a cumulative input-variation metric (lower is smoother); the metric drops from $1.12\times10^5$ (random MPPI) to $0.42\times10^5$ (dsMPPI) (Walker et al., 7 Jan 2026).
- Sample efficiency: dsCEM achieves comparable or lower cumulative cost than iCEM with smaller sample budgets; e.g., on the mountain car, dsCEM achieves 40% lower cost and 50% smoother inputs at a low sample count (Walker et al., 7 Oct 2025).
- Computational parity: These deterministic-sampling methods do not incur extra asymptotic online costs relative to their random-sampling equivalents (Walker et al., 7 Jan 2026, Walker et al., 7 Oct 2025).
| Method | Cart-pole cost | Cart-pole smoothness (×10⁵) | Truck cost | Truck smoothness (×10⁵) |
|---|---|---|---|---|
| MPPI | 245.3 ± 12.1 | 1.12 ± 0.10 | 162.8 ± 8.4 | 0.48 ± 0.05 |
| Iterative MPPI | 198.7 ± 9.3 | 0.83 ± 0.08 | 140.2 ± 6.7 | 0.39 ± 0.04 |
| dsCEM | 185.5 ± 8.7 | 0.61 ± 0.05 | 131.9 ± 5.4 | 0.31 ± 0.03 |
| dsMPPI | 188.2 ± 9.0 | 0.42 ± 0.03 | 133.4 ± 5.9 | 0.28 ± 0.02 |
Key findings: dsMPPI and dsCEM match or outperform random-sampling variants both in control cost and input smoothness, with the largest gains in demanding low-sample regimes (Walker et al., 7 Jan 2026, Walker et al., 7 Oct 2025).
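The smoothness percentages quoted above can be checked against the table. The short script below (names are illustrative) recomputes the relative reductions of the smoothness metric: the ≈30% and ≈60% figures correspond to the cart-pole column (about 31% versus dsCEM and 62.5% versus MPPI), whereas on the truck task the reductions are about 10% versus dsCEM and 42% versus MPPI.

```python
# Smoothness values (x1e5, lower is smoother) copied from the table.
smoothness = {
    "cart-pole": {"MPPI": 1.12, "Iterative MPPI": 0.83, "dsCEM": 0.61, "dsMPPI": 0.42},
    "truck": {"MPPI": 0.48, "Iterative MPPI": 0.39, "dsCEM": 0.31, "dsMPPI": 0.28},
}


def reduction(task, method, baseline):
    """Relative reduction of the smoothness metric of `method` versus `baseline`."""
    s = smoothness[task]
    return 1.0 - s[method] / s[baseline]


for task in smoothness:
    print(task,
          f"dsMPPI vs dsCEM: {reduction(task, 'dsMPPI', 'dsCEM'):.1%},",
          f"dsMPPI vs MPPI: {reduction(task, 'dsMPPI', 'MPPI'):.1%}")
```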
7. Extensions, Generalizations, and Limitations
The deterministic-sampling principle is generic and can be transferred between MPPI and CEM approaches, as well as other stochastic sampling-based optimizers. LCD design can also incorporate task-specific priors (e.g. anisotropic variances, stronger colored-noise correlations) (Walker et al., 7 Oct 2025, Walker et al., 7 Jan 2026).
Known limitations and related directions include:
- Scalability of LCD construction: The offline computation of the optimal Dirac mixture becomes challenging for high-dimensional spaces, and transforming LCD samples using non-isotropic covariances may degrade optimality.
- Extension to other sampling schemes: In the context of direct policy optimization, deterministic sigma-point collocation achieves exact recovery of the LQR solution for linear–quadratic–Gaussian systems and reduces variance for mildly nonlinear cases (Howell et al., 2020).
A plausible implication is that direct deterministic-sampling frameworks may further benefit from adaptive Dirac-point generation and online adaptation when operating in high-dimensional or time-varying uncertainty regimes.
References:
- (Walker et al., 7 Jan 2026) Smooth Sampling-Based Model Predictive Control Using Deterministic Samples
- (Walker et al., 7 Oct 2025) Sample-Efficient and Smooth Cross-Entropy Method Model Predictive Control Using Deterministic Samples
- (Howell et al., 2020) Direct Policy Optimization using Deterministic Sampling and Collocation