Deterministic Sampling CEM (dsCEM)
- The paper introduces dsCEM, which deterministically samples control trajectories using localized cumulative distributions to enhance performance in nonlinear MPC tasks.
- It leverages affine transformations of precomputed Dirac mixtures to replace Gaussian random sampling, ensuring smoother control inputs and efficient computation.
- Experimental results on benchmark tasks like Mountain Car and Cart-Pole show up to 30% cost reduction and 40% smoother controls in low-sample regimes.
Deterministic Sampling CEM (dsCEM) is a model predictive control (MPC) framework for nonlinear optimal control tasks that replaces the conventional random sampling step of the Cross-Entropy Method (CEM) with a deterministic procedure. This deterministic approach uses sample sets derived from localized cumulative distributions (LCDs), providing improved sample efficiency and smoother control input trajectories, particularly in regimes with a low number of samples. The dsCEM methodology preserves the core structure and tuning of standard CEM-MPC, enabling straightforward integration as a drop-in replacement for random sampling-based controllers (Walker et al., 7 Oct 2025).
1. Background: Standard CEM–MPC Formulation
In CEM–MPC, a finite-horizon control sequence $\vu_{k:k+N-1}$ is optimized to minimize a cost function for a nonlinear discrete-time system
$\vx_{n+1} = \va_n(\vx_n, \vu_n)\,,$
with an associated cumulative cost
$J_k(\vu_{k:k+N-1}) = g_N(\vx_{k+N}) + \sum_{n=k}^{k+N-1} g_n(\vx_n, \vu_n)\,.$
CEM places a Gaussian proposal distribution
$f(\vu;\vtheta_j) = \mathcal{N}(\vu ; \vmu_j, \mC_j), \quad \vtheta_j = (\vmu_j, \mC_j)$
over the flattened trajectory vector $\vu = [\vu_k^\top, \ldots, \vu_{k+N-1}^\top]^\top$. At each iteration, samples are drawn, their costs are evaluated, the elite set (the lowest-cost samples) is selected, and the mean and covariance are updated: $\vmu_{j+1} = \frac{1}{|\mathcal{E}_j|} \sum_{\vu \in \mathcal{E}_j} \vu\,, \quad \mC_{j+1} = \frac{1}{|\mathcal{E}_j|} \sum_{\vu \in \mathcal{E}_j} (\vu - \vmu_{j+1})(\vu - \vmu_{j+1})^\top\,.$ This process iterates, returning either the first control from $\vmu_J$ or the best sampled sequence.
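The update above can be sketched in a few lines of NumPy. This is a minimal, generic CEM iteration for illustration (function and variable names such as `cem_iteration` are this sketch's own, not the paper's); a small diagonal jitter is added to keep the refit covariance well-conditioned.

```python
import numpy as np

def cem_iteration(cost_fn, mu, C, n_samples=64, n_elite=8, rng=None):
    """One CEM iteration: sample, rank by cost, refit Gaussian to elites."""
    rng = rng or np.random.default_rng(0)
    samples = rng.multivariate_normal(mu, C, size=n_samples)
    costs = np.array([cost_fn(u) for u in samples])
    elite = samples[np.argsort(costs)[:n_elite]]       # elite set E_j
    mu_next = elite.mean(axis=0)                       # mean update
    centered = elite - mu_next
    # covariance update (+ tiny jitter for numerical stability)
    C_next = centered.T @ centered / len(elite) + 1e-8 * np.eye(len(mu))
    return mu_next, C_next

# Usage: minimize a quadratic cost over a 4-step scalar control trajectory.
target = np.array([1.0, -0.5, 0.25, 0.0])
cost = lambda u: float(np.sum((u - target) ** 2))
mu, C = np.zeros(4), np.eye(4)
rng = np.random.default_rng(7)
for _ in range(20):
    mu, C = cem_iteration(cost, mu, C, rng=rng)
```

After a handful of iterations the proposal mean concentrates near the cost minimizer, which is exactly the behavior dsCEM preserves while replacing the sampling step.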
2. Deterministic Sampling via Localized Cumulative Distributions (LCDs)
The core innovation of dsCEM is the replacement of random Gaussian samples by a deterministic Dirac mixture, optimized according to LCD-based discrepancy criteria. For a $d$-dimensional PDF $f(\vx)$, one defines the LCD at location $\vm$ and kernel width $b$ as
$F(\vm, b) = \int_{\IR^d} f(\vx) K(\vx, \vm, b)\, d\vx, \quad K(\vx, \vm, b) = \exp\left(-\frac{\|\vx - \vm\|^2}{2b^2}\right)\,.$
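As a quick numerical check of this definition, the LCD of the 1-D standard Gaussian has the closed form $F(m,b) = \frac{b}{\sqrt{1+b^2}}\exp(-\frac{m^2}{2(1+b^2)})$ (a Gaussian integral), while the LCD of a Dirac mixture is simply the average kernel response of its components. The sketch below compares the two using equally spaced Gaussian quantiles as a placeholder skeleton (the paper uses LCD-optimal Dirac mixtures; these quantile points merely stand in for them):

```python
import numpy as np
from statistics import NormalDist

def lcd_gaussian(m, b):
    # Closed-form LCD of the 1-D standard Gaussian under the kernel
    # K(x, m, b) = exp(-(x - m)^2 / (2 b^2)).
    return b / np.sqrt(1 + b**2) * np.exp(-m**2 / (2 * (1 + b**2)))

def lcd_dirac(points, m, b):
    # LCD of a Dirac mixture: average kernel response of its components.
    return np.mean(np.exp(-(points - m) ** 2 / (2 * b**2)))

# Placeholder deterministic skeleton: midpoint quantiles of N(0, 1).
N = 15
skeleton = np.array([NormalDist().inv_cdf((i + 0.5) / N) for i in range(N)])

# Worst-case LCD mismatch over a grid of locations at scale b = 1.
err = max(abs(lcd_dirac(skeleton, m, 1.0) - lcd_gaussian(m, 1.0))
          for m in np.linspace(-3.0, 3.0, 61))
```

Even this crude deterministic skeleton tracks the Gaussian's LCD closely; LCD-optimal mixtures minimize exactly this kind of mismatch, integrated over $\vm$ and $b$.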
The Cramér–von Mises–type distance,
$D_{\mathrm{CvM}} = \int_{b > 0} w(b)\int_{\vm} (\tilde F(\vm, b) - F(\vm, b))^2\, d\vm\, db,$
with a weighting function $w(b)$, quantifies the discrepancy of a Dirac mixture $\tilde f(\vx) = \frac{1}{N}\sum_{i}\delta(\vx - \vx^{(i)})$ from $f(\vx)$. Minimizing this discrepancy yields an optimal low-discrepancy sample skeleton $\{\tilde\vu^{(i)}\}_{i=1}^N$ for the isotropic standard Gaussian, computed offline. At each CEM iteration (online), these samples are affinely transformed according to the current proposal parameters: $\vu^{(i)} = \vmu_j + \mL_j\tilde\vu^{(i)}, \quad \text{where } \mC_j=\mL_j\mL_j^\top,$ ensuring the sample set deterministically matches the intended Gaussian.
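The affine transformation step is straightforward to implement with a Cholesky factor. The sketch below uses a hypothetical 2-D skeleton built from Gaussian quantiles purely for illustration (an actual dsCEM implementation would load a precomputed LCD-optimal skeleton):

```python
import numpy as np
from statistics import NormalDist

def transform_skeleton(skeleton, mu, C):
    """Map a standard-normal Dirac-mixture skeleton onto N(mu, C)
    via u_i = mu + L @ s_i, where C = L L^T (Cholesky factor)."""
    L = np.linalg.cholesky(C)
    return mu + skeleton @ L.T

# Hypothetical skeleton: a 5x5 grid of standard-normal quantiles,
# zero-mean by symmetry (stand-in for an LCD-optimal Dirac mixture).
q = np.array([NormalDist().inv_cdf((i + 0.5) / 5) for i in range(5)])
skeleton = np.array([[a, b] for a in q for b in q])   # 25 points in R^2

mu = np.array([1.0, -2.0])
C = np.array([[2.0, 0.6],
              [0.6, 1.0]])
samples = transform_skeleton(skeleton, mu, C)
```

Because the skeleton is fixed, repeated calls with the same $(\vmu_j, \mC_j)$ reproduce exactly the same sample set, which is the source of dsCEM's determinism.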
3. Modular Variants: Sample Variability and Temporal Correlation
To avoid repeatability and enhance exploration while preserving determinism, dsCEM introduces three sample variability schemes and two temporal-correlation methods:
- Sample-set variability:
- V1 (Random Rotation): Applies a random rotation $\mR_j\in SO(d_uN)$ before affine transformation.
- V2 (Deterministic Joint-Density): Precomputes a larger deterministic sample block in $\IR^{d_u N J}$ and slices a fresh sub-block per CEM iteration.
- V3 (Time-step Rotation): Applies a fixed random rotation per MPC step, subsequently using V2’s slicing.
- Temporal correlations:
- M1 (Fixed Correlation + Adaptive Variance): $\mC_j = \mathrm{diag}(\vsigma_j)\,\mC_\rho\,\mathrm{diag}(\vsigma_j)$. Here, $\mC_\rho$ is a fixed Toeplitz matrix structured according to a chosen power spectral density (PSD), with variances $\vsigma_j$ updated adaptively based on elite sets.
- M2 (Adaptive Full Covariance): Initializes from a colored-noise PSD and updates the full covariance following the elite-sample statistics.
The overall dsCEM-MPC algorithm incorporates these choices modularly, allowing flexible adaptation to problem structure and smoothness requirements.
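The M1 covariance structure above can be sketched directly. Note the illustrative exponential correlation $\rho^{|i-j|}$ below is this sketch's own simplification; the paper instead derives the fixed Toeplitz matrix $\mC_\rho$ from a chosen power spectral density.

```python
import numpy as np

def m1_covariance(sigma, rho=0.9):
    """M1-style covariance: fixed Toeplitz correlation C_rho combined
    with adaptive per-step standard deviations sigma, as
        C = diag(sigma) @ C_rho @ diag(sigma).
    C_rho here uses an exponential correlation rho^|i-j| for
    illustration; a PSD-derived Toeplitz matrix plays this role
    in the paper."""
    n = len(sigma)
    idx = np.arange(n)
    C_rho = rho ** np.abs(idx[:, None] - idx[None, :])  # Toeplitz correlation
    D = np.diag(sigma)
    return D @ C_rho @ D

sigma = np.array([1.0, 0.8, 0.6, 0.5, 0.4])  # e.g., adapted from elite sets
C = m1_covariance(sigma)
```

Keeping $\mC_\rho$ fixed while adapting only $\vsigma_j$ preserves temporal smoothness of the sampled trajectories even when the elite set is small.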
4. Integration into Existing CEM Controllers
dsCEM is designed for compatibility with existing CEM-MPC implementations. Integration consists of substituting the standard random sampling with the deterministic LCD-based sample generation:
- Replace random sampling from $\mathcal{N}(\vmu_j, \mC_j)$ with transforming a precomputed skeleton:
- Optional sample rotation via $\mR$
- Affine transformation as above
Additional controller parameters include:
- Selection of variability scheme (V1, V2, V3)
- Selection of temporal-correlation method (M1, M2)
- Precomputed pool of skeleton samples of desired size
The remainder of the CEM update, elite selection, iteration cycling, and hyperparameter tuning are unaltered.
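In code, the integration amounts to swapping one function call. The sketch below shows a hypothetical `ds_sample` replacing a `multivariate_normal` draw; the randomly generated skeleton is only a stand-in for a precomputed LCD skeleton, and `R` is the optional rotation of scheme V1.

```python
import numpy as np

def ds_sample(mu, C, skeleton, R=None):
    """Drop-in replacement for `rng.multivariate_normal(mu, C, size=N)`:
    deterministically maps a precomputed standard-normal skeleton onto
    the current proposal N(mu, C), with an optional rotation R (V1)."""
    pts = skeleton if R is None else skeleton @ R.T
    L = np.linalg.cholesky(C)
    return mu + pts @ L.T

# Hypothetical skeleton and proposal (illustration only; a real
# deployment would load an offline-computed LCD-optimal skeleton).
rng = np.random.default_rng(0)
skeleton = rng.standard_normal((32, 3))
mu, C = np.zeros(3), np.eye(3)
a = ds_sample(mu, C, skeleton)
b = ds_sample(mu, C, skeleton)
```

Unlike the random draw it replaces, two calls with identical proposal parameters return identical sample sets, so the rest of the CEM loop (elite selection, refitting, iteration) runs unchanged.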
5. Experimental Evaluation and Performance Metrics
dsCEM was evaluated on two canonical nonlinear MPC benchmarks:
- Mountain Car: RK4-discretized dynamics over a finite horizon, quadratic state/control cost, mild process noise.
- Cart-Pole Swing-Up: RK4-discretized dynamics over a finite horizon, standard nonlinear cart-pole dynamics augmented with trigonometric state components, and low process noise.
Both experiments used 3 CEM iterations, momentum-based mean updates, and 100 independent trials across a range of sample sizes. Two performance metrics were assessed:
- Cumulative cost: $\sum_{k=0}^{T-1} g_k(\vx_k, u_k)$
- Smoothness score of the control input trajectory
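Both metrics are easy to compute from a closed-loop rollout. The smoothness proxy below (summed squared successive control differences) is a common choice but an assumption of this sketch; the paper's exact score is not reproduced here.

```python
import numpy as np

def cumulative_cost(xs, us, g):
    """Closed-loop cumulative cost sum_k g(x_k, u_k)."""
    return sum(g(x, u) for x, u in zip(xs, us))

def smoothness_score(us):
    """Illustrative smoothness proxy: summed squared successive
    control differences (lower is smoother)."""
    us = np.asarray(us, dtype=float)
    return float(np.sum(np.diff(us, axis=0) ** 2))

# A gently ramping input scores lower than a chattering one.
us_smooth = [0.0, 0.1, 0.2, 0.3]
us_jumpy = [0.0, 0.3, 0.0, 0.3]
```

Scores of this kind make the "smoother controls" claim quantitative: deterministic low-discrepancy sampling avoids the sample-to-sample jitter that random draws inject into the optimized input sequence.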
6. Results, Guidelines, and Limitations
Quantitative Findings:
In the low-sample regime, dsCEM-V2 achieved up to 30% lower cumulative cost and up to 40% lower (better) smoothness score compared to iCEM. All dsCEM variants required fewer iterations and samples to reach a given cost level; for example, dsCEM-V2 at small sample budgets matched or exceeded the smoothness of iCEM run with substantially more samples. The M2 method (full covariance) was less robust at very small sample sizes, due to its elevated elite-set requirements. No formal hypothesis testing was reported, but interquartile ranges over 100 runs indicated non-overlapping medians in low-sample settings (Walker et al., 7 Oct 2025).
Hyperparameter Guidelines:
- Sample size $N$: dsCEM provides the greatest benefit at small $N$; at larger sample counts, the differences versus random sampling diminish.
- Variability scheme: V2 is recommended for deterministic, high-smoothness control; V1 or V3 for some randomness.
- LCD pool size: Set equal to the largest $N$ anticipated during control; computed offline using established LCD solvers.
- PSD exponent: Tune for smoothness; e.g., a $1/f$ spectrum corresponds to "pink noise."
- Warm-start and momentum: Retain the settings of the underlying CEM controller.
Limitations and Extensions:
- M2 covariance adaptation is less effective in sample-scarce settings due to its large elite-set requirements.
- LCD-based affine transforms do not preserve LCD-optimality for non-isotropic proposals.
- Offline sample computation is computationally intensive for large sample sets and high-dimensional trajectory spaces.
- Potential future work includes integration with learned warm-starts (e.g., normalizing flows), adaptive online LCD pools for non-Gaussian proposals, theoretical analysis of affine-transformed sample discrepancy, and explicit extension to stochastic or chance-constrained MPC.
7. Significance and Prospective Directions
dsCEM introduces determinism and improved structure to the CEM-MPC sampling process, leading to enhanced sample efficiency and control smoothness, features especially pronounced in resource-limited regimes. The approach maintains compatibility with standard CEM controller architectures, requiring minimal interface adjustment. Extensions suggested include hybridization with learning-based warm-starts, further adaptation to structured noise models, and rigorous convergence theory. These directions may strengthen practical applicability in high-dimensional control, safety-critical planning, and non-Gaussian or uncertainty-aware MPC contexts (Walker et al., 7 Oct 2025).