
Sampling-Based Optimization for Multi-Agent MPC

Updated 10 November 2025
  • Sampling-based optimization is a method that uses randomized techniques like MPPI and CEM to tackle high-dimensional, nonconvex multi-agent MPC problems.
  • It enables decentralized control by integrating variational optimization with distributed frameworks such as consensus-ADMM for collision avoidance and scalability.
  • The approach achieves robust, real-time performance under uncertain, nonlinear dynamics while ensuring probabilistic safety through SOCP filters and chance constraints.

Sampling-based optimization for multi-agent model predictive control (MPC) leverages randomized sampling methods—such as Model Predictive Path Integral (MPPI) control, the Cross-Entropy Method (CEM), and Stochastic Search (SS)—to solve high-dimensional, nonconvex optimal control problems arising in distributed multi-agent systems. These techniques build on path-integral and variational optimization perspectives, using weighted samples to update control distributions, and are particularly suited to complex nonlinear dynamics, stochastic disturbances, and decentralized computation requirements. When integrated with distributed frameworks such as the consensus Alternating Direction Method of Multipliers (ADMM), these optimizers enable tractable, scalable, and robust collision-free navigation for large-scale multi-agent robotic systems, with theoretical and practical advantages over deterministic, gradient-based formulations, which can fail to return feasible solutions in nonconvex instances.

1. Mathematical Underpinnings of Sampling-Based Multi-Agent MPC

Sampling-based optimization for multi-agent MPC focuses on receding-horizon optimal control across multiple agents, each with nonlinear, potentially stochastic dynamics:

$$x_{i,t+1} = f_i(x_{i,t}, u_{i,t}), \quad i = 1, \dots, N$$

with control sequence $\{u_{i,t}\}_{t=0}^{T-1}$, horizon $T$, and costs (per-agent running cost $L_i$ and terminal cost $\Phi_i$):

$$J(X, U) = \sum_{t=0}^{T-1} \sum_{i=1}^N L_i(x_{i,t}, u_{i,t}) + \sum_{i=1}^N \Phi_i(x_{i,T})$$

Subject to state/control constraints and inter-agent collision avoidance, the centralized problem becomes intractable as $N$ rises, due to the combinatorial coupling via collision constraints $g_{ij}(x_i, x_j) \ge 0$ (Wang et al., 2022; Yoon et al., 21 Oct 2025).
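As a concrete illustration of this formulation, the sketch below rolls out a joint trajectory and accumulates the cost $J(X, U)$. The single-integrator dynamics `f`, the quadratic costs `L` and `Phi`, and all numerical values are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def rollout_cost(x0, U, f, running_cost, terminal_cost):
    """Simulate one joint trajectory and accumulate J(X, U).

    x0: (N, dx) initial states for N agents
    U:  (T, N, du) control sequence over horizon T
    f, running_cost, terminal_cost: per-agent dynamics and cost terms
    """
    x = x0.copy()
    J = 0.0
    for u_t in U:  # loop over the horizon t = 0, ..., T-1
        J += sum(running_cost(x[i], u_t[i]) for i in range(len(x)))
        x = np.array([f(x[i], u_t[i]) for i in range(len(x))])
    J += sum(terminal_cost(xi) for xi in x)
    return J

# Toy instantiation: single-integrator agents penalized for distance to origin.
f = lambda x, u: x + 0.1 * u
L = lambda x, u: float(x @ x + 0.01 * u @ u)
Phi = lambda x: float(10.0 * x @ x)

x0 = np.array([[1.0, 0.0], [0.0, 1.0]])  # N = 2 agents in 2D
U = np.zeros((15, 2, 2))                  # T = 15, zero controls
print(rollout_cost(x0, U, f, L, Phi))     # 30 running + 20 terminal
```

Any sampling-based optimizer below only needs this kind of black-box cost evaluation; no gradients of `f` or the costs are required.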

Sampling-based methods reframe optimal control as either a variational optimization or a path integral over control distributions:

$$\mathcal{F} = -\lambda \log \mathbb{E}_{p(\tau)}\left[e^{-\frac{1}{\lambda} J(\tau)}\right]$$

and seek optimal policies $q^*(\tau)$, typically parameterized as time-separable Gaussians, through weighted sample approximation.

2. Core Sampling-Based Optimization Algorithms

Variational Optimization / MPPI

Model Predictive Path Integral control is an instance of stochastic search where policies $q(u_t; \mu_t, \Sigma_t)$ are updated through weighted averaging of rollouts. For $M$ samples:

$$u_t^{(m)} \sim \mathcal{N}(\mu_t, \Sigma_t), \quad w^{(m)} \propto \exp\left(-\frac{J^{(m)}}{\lambda}\right)$$

the update is

$$\mu_t^{k+1} = \sum_{m=1}^M w^{(m)} u_t^{(m)}, \quad \Sigma_t^{k+1} = \sum_{m=1}^M w^{(m)} \left(u_t^{(m)} - \mu_t^{k+1}\right)\left(u_t^{(m)} - \mu_t^{k+1}\right)^\top$$

This approach connects directly to path-integral stochastic optimal control via the Feynman–Kac lemma and admits a convergence and sample complexity analysis via stochastic approximation theory (Wang et al., 2022).
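The weighted-average update can be sketched as a minimal MPPI-style iteration. The separable quadratic cost, horizon, and hyperparameters are illustrative assumptions; this is a schematic of the update rule, not the full controllers of the cited papers.

```python
import numpy as np

def mppi_update(mu, sigma, cost_fn, rng, M=512, lam=5.0):
    """One MPPI iteration: sample control sequences around the mean,
    weight rollouts by exp(-J/lambda), and re-fit the Gaussian policy."""
    T, du = mu.shape
    eps = rng.multivariate_normal(np.zeros(du), sigma, size=(M, T))
    samples = mu[None] + eps                     # (M, T, du) candidate controls
    J = np.array([cost_fn(u) for u in samples])  # rollout costs J^(m)
    w = np.exp(-(J - J.min()) / lam)             # shift by min(J) for stability
    w /= w.sum()
    mu_new = np.einsum('m,mtd->td', w, samples)  # weighted mean per time step
    d = samples - mu_new[None]
    sigma_new = np.einsum('m,mtd,mte->de', w, d, d) / T  # pooled covariance
    return mu_new, sigma_new

# Separable quadratic cost pulls every control toward u* = 1.
cost = lambda u: float(((u - 1.0) ** 2).sum())
rng = np.random.default_rng(0)
mu = np.zeros((5, 1))
for _ in range(20):
    mu, _ = mppi_update(mu, np.eye(1), cost, rng)  # covariance held fixed here
print(bool(np.abs(mu - 1.0).max() < 0.5))
```

In this toy run the mean control sequence converges toward the minimizer; the demo keeps the sampling covariance fixed, a common practical choice, though the update above also yields the covariance refit.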

By modifying the shape function $S(x)$ in the sample-weighted update (e.g., $S(x) = \mathbb{1}_{x > -\gamma}$ for CEM, or Tsallis deformations), different exploration/exploitation biases are induced. The general update in the exponential family of policies is:

$$\eta_t^{k+1} = \eta_t^k + \alpha^k \frac{\mathbb{E}\left[S(-J)\left(T(u_t) - \mathbb{E}[T(u_t)]\right)\right]}{\mathbb{E}[S(-J)]}$$

where $T(u_t)$ are sufficient statistics (Wang et al., 2022). Empirical evidence highlights a trade-off: strong risk aversion (large-$r$ Tsallis, CEM) curtails variance but may become conservative, while MPPI weights explore better at the potential cost of outlier sensitivity.
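The effect of the shape function on sample weights can be seen in a small sketch comparing the MPPI exponential transform with a CEM-style elite indicator; the cost values and elite fraction are illustrative assumptions.

```python
import numpy as np

def weights(J, shape="mppi", lam=1.0, elite_frac=0.1):
    """Normalized sample weights induced by two common shape functions S(.).

    'mppi': exponential transform, S(-J) = exp(-J/lambda) -- smooth weights.
    'cem':  indicator keeping only the elite (lowest-cost) fraction.
    """
    J = np.asarray(J, dtype=float)
    if shape == "mppi":
        w = np.exp(-(J - J.min()) / lam)       # shift by min(J) for stability
    elif shape == "cem":
        gamma = np.quantile(J, elite_frac)     # elite cost threshold
        w = (J <= gamma).astype(float)         # indicator 1{J <= gamma}
    else:
        raise ValueError(shape)
    return w / w.sum()

J = np.array([1.0, 2.0, 3.0, 10.0])
w_mppi = weights(J, "mppi")                    # smooth, favors low cost
w_cem = weights(J, "cem", elite_frac=0.5)      # hard cutoff: keeps 2 elites
print(w_mppi, w_cem)
```

MPPI assigns every sample a nonzero (exponentially decaying) weight, while the CEM indicator zeroes out all non-elite samples, matching the risk-aversion trade-off described above.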

3. Distributed and Decentralized Architectures

Centralized MPC formulations with coupling (e.g., global collision avoidance) are computationally infeasible for large $N$. Distributed consensus-ADMM frameworks address this by introducing local augmented copies of neighboring agents' variables, splitting the problem:

$$\tilde{x}^i_t = \left[x^i_t, \{x^{ij}_t\}_{j \in \mathcal{N}^i_t}\right], \quad \tilde{u}^i_t = \left[u^i_t, \{u^{ij}_t\}_{j \in \mathcal{N}^i_t}\right]$$

with local objectives and consensus variables for state/control agreement. The consensus update steps are executed per time step:

  • Primal (local): each agent runs sampling-based optimization on its local augmented cost.
  • Consensus (global): averaging states/controls across neighbors.
  • Dual update: Lagrange multipliers updated for consensus penalty terms (Yoon et al., 21 Oct 2025, Wang et al., 2022).

This architecture ensures that each agent's computation depends only on local neighborhood size, achieving per-agent complexity independent of overall network order (barring communication overhead), and enables parallelization of the sampling-based optimization and rollouts.
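The three steps can be sketched on a toy scalar consensus problem, where each agent tracks a local target while all agents agree on a common value. The targets, the penalty $\rho$, and the MPPI-style local solver are illustrative assumptions; this is a schematic of the update pattern, not the algorithms of the cited papers.

```python
import numpy as np

def local_sampling_step(cost, x0, rng, M=256, sigma=0.5, lam=0.1):
    """Primal step: minimize a local augmented cost by MPPI-style weighting."""
    cand = x0 + sigma * rng.standard_normal(M)
    J = cost(cand)
    w = np.exp(-(J - J.min()) / lam)
    return float(w @ cand / w.sum())

def consensus_admm(targets, iters=50, rho=1.0, seed=0):
    """Toy consensus-ADMM: scalar agents track local targets c_i while
    agreeing on a shared consensus value z."""
    rng = np.random.default_rng(seed)
    c = np.asarray(targets, dtype=float)
    x = np.zeros_like(c); y = np.zeros_like(c); z = 0.0
    for _ in range(iters):
        # 1. primal (local, parallelizable): sampling-based minimization
        #    of each agent's augmented Lagrangian term
        x = np.array([local_sampling_step(
                lambda u, i=i: (u - c[i])**2 + y[i]*(u - z) + rho/2*(u - z)**2,
                x[i], rng) for i in range(len(c))])
        # 2. consensus (global): average the agents' views
        z = float(np.mean(x + y / rho))
        # 3. dual: update multipliers on the consensus penalty
        y += rho * (x - z)
    return z

# Agents with targets 0..3 should agree near the average, 1.5.
print(round(consensus_admm([0.0, 1.0, 2.0, 3.0]), 2))
```

In a multi-agent MPC setting, step 1 would run a full sampling-based rollout optimization (e.g., MPPI) over each agent's augmented trajectory variables, and step 2 would average only over communicating neighbors.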

4. Probabilistic Safety and Collision Avoidance

Sampling-based MPC approaches must ensure that state/control trajectories comply with safety requirements, notably probabilistic collision constraints:

$$\Pr\left(\|p_i - p_j\| \leq 2r\right) \leq \delta$$

Multiple approaches for enforcing such constraints include:

  • Sample-based chance approximations: Monte Carlo estimation of pairwise collision rates via sampled trajectories; highly accurate as the number of samples grows, but intractable for real-time, many-agent settings due to $O(N^2 M^2 T)$ binary indicator variables (Lyons et al., 2011).
  • Probabilistic bound (RIPP) method: each agent's mean and covariance define a region-of-increased-probability-of-presence (RIPP) set. Disjoint bounding boxes with allocated risk budgets guarantee joint chance-constraint satisfaction. This reduces the number of binaries to $O(M^2 T)$, offering strong runtime and scalability gains at the expense of mild conservativeness (≤5% cost penalty) (Lyons et al., 2011).
  • Second-Order Cone Programming (SOCP) safety filters in MPPI: in decentralized uncertainty-aware MPPI, ORCA-derived velocity constraints are imposed as probabilistic chance constraints in the control space, realized via SOCP. Formally, for control mean $\mu'$ and covariance $\Sigma'$,

$$a_j'^\top \mu' + \Phi^{-1}(\delta_\nu) \sqrt{a_j'^\top \Sigma' a_j'} \leq b_j'$$

plus componentwise control bounds, yielding a tractable convex program solved at each step (Dergachev et al., 27 Jul 2025).
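The deterministic reformulation behind this inequality — a linear-Gaussian chance constraint tightened by a standard-normal quantile — can be checked directly; the constraint data below are hypothetical, and a full SOCP filter would optimize over controls subject to many such constraints rather than just test one.

```python
import numpy as np
from statistics import NormalDist

def chance_constraint_ok(a, b, mu, Sigma, delta=0.95):
    """Check Pr(a^T u <= b) >= delta for u ~ N(mu, Sigma) via the
    deterministic second-order-cone form:
        a^T mu + Phi^{-1}(delta) * sqrt(a^T Sigma a) <= b
    """
    z = NormalDist().inv_cdf(delta)          # standard-normal quantile
    margin = a @ mu + z * np.sqrt(a @ Sigma @ a)
    return bool(margin <= b)

# Velocity-space example: half-plane constraint a^T v <= b under uncertainty.
a = np.array([1.0, 0.0])
Sigma = 0.01 * np.eye(2)                     # control/velocity covariance
print(chance_constraint_ok(a, 1.0, np.array([0.50, 0.0]), Sigma),
      chance_constraint_ok(a, 1.0, np.array([0.99, 0.0]), Sigma))
```

The second mean violates the constraint once the uncertainty buffer $\Phi^{-1}(\delta)\sqrt{a^\top \Sigma a}$ is added, showing how the filter tightens nominal half-plane constraints in proportion to control variance.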

5. Sample Complexity, Convergence, and Scalability

Analytical guarantees arise from the unifying stochastic search view:

  • Convergence: under standard stochastic approximation conditions (diminishing step sizes, increasing sample counts), parameters converge almost surely to stationary points of the relaxed (entropic) objective (Wang et al., 2022).
  • Sample Complexity: the error of the empirical estimator is $O(1/\sqrt{M})$; explicit deviation bounds are available via concentration inequalities.
  • Scalability: distributed sampling-based frameworks demonstrate flat per-agent computational cost as $N$ rises, provided only local neighbor communication is required (Wang et al., 2022; Yoon et al., 21 Oct 2025). For example, in a 196-vehicle formation, centralized sample-based MPC is prohibitive, while distributed sampling-based ADMM achieves constant per-agent cost.
  • Implementation Practicalities: the number of rollouts $K$ is typically 300–1024 for real-time (≥10 Hz) operation, the horizon is $T = 15$–$30$, the sampling noise $\Sigma$ is set according to actuation scale, and rollout evaluation is parallelized (Dergachev et al., 27 Jul 2025; Yoon et al., 21 Oct 2025; Wang et al., 2022).
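The $O(1/\sqrt{M})$ rate can be verified empirically with a toy Monte Carlo mean estimator; the distribution and sample sizes are illustrative assumptions.

```python
import numpy as np

def estimator_error(M, trials=200, seed=0):
    """Empirical RMS error of a Monte Carlo mean estimate using M samples."""
    rng = np.random.default_rng(seed)
    est = rng.standard_normal((trials, M)).mean(axis=1)  # true mean is 0
    return float(np.sqrt((est ** 2).mean()))

# Quadrupling the sample count should roughly halve the error: O(1/sqrt(M)).
e1, e2 = estimator_error(100), estimator_error(400)
print(bool(1.5 < e1 / e2 < 2.7))
```

The same scaling governs the weighted-sample estimates inside MPPI/CEM updates, which is why rollout counts in the hundreds suffice for real-time control.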
| Approach | Complexity (per step) | Typical use case |
|---|---|---|
| Centralized MPC | Exponential in #agents & constraints | ≤10 agents, moderate density |
| Distributed MPPI-ADMM | Linear in local neighbors, #rollouts, $T$ | >20 agents, dense/sparse, local communication |
| SOCP-filtered MPPI | $O(\#\text{neighbors}^3)$ per agent | Decentralized, uncertainty-aware collision avoidance |

6. Benchmarking and Empirical Performance

Empirical evaluations confirm:

  • Success Rates and Safety: decentralized uncertainty-aware MPPI-SOCP attains 100% success and 0% collisions for up to 15 robots in densely populated settings, outperforming ORCA-DD and B-UAVC. In random 5–25 robot scenarios, distributed sample-based methods maintain 100% completion, while deterministic or collision-blind methods yield lower success or higher collision rates (Dergachev et al., 27 Jul 2025).
  • Runtime and Scalability: in distributed stochastic search (SS+ADMM), per-agent computation time scales linearly with agent count; a 64-agent Dubins-car formation is solved in 3–5 seconds per MPC step, whereas derivative-based IPOPT fails to find feasible solutions in complex, nonconvex instances (Yoon et al., 21 Oct 2025).
  • Real-World Validation: in ROS2/Gazebo with 10 TurtleBot3 robots, the decentralized MPPI-SOCP controller achieves >10 Hz real-time control with zero collisions, unlike vendor-supplied DWA or non-sampling controllers (Dergachev et al., 27 Jul 2025).
  • Trade-offs: conservative risk transforms (e.g., CEM, RIPP) incur small suboptimality (≤5% cost penalty) relative to "ideal" sample-based solvers but deliver formally guaranteed risk bounds and orders-of-magnitude computational savings (Lyons et al., 2011).

7. Theoretical and Practical Implications

Sampling-based optimization has established itself as a robust and scalable paradigm for multi-agent MPC under uncertainty and nonconvexity. Embedding MPPI/SS/CEM within consensus-based distributed frameworks overcomes the curse of global dimensionality by localizing computation and communication. Moreover, integrating probabilistic safety certificates through convex SOCP or region-based chance constraints permits real-time deployment with explicit safety, even with noisy sensors and actuators.

A plausible implication is that the distributed sampling-based optimization framework will remain the most tractable and robust class of solution methods for high-dimensional, decentralized multi-agent MPC, particularly in settings with complex nonlinear dynamics, nonconvex constraints, and uncertainty in both perception and actuation.

Sampling-based optimizers’ ability to handle nonsmooth costs, maintain formal probabilistic safety, and scale to hundreds of agents highlights their practical distinctiveness for multi-agent robotic and autonomous vehicle systems. The primary trade-off is between exploration “aggressiveness” (MPPI, Tsallis) and robustness/conservatism (CEM, RIPP, SOCP), underscoring the necessity of carefully tuned safety and performance parameters tailored to specific deployment scenarios.
