Decentralized Cooperative Planning
- Decentralized cooperative planning is a framework where autonomous agents optimize joint objectives using local communication and distributed decision-making.
- The approach employs methods such as ADMM, iLQR, and MCTS to handle nonlinear dynamics and enforce global constraints such as collision avoidance in real time.
- Key applications include multi-robot systems and connected vehicles, demonstrating scalable, robust performance through consensus protocols and reinforcement learning.
Decentralized cooperative planning refers to the class of methods and frameworks in which a team of autonomous agents (e.g., robots, vehicles, software systems) coordinate and jointly optimize their individual plans to achieve shared performance objectives, without reliance on a central decision-maker. Agents interact via local communication, direct modeling of others, or consensus protocols, and must reason about both local goals and global constraints such as collision avoidance or resource sharing. Research in this domain spans nonlinear optimal control, distributed optimization, Monte Carlo tree search, constraint programming, and learning-based approaches, with strong emphasis on real-time scalability, robustness, and information privacy.
1. Formal Problem Setting and Key Objectives
Decentralized cooperative planning is generally formulated as a global optimization problem constrained by distributed dynamics and safety constraints. Consider $N$ agents, each with discrete-time nonlinear dynamics
$$x_{i,k+1} = f_i(x_{i,k}, u_{i,k}), \quad i = 1, \dots, N,$$
where $x_{i,k}$ is the local state and $u_{i,k}$ the control input of agent $i$ at step $k$. The joint objective typically takes the form
$$J = \sum_{i=1}^{N} \sum_{k} \ell_i(x_{i,k}, u_{i,k}) + \sum_{i<j} \sum_{k} \ell_{ij}(x_{i,k}, x_{j,k}),$$
where $\ell_i$ penalizes tracking error and control effort, and $\ell_{ij}$ encodes pairwise constraints such as collision avoidance or cooperation bonuses. Agents are typically subject to local constraints (e.g., state and input bounds) and must maintain feasibility in the joint state space.
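The structure of the joint objective can be made concrete with a minimal sketch. The quadratic stage costs, the hinge-style pairwise collision penalty, and the array shapes below are illustrative assumptions, not the formulation of any specific cited paper:

```python
import numpy as np

def joint_cost(X, U, Q, R, d_safe=1.0, w_pair=10.0):
    """Evaluate J = sum of local tracking/control costs plus pairwise penalties.

    X: (N, T+1, nx) state trajectories, U: (N, T, nu) input sequences.
    Assumes the first two state dimensions are planar positions.
    """
    N = X.shape[0]
    J = 0.0
    # Local terms l_i: quadratic tracking (to the origin here) and control effort
    for i in range(N):
        J += sum(x @ Q @ x for x in X[i]) + sum(u @ R @ u for u in U[i])
    # Pairwise terms l_ij: soft collision-avoidance penalty below d_safe
    for i in range(N):
        for j in range(i + 1, N):
            d = np.linalg.norm(X[i, :, :2] - X[j, :, :2], axis=1)
            J += w_pair * np.sum(np.maximum(0.0, d_safe - d) ** 2)
    return J
```

A soft penalty like this is what a decentralized solver relaxes and coordinates on; hard formulations instead impose $d \geq d_\text{safe}$ as an explicit constraint.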
The core challenge is to realize a solution method or protocol that:
- Allows all agents to iteratively solve for their own components $(x_i, u_i)$ of the joint plan, accounting for the impact of others' choices.
- Scales gracefully with the number of agents $N$, achieving near real-time performance for large teams.
- Preserves privacy or autonomy (often, only neighbor-to-neighbor state or intent exchange is allowed).
- Ensures global constraints (e.g., safety) are respected—typically requiring consensus, dual decomposition, or explicit coordination steps.
2. Distributed and Consensus-Based Optimization Frameworks
One prominent class of methods decomposes the planning problem into per-agent subproblems coordinated via distributed convex optimization, typically by alternating direction methods of multipliers (ADMM) or consensus-based dual decomposition.
Consider the decentralized iLQR/consensus-ADMM approach for connected autonomous vehicles (Huang et al., 2023). The nonlinear and nonconvex global problem is linearized about a nominal trajectory, yielding convex quadratic subproblems per agent. The key steps are:
- Local Linearization: Each agent linearizes its own dynamics and the relevant pairwise constraints at each iteration.
- Dual Consensus Structure: The resulting subproblems admit a coupling constraint of the form $\sum_i A_i y_i = z$, where the $A_i$ are sparse coupling matrices and $z$ is an auxiliary variable; this induces a dual problem with a consensus variable shared across agents.
- Consensus ADMM: Agents exchange local copies of and update dual and primal variables iteratively, using only neighbor-to-neighbor broadcast, typically in a complete communication graph. ADMM guarantees global constraint enforcement when agents reach consensus.
- Primal Recovery and Line Search: Each agent recovers the local trajectory update by solving a time-varying LQR, and a synchronous iLQR-style forward pass with line-search ensures satisfaction of the true nonlinear dynamics and box constraints.
Fundamentally, this approach yields per-agent complexity per major iteration that does not grow with the joint state dimension, in contrast to the polynomial scaling of fully centralized iLQR in the number of vehicles. Empirically, in multi-vehicle scenarios, decentralized ADMM reaches convergence in $0.25$ seconds for a 100-step horizon, outperforming Interior Point and SQP baselines by large factors (Huang et al., 2023).
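The consensus-ADMM mechanics can be illustrated on a deliberately simple separable problem: $N$ agents, each privately holding a value $a_i$, jointly minimize $\sum_i \tfrac{1}{2}(x - a_i)^2$ under the consensus constraint $x_i = z$. This toy problem (not the vehicle-planning formulation of Huang et al.) isolates the local-primal / consensus / dual-update cycle:

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=100):
    """Consensus ADMM for min_x sum_i 0.5*(x - a_i)^2.

    Each agent i holds a_i privately; x_i are local copies, z is the
    consensus variable, u_i are scaled dual variables.
    """
    a = np.asarray(a, dtype=float)
    x = np.zeros(len(a))
    u = np.zeros(len(a))
    z = 0.0
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)  # local primal updates (parallel)
        z = np.mean(x + u)                     # consensus (averaging) step
        u += x - z                             # dual ascent updates
    return z
```

For this quadratic problem the optimum is the mean of the $a_i$, and the consensus iterate contracts toward it geometrically; the planning setting replaces the scalar local update with a per-agent LQR solve.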
3. Monte Carlo Tree Search and Implicit Coordination
An alternative line of research centers on decentralized Monte Carlo Tree Search (MCTS), often with models of teammates and macro-action abstraction to enable implicit cooperation (Kurzer et al., 2018, Czechowski et al., 2020). The formalism typically involves:
- Decentralized (Semi-)Markov Decision Process: Each agent maintains its own policy, modeling others through heuristic or learned approximations.
- Tree Search: Each agent recursively simulates possible action sequences, either sampling the actions of others (Decoupled-UCT) or leveraging macro-level option hierarchies to reduce search complexity.
- Cooperative Reward Shaping: Each agent's return explicitly incentivizes behavior that is globally beneficial, not just egoistic.
- Teammate Modeling: Recent work trains deep models (e.g., convolutional neural networks) to approximate the policies of teammates, yielding improved best-response policies and convergence to Nash equilibria in the decentralized planning game (Czechowski et al., 2020).
Hierarchical or macro-action schemes further reduce effective branching factors and enable long-horizon planning with manageable computation. Empirical results in multi-robot and traffic simulation domains show that decentralized MCTS with macro-actions and decoupled backups enables agents to discover cooperative behaviors—such as merging, yielding, and overtaking—even without explicit communication, yielding solutions unattainable through purely egoistic planning (Kurzer et al., 2018).
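The decoupled-statistics idea can be shown at a single decision point: each agent keeps its own visit counts and value estimates over its own actions, yet both are updated with the shared team reward. The coordination game below (reward 1 iff both agents pick the same action) is a hypothetical toy, not a scenario from the cited papers:

```python
import math

def decoupled_bandit(n_iters=2000, n_actions=2):
    """Decoupled UCB at a tree root for a 2-agent coordination game.

    Each agent maintains independent statistics (N, Q) over its OWN
    actions; the joint reward couples them only through the updates.
    """
    N = [[0] * n_actions for _ in range(2)]    # per-agent visit counts
    Q = [[0.0] * n_actions for _ in range(2)]  # per-agent value estimates

    def ucb(agent, t):
        best, best_v = 0, -1.0
        for a in range(n_actions):
            if N[agent][a] == 0:
                return a  # try untried actions first
            v = Q[agent][a] + math.sqrt(2 * math.log(t) / N[agent][a])
            if v > best_v:
                best, best_v = a, v
        return best

    for t in range(1, n_iters + 1):
        acts = [ucb(0, t), ucb(1, t)]
        r = 1.0 if acts[0] == acts[1] else 0.0  # shared cooperative reward
        for i in (0, 1):
            N[i][acts[i]] += 1
            Q[i][acts[i]] += (r - Q[i][acts[i]]) / N[i][acts[i]]
    return [max(range(n_actions), key=lambda a: N[i][a]) for i in (0, 1)]
```

Full Decoupled-UCT runs this selection rule at every node of each agent's tree while sampling or modeling the other agents' actions along the rollout.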
4. Cooperative Task and Resource Allocation via Decentralized Constraints
Decentralized cooperative planning also encompasses task assignment, resource scheduling, and adaptation in distributed systems. A significant body of research models such problems as distributed constraint optimization problems (DCOPs) (Dragan et al., 2023, Liu et al., 22 Nov 2025). The formal DCOP framework models:
- Agents as decision-makers choosing from discrete variable assignments.
- Preference constraints capturing local utility/cost and consistency constraints capturing global or pairwise coupling (e.g., conflicts or synergies).
- Distributed algorithms (e.g., DPOP) that operate over pseudo-trees, propagating UTIL/VALUE messages for coordinated optimization, with guaranteed convergence for acyclic coordination graphs.
The CoADAPT framework expresses decentralized coordination for self-adaptive systems (e.g., cloud computing resources) as a DCOP, balancing each agent's local preferences and jointly enforced consistency constraints. Empirical studies highlight robust adaptation quality, scalability to growing numbers of agents, and moderate coordination overhead (linear message size growth for acyclic topology) (Dragan et al., 2023). Similarly, fair and efficient task allocation under partial observability can be driven by consensus over spatially aware equilibrium programs (e.g., Eisenberg-Gale) implemented in decentralized MARL or batched assignment (Liu et al., 22 Nov 2025).
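The UTIL/VALUE message pattern is easiest to see on a chain-shaped pseudo-tree. The sketch below assumes binary pairwise utilities between neighboring agents only; general DPOP additionally handles back-edges and multi-dimensional UTIL tables:

```python
def dpop_chain(domains, pair_util):
    """DPOP-style two-pass optimization on a chain pseudo-tree a0-a1-...-a(n-1).

    domains: per-agent lists of candidate values.
    pair_util(i, a, b): utility of neighbors i and i+1 taking values a, b.
    """
    n = len(domains)
    # UTIL pass (leaf -> root): below[i][a] = best utility of the subtree
    # under agent i, given agent i takes value a.
    below = [dict() for _ in range(n)]
    below[n - 1] = {v: 0.0 for v in domains[n - 1]}
    for i in range(n - 2, -1, -1):
        below[i] = {
            a: max(pair_util(i, a, b) + below[i + 1][b] for b in domains[i + 1])
            for a in domains[i]
        }
    # VALUE pass (root -> leaf): each agent fixes its best value given its parent.
    assign = [max(domains[0], key=lambda a: below[0][a])]
    for i in range(1, n):
        prev = assign[-1]
        assign.append(max(domains[i],
                          key=lambda b: pair_util(i - 1, prev, b) + below[i][b]))
    return assign
```

On acyclic graphs this two-pass scheme is exact with one UTIL and one VALUE message per edge, which is the convergence guarantee referenced above.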
5. Leaderless Protocols and Real-Time Distributed MPC
Several notable schemes avoid explicit leader election or negotiation, relying instead on protocol-level soft coordination and emergent behavior. The desired/planned trajectory approach (Wartnaby et al., 2019) allows each vehicle to broadcast both a "wish" (desired) and a "commitment" (planned) trajectory at each MPC iteration.
- Other agents weakly avoid "desired" trajectories and strongly avoid "planned" trajectories of peers.
- This dynamic adjustment causes would-be conflicting intents to be resolved: agents yield to each other's wishes in a self-organizing fashion.
- No agent is a leader; all collaborate asynchronously, and cooperation emerges purely through repeated mutual adjustment.
- This protocol achieves linear per-agent complexity and scales to arbitrary team sizes provided each agent can broadcast to its relevant neighbors.
Such schemes have demonstrated real-time viability (control updates every 40ms, horizon 23 steps, per-car trajectory solves ≈ 100ms on standard CPUs) and are empirically robust, with no collisions or deadlocks across a range of complex urban traffic scenarios (Wartnaby et al., 2019).
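The asymmetric treatment of peers' broadcast trajectories can be sketched as a weighted interaction cost inside each agent's MPC objective. The weight values, the planar trajectory representation, and the hinge penalty are illustrative assumptions, not parameters from Wartnaby et al.:

```python
import numpy as np

# Illustrative weights: planned trajectories of peers act as near-hard
# obstacles, desired ("wish") trajectories only as soft preferences.
W_PLANNED, W_DESIRED = 100.0, 1.0

def interaction_cost(ego_traj, peer_planned, peer_desired, d_safe=2.0):
    """Penalty term for one peer, added to the ego vehicle's MPC cost.

    All trajectories are (T, 2) position sequences over the horizon.
    """
    def penalty(other, w):
        d = np.linalg.norm(ego_traj - other, axis=1)
        return w * np.sum(np.maximum(0.0, d_safe - d) ** 2)
    return penalty(peer_planned, W_PLANNED) + penalty(peer_desired, W_DESIRED)
```

Because the "desired" weight is small, an agent whose wish conflicts with a peer's commitment backs off first, which is exactly the self-organizing yielding behavior described above.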
6. Reinforcement Learning and Distributed Coverage/Safety
Recent work extends decentralized planning to coverage, tracking, and path assignment via distributed reinforcement learning and geometry-aware constraints (Liu et al., 2022, Şenbaşlar et al., 2023, Yin et al., 1 Dec 2025).
- Dual Guidance RL: DODGE combines artificial potential field encodings of other agents’ intent with local heuristic search, learning decentralized coverage policies that minimize overlap and maintain balanced subarea allocation, achieving low overlap rates for teams of up to $20$ robots (Liu et al., 2022).
- Hard-constrained Trajectory Optimization: RLSS models each robot’s constraint surface (workspace, robot–robot, robot–obstacle) as a set of half-space separations, yielding a per-robot convex QP. No explicit communication or synchronization is required; collision avoidance and deadlock-free operation are guaranteed as long as problem feasibility holds at each step. Real-time performance at 5–10Hz was demonstrated for teams of up to $32$ robots (Şenbaşlar et al., 2023).
- Hierarchical Visibility/Coordination: Decentralized LiDAR-based aerial tracking frameworks employ spherical signed distance field occlusion metrics, field-of-view costs, and swarm potential-based spatial distribution; all modules are differentiable and run fully decentralized with per-drone peer-to-peer state exchange (Yin et al., 1 Dec 2025).
These methods further highlight the breadth of decentralized cooperative planning, from spatial coverage to pursuit-evasion and surveillance.
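The half-space separation idea behind RLSS can be illustrated for two point robots. RLSS computes SVM hyperplanes between full sets of sampled points; the perpendicular-bisector construction below is a simplified stand-in for a single pair of positions:

```python
import numpy as np

def separating_halfspace(p_ego, p_other):
    """Separating hyperplane between two point robots (toy version).

    Returns (n, b) such that the ego robot's linear constraint is
    n @ x <= b, with the peer strictly on the other side.
    Here the plane is the perpendicular bisector of the segment p_ego-p_other.
    """
    n = p_other - p_ego
    n = n / np.linalg.norm(n)
    b = n @ (p_ego + p_other) / 2.0
    return n, b
```

Collecting one such linear constraint per peer and per obstacle yields the per-robot convex QP described above, which each robot solves independently at every replanning step.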
7. Theoretical Complexity and Practical Algorithmics
Theoretical analysis shows that decentralized cooperative planning can range from NEXP-complete (general Dec-POMDPs) to polynomial-time in special cases (independent transitions and observations, single goal, no benefit to changing goal) (Goldman et al., 2011). Structural properties such as local full observability, acyclic interaction, or independence among agents can make polynomial-time optimal planning feasible. In practice:
- Scalability is achieved through decomposition (subgraphs, local coordination), exploiting sparsity in collision or task graphs, and by leveraging macro-action or abstraction mechanisms.
- Communication and information sharing rarely improve worst-case complexity but are critical for empirical performance and practical coordination in dynamic or open environments.
Algorithmic choices—distributed message passing, ADMM, asynchrony, decentralized prioritized planning—are tailored to problem structure and operational demands (e.g., real-time constraints, robustness to failures, partial observability).
References:
- (Huang et al., 2023) Decentralized iLQR for Cooperative Trajectory Planning of Connected Autonomous Vehicles via Dual Consensus ADMM
- (Kurzer et al., 2018) Decentralized Cooperative Planning for Automated Vehicles with Hierarchical Monte Carlo Tree Search
- (Czechowski et al., 2020) Decentralized MCTS via Learned Teammate Models
- (Dragan et al., 2023) Towards the decentralized coordination of multiple self-adaptive systems
- (Liu et al., 22 Nov 2025) DISPATCH -- Decentralized Informed Spatial Planning and Assignment of Tasks for Cooperative Heterogeneous Agents
- (Liu et al., 2022) Decentralized Coverage Path Planning with Reinforcement Learning and Dual Guidance
- (Wartnaby et al., 2019) Decentralised Cooperative Collision Avoidance with Reference-Free Model Predictive Control and Desired Versus Planned Trajectories
- (Şenbaşlar et al., 2023) RLSS: Real-time, Decentralized, Cooperative, Networkless Multi-Robot Trajectory Planning using Linear Spatial Separations
- (Yin et al., 1 Dec 2025) Visibility-aware Cooperative Aerial Tracking with Decentralized LiDAR-based Swarms
- (Goldman et al., 2011) Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis