
Decentralized Cooperative Planning

Updated 4 February 2026
  • Decentralized cooperative planning is a framework where autonomous agents optimize joint objectives using local communication and distributed decision-making.
  • The approach employs methods such as ADMM, iLQR, and MCTS to handle nonlinear dynamics and enforce global constraints such as collision avoidance in real time.
  • Key applications include multi-robot systems and connected vehicles, demonstrating scalable, robust performance through consensus protocols and reinforcement learning.

Decentralized cooperative planning refers to the class of methods and frameworks in which a team of autonomous agents (e.g., robots, vehicles, software systems) coordinate and jointly optimize their individual plans to achieve shared performance objectives, without reliance on a central decision-maker. Agents interact via local communication, direct modeling of others, or consensus protocols, and must reason about both local goals and global constraints such as collision avoidance or resource sharing. Research in this domain spans nonlinear optimal control, distributed optimization, Monte Carlo tree search, constraint programming, and learning-based approaches, with strong emphasis on real-time scalability, robustness, and information privacy.

1. Formal Problem Setting and Key Objectives

Decentralized cooperative planning is generally formulated as a global optimization problem constrained by distributed dynamics and safety constraints. Consider $N$ agents, each with discrete-time nonlinear dynamics

$$x^i_{t+1} = f(x^i_t, u^i_t),$$

where $x^i_t$ is the local state and $u^i_t$ the control input. The joint objective typically takes the form

$$J(U) = \sum_{t=0}^{T-1}\Bigg[ \sum_{i=1}^N C^i_t(x^i_t, u^i_t) + \sum_{i<j} C^{ij}_t(x^i_t, x^j_t) \Bigg] + \text{terminal terms},$$

where $C^i_t$ penalizes tracking error and control effort, and $C^{ij}_t$ encodes pairwise constraints such as collision avoidance or cooperation bonuses. Agents are typically subject to local box constraints $u^i_t \in [\underline{u}^i_t, \overline{u}^i_t]$ and must maintain feasibility in the joint state space.

The core challenge is to realize a solution method or protocol that:

  • Allows all agents to iteratively solve for their components of $U$, accounting for the impact of others' choices.
  • Scales gracefully with $N$, achieving near real-time performance for large teams.
  • Preserves privacy or autonomy (often, only neighbor-to-neighbor state or intent exchange is allowed).
  • Ensures global constraints (e.g., safety) are respected—typically requiring consensus, dual decomposition, or explicit coordination steps.
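To make the formulation concrete, the joint objective above can be evaluated numerically. The following is a minimal sketch, assuming quadratic tracking and control stage costs and a soft hinge-style collision penalty; the weights (`w_pair`, `d_safe`, the 0.1 control weight) are illustrative choices, not taken from any cited paper:

```python
import numpy as np

def joint_cost(X, U, x_ref, w_pair=10.0, d_safe=1.0):
    """Evaluate the joint objective J(U) for N agents over horizon T.

    X     : (N, T+1, dim)   state trajectories
    U     : (N, T, dim_u)   control trajectories
    x_ref : (N, T+1, dim)   reference trajectories
    """
    N, T = U.shape[0], U.shape[1]
    J = 0.0
    for t in range(T):
        # Local stage costs C^i_t: tracking error plus control effort.
        for i in range(N):
            J += np.sum((X[i, t] - x_ref[i, t]) ** 2) + 0.1 * np.sum(U[i, t] ** 2)
        # Pairwise costs C^{ij}_t: soft penalty when agents come within d_safe.
        for i in range(N):
            for j in range(i + 1, N):
                d = np.linalg.norm(X[i, t] - X[j, t])
                J += w_pair * max(0.0, d_safe - d) ** 2
    # Terminal tracking terms.
    for i in range(N):
        J += np.sum((X[i, T] - x_ref[i, T]) ** 2)
    return J
```

In a decentralized method, no single node evaluates this sum directly; each agent sees only its own local terms and the pairwise terms it shares with neighbors.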

2. Distributed and Consensus-Based Optimization Frameworks

One prominent class of methods decomposes the planning problem into per-agent subproblems coordinated via distributed convex optimization, typically by alternating direction methods of multipliers (ADMM) or consensus-based dual decomposition.

Consider the decentralized iLQR/consensus-ADMM approach for connected autonomous vehicles (Huang et al., 2023). The nonlinear and nonconvex global problem is linearized about a nominal trajectory, yielding convex quadratic subproblems per agent. The key steps are:

  • Local Linearization: Each agent linearizes its own dynamics and the relevant pairwise constraints at each iteration.
  • Dual Consensus Structure: The resulting subproblems admit a consensus constraint of the form $\sum_{i=1}^N J^i \delta X^i = w$, where the $J^i$ are sparse coupling matrices and $w$ is an auxiliary variable; this induces a dual problem with a consensus variable $y$.
  • Consensus ADMM: Agents exchange local copies of $y$ and update dual and primal variables iteratively, using only neighbor-to-neighbor broadcast, typically over a complete communication graph. ADMM guarantees global constraint enforcement once agents reach consensus.
  • Primal Recovery and Line Search: Each agent recovers the local trajectory update by solving a time-varying LQR, and a synchronous iLQR-style forward pass with line-search ensures satisfaction of the true nonlinear dynamics and box constraints.
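The primal/dual exchange pattern underlying these steps can be illustrated on a toy consensus problem. This sketch is not the iLQR/ADMM pipeline of Huang et al. (2023); it is scaled-form consensus ADMM on a scalar problem, where each agent keeps a local copy of the shared decision variable and disagreement is driven to zero through the dual update:

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=100):
    """Consensus ADMM on a toy problem: min over z of sum_i (z - a_i)^2.

    Each agent i holds a local primal copy x_i and a scaled dual u_i;
    the consensus variable z forces all copies to agree. The optimum
    of this problem is the mean of the a_i.
    """
    a = np.asarray(a, dtype=float)
    x = np.zeros(len(a))   # local primal copies (one per agent)
    u = np.zeros(len(a))   # scaled dual variables (one per agent)
    z = 0.0                # consensus variable
    for _ in range(iters):
        # Local primal update: each agent solves its own small subproblem
        # argmin_x (x - a_i)^2 + (rho/2)(x - z + u_i)^2 in closed form.
        x = (2.0 * a + rho * (z - u)) / (2.0 + rho)
        # Consensus update: an averaging step (a neighbor exchange in practice).
        z = np.mean(x + u)
        # Dual update: penalize remaining disagreement with the consensus.
        u += x - z
    return z
```

For example, `consensus_admm([1.0, 2.0, 6.0])` converges to the mean, 3.0. In the vehicle-planning setting, the scalar is replaced by trajectory perturbations and the averaging step by message passing over the coupling graph.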

Fundamentally, this approach enables $O(1)$ per-agent complexity per major iteration, in contrast to the $O(N^3)$ scaling of fully centralized iLQR. Empirically, for scenarios with up to $N = 12$ vehicles, decentralized ADMM reaches convergence in $0.25$ seconds for a 100-step horizon, outperforming interior-point and SQP baselines by large factors (Huang et al., 2023).

3. Monte Carlo Tree Search and Implicit Coordination

An alternative line of research centers on decentralized Monte Carlo Tree Search (MCTS), often with models of teammates and macro-action abstraction to enable implicit cooperation (Kurzer et al., 2018, Czechowski et al., 2020, Kurzer et al., 2018). The formalism typically involves:

  • Decentralized (Semi-)Markov Decision Process: Each agent maintains its own policy, modeling others through heuristic or learned approximations.
  • Tree Search: Each agent recursively simulates possible action sequences, either sampling the actions of others (Decoupled-UCT) or leveraging macro-level option hierarchies to reduce search complexity.
  • Cooperative Reward Shaping: Each agent's return $R^{\text{coop}}_i = r_i + \lambda \sum_{j \ne i} r_j$ explicitly incentivizes behavior that is globally beneficial, not just egoistic.
  • Teammate Modeling: Recent work trains deep models (e.g., CNNs or neural nets) to approximate the policies of teammates, yielding improved best-response policies and convergence to Nash equilibria in the decentralized planning game (Czechowski et al., 2020).

Hierarchical or macro-action schemes further reduce effective branching factors and enable long-horizon planning with manageable computation. Empirical results in multi-robot and traffic simulation domains show that decentralized MCTS with macro-actions and decoupled backups enables agents to discover cooperative behaviors—such as merging, yielding, and overtaking—even without explicit communication, yielding solutions unattainable through purely egoistic planning (Kurzer et al., 2018, Kurzer et al., 2018).
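The effect of cooperative reward shaping can be seen even without tree search. The following toy replaces MCTS with plain best-response iteration on a prisoner's-dilemma-style pairwise interaction; the `REWARD` table and action names are invented for illustration. With $\lambda = 1$ the shaped game's fixed point is the welfare-optimal joint action, while with $\lambda = 0$ best responses collapse to the egoistic equilibrium:

```python
# Stylized pairwise reward table: REWARD[(a_i, a_j)] is the reward to the
# agent playing a_i against a_j (a prisoner's-dilemma-like interaction).
REWARD = {("C", "C"): 3.0, ("C", "D"): 0.0,
          ("D", "C"): 5.0, ("D", "D"): 1.0}

def shaped_return(a_i, a_j, lam):
    """Cooperative shaping: own reward plus lam times the teammate's reward."""
    return REWARD[(a_i, a_j)] + lam * REWARD[(a_j, a_i)]

def best_response_planning(lam, init=("D", "D"), max_rounds=10):
    """Each agent iteratively best-responds to its current model of the
    teammate's action using the shaped return, stopping at a fixed point
    (a Nash equilibrium of the shaped game)."""
    actions = list(init)
    for _ in range(max_rounds):
        new = [max("CD", key=lambda a: shaped_return(a, actions[1], lam)),
               max("CD", key=lambda a: shaped_return(a, actions[0], lam))]
        if new == actions:
            break
        actions = new
    return tuple(actions)
```

Here `best_response_planning(lam=1.0)` reaches `("C", "C")`, whereas `best_response_planning(lam=0.0, init=("C", "C"))` collapses to `("D", "D")`. MCTS-based planners replace the exhaustive best response with sampled tree search and the fixed teammate model with learned approximations.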

4. Cooperative Task and Resource Allocation via Decentralized Constraints

Decentralized cooperative planning also encompasses task assignment, resource scheduling, and adaptation in distributed systems. A significant body of research models such problems as distributed constraint optimization problems (DCOPs) (Dragan et al., 2023, Liu et al., 22 Nov 2025). The formal DCOP framework models:

  • Agents as decision-makers choosing from discrete variable assignments.
  • Preference constraints capturing local utility/cost and consistency constraints capturing global or pairwise coupling (e.g., conflicts or synergies).
  • Distributed algorithms (e.g., DPOP) that operate over pseudo-trees, propagating UTIL/VALUE messages for coordinated optimization, with guaranteed convergence for acyclic coordination graphs.

The CoADAPT framework expresses decentralized coordination for self-adaptive systems (e.g., cloud computing resources) as a DCOP, balancing each agent's local preferences and jointly enforced consistency constraints. Empirical studies highlight robust adaptation quality, scalability to $n \approx 50$ agents, and moderate coordination overhead (linear message-size growth for acyclic topologies) (Dragan et al., 2023). Similarly, fair and efficient task allocation under partial observability can be driven by consensus over spatially aware equilibrium programs (e.g., Eisenberg-Gale) implemented in decentralized MARL or batched assignment (Liu et al., 22 Nov 2025).
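A DPOP-style UTIL/VALUE pass can be sketched on a three-variable chain pseudo-tree. The cost tables below are invented for illustration, and the real algorithm handles arbitrary pseudo-trees rather than this hard-coded chain; the essential pattern is that each leaf summarizes its subtree into a UTIL table over its parent's values, and assignments then propagate back down:

```python
DOMAIN = [0, 1]  # binary variable domains for a toy chain x0 - x1 - x2

def unary(i, v):
    """Local preference cost of agent i choosing value v (illustrative numbers)."""
    return [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]][i][v]

def pairwise(v, w):
    """Consistency cost between neighbors: penalize choosing the same value."""
    return 5.0 if v == w else 0.0

def dpop_chain():
    """DPOP-style solve: UTIL messages flow up the chain, VALUEs flow down."""
    # UTIL pass: util2[v1] = best cost of x2's subtree given x1 = v1.
    util2 = {v1: min(unary(2, v2) + pairwise(v2, v1) for v2 in DOMAIN)
             for v1 in DOMAIN}
    # util1[v0] folds x1's own costs and x2's UTIL table into one message.
    util1 = {v0: min(unary(1, v1) + pairwise(v1, v0) + util2[v1] for v1 in DOMAIN)
             for v0 in DOMAIN}
    # Root decision, then VALUE pass back down the chain.
    v0 = min(DOMAIN, key=lambda v: unary(0, v) + util1[v])
    v1 = min(DOMAIN, key=lambda v: unary(1, v) + pairwise(v, v0) + util2[v])
    v2 = min(DOMAIN, key=lambda v: unary(2, v) + pairwise(v, v1))
    return (v0, v1, v2)
```

With these tables the optimum is the alternating assignment `(1, 0, 1)`, which satisfies every agent's preference at zero total cost; message size here is constant per link, matching the linear overhead reported for acyclic topologies.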

5. Leaderless Protocols and Real-Time Distributed MPC

Several notable schemes avoid explicit leader election or negotiation, relying instead on protocol-level soft coordination and emergent behavior. The desired/planned trajectory approach (Wartnaby et al., 2019) allows each vehicle to broadcast both a "wish" (desired) and a "commitment" (planned) trajectory at each MPC iteration.

  • Other agents weakly avoid "desired" trajectories and strongly avoid "planned" trajectories of peers.
  • This dynamic adjustment causes would-be conflicting intents to be resolved: agents yield to each other's wishes in a self-organizing fashion.
  • No agent is a leader; all collaborate asynchronously, and cooperation emerges purely through repeated mutual adjustment.
  • This protocol achieves linear per-agent complexity and scales to arbitrary team sizes provided each agent can broadcast to its relevant neighbors.

Such schemes have demonstrated real-time viability (control updates every 40ms, horizon 23 steps, per-car trajectory solves ≈ 100ms on standard CPUs) and are empirically robust, with no collisions or deadlocks across a range of complex urban traffic scenarios (Wartnaby et al., 2019).
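The weak/strong avoidance asymmetry at the heart of the protocol can be sketched as a single cost term. The weights and the hinge-style quadratic penalty below are assumptions for illustration, not values taken from Wartnaby et al. (2019):

```python
import numpy as np

W_PLANNED, W_DESIRED = 100.0, 1.0   # strong vs. weak avoidance weights (illustrative)
D_SAFE = 2.0                         # assumed safety distance

def interaction_cost(traj, peer_planned, peer_desired):
    """Soft-coordination cost for one agent's candidate trajectory.

    traj, peer_* : (T, 2) arrays of positions over the planning horizon.
    Peers' committed ('planned') trajectories are avoided strongly, their
    'desired' trajectories only weakly, so conflicting wishes resolve by
    mutual yielding over repeated MPC iterations.
    """
    def penalty(other, w):
        d = np.linalg.norm(traj - other, axis=1)
        return w * np.sum(np.maximum(0.0, D_SAFE - d) ** 2)
    return penalty(peer_planned, W_PLANNED) + penalty(peer_desired, W_DESIRED)
```

Because the "desired" penalty is small, an agent may still plan through a peer's wish when its own cost savings justify it, but it will almost never plan through a peer's commitment.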

6. Reinforcement Learning and Distributed Coverage/Safety

Recent work extends decentralized planning to coverage, tracking, and path assignment via distributed reinforcement learning and geometry-aware constraints (Liu et al., 2022, Şenbaşlar et al., 2023, Yin et al., 1 Dec 2025).

  • Dual Guidance RL: DODGE combines artificial potential field encodings of other agents' intent with local heuristic search, learning decentralized coverage policies that minimize overlap and maintain balanced subarea allocation, with overlap rates as low as $0.1\%$ for teams of up to $20$ robots (Liu et al., 2022).
  • Hard-constrained Trajectory Optimization: RLSS models each robot's constraint surface (workspace, robot-robot, robot-obstacle) as a set of half-space separations, yielding a per-robot convex QP. No explicit communication or synchronization is required; collision avoidance and deadlock-free operation are guaranteed as long as problem feasibility holds at each step. Real-time performance at 5-10 Hz was demonstrated for teams of up to $32$ robots (Şenbaşlar et al., 2023).
  • Hierarchical Visibility/Coordination: Decentralized LiDAR-based aerial tracking frameworks employ spherical signed distance field occlusion metrics, field-of-view costs, and swarm potential-based spatial distribution; all modules are differentiable and run fully decentralized with per-drone peer-to-peer state exchange (Yin et al., 1 Dec 2025).

These methods further highlight the breadth of decentralized cooperative planning, from spatial coverage to pursuit-evasion and surveillance.
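The half-space separation idea behind RLSS can be sketched as follows. RLSS computes separating hyperplanes with support vector machines against full robot and obstacle shapes; this simplified stand-in instead uses the perpendicular bisector between two point positions, shifted by a safety margin, to produce one linear constraint for a per-robot QP:

```python
import numpy as np

def separating_halfspace(p_robot, p_obstacle, margin=0.5):
    """Build a linear constraint a @ x <= b that keeps x on the robot's side
    of the perpendicular bisector between robot and obstacle, shifted toward
    the robot by a safety margin (a simplified stand-in for RLSS's SVM-based
    separation against full shapes).
    """
    p_robot = np.asarray(p_robot, dtype=float)
    p_obstacle = np.asarray(p_obstacle, dtype=float)
    diff = p_obstacle - p_robot
    a = diff / np.linalg.norm(diff)      # unit normal pointing at the obstacle
    midpoint = (p_robot + p_obstacle) / 2.0
    b = a @ midpoint - margin            # shift the plane toward the robot
    return a, b
```

Collecting one such `(a, b)` pair per neighboring robot and obstacle yields the convex feasible region over which each robot optimizes its trajectory independently.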

7. Theoretical Complexity and Practical Algorithmics

Theoretical analysis shows that decentralized cooperative planning can range from NEXP-complete (general Dec-POMDPs) to polynomial-time in special cases (independent transitions and observations, single goal, no benefit to changing goal) (Goldman et al., 2011). Structural properties such as local full observability, acyclic interaction, or independence among agents can make polynomial-time optimal planning feasible. In practice:

  • Scalability is achieved through decomposition (subgraphs, local coordination), exploiting sparsity in collision or task graphs, and by leveraging macro-action or abstraction mechanisms.
  • Communication and information sharing rarely improve worst-case complexity but are critical for empirical performance and practical coordination in dynamic or open environments.

Algorithmic choices—distributed message passing, ADMM, asynchrony, decentralized prioritized planning—are tailored to problem structure and operational demands (e.g., real-time constraints, robustness to failures, partial observability).

