
Multi-Agent Task Planning

Updated 16 February 2026
  • Multi-agent task planning is the synthesis of coordinated plans across multiple agents with diverse capabilities to achieve global objectives.
  • It employs centralized, decentralized, and hierarchical methodologies to integrate individual actions into globally consistent and resource-efficient plans.
  • Recent advances leverage temporal logics, LLM-driven decomposition, and real-time adaptation techniques to enhance scalability and performance.

Multi-agent task planning is the study and practice of synthesizing coordinated plans across multiple agents—autonomous entities with localized actions and potentially partial knowledge or differing capabilities—to achieve complex, often temporally extended objectives defined over the team or environment. This field encompasses formal models, algorithmic paradigms, and applied systems for integrating individual agent actions into globally consistent, resource-efficient, and often provably correct executions, with applications spanning robotics, logistics, service systems, and distributed AI.

1. Formal Models for Multi-Agent Task Planning

Foundational models for multi-agent task planning extend the classical STRIPS or SAS⁺ planning formalisms to n agents, each typically endowed with a local action set, partial or full observability of global states, and potentially unique private state variables (Torreño et al., 2017). The canonical MA-STRIPS tuple is

\mathcal{M} = (A, S_0, G, \mathrm{Agents})

where $A$ is the set of all grounded actions (split per agent), $S_0$ the initial global state, $G$ the global goal, and $\mathrm{Agents} = \{a_1, \ldots, a_N\}$ the team.
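The tuple above maps directly onto a data structure. The sketch below is a minimal illustration of MA-STRIPS semantics with set-valued states; the class and method names (`Action`, `MAStrips`, `satisfies_goal`) are ours, not drawn from any particular planner:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    agent: str                 # owning agent a_i
    preconditions: frozenset   # facts that must hold to apply the action
    add_effects: frozenset     # facts made true
    del_effects: frozenset     # facts made false

@dataclass
class MAStrips:
    actions: list              # A: grounded actions, split per agent
    initial_state: frozenset   # S_0
    goal: frozenset            # G
    agents: list               # {a_1, ..., a_N}

    def applicable(self, state, action):
        return action.preconditions <= state

    def apply(self, state, action):
        # classical STRIPS semantics: delete, then add
        return (state - action.del_effects) | action.add_effects

    def satisfies_goal(self, state):
        return self.goal <= state
```

A joint plan is then a sequence of such actions, drawn from different agents' action sets, whose cumulative application maps $S_0$ into a state satisfying $G$.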

Key distinguishing dimensions include the degree of observability (full versus partial views of the global state), the privacy of state variables, the heterogeneity of agent capabilities, and whether goals are simple reachability conditions or temporally extended specifications.

When temporal or logical task specifications are needed, goals are encoded in LTL or variants such as time-window temporal logic (TWTL); the problem then requires generating runs over agent transition models that satisfy these global constraints (Tumova et al., 2014, Liu et al., 2023, Liu et al., 2022, Peterson et al., 2020).
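As a concrete illustration (an example of ours, not taken from the cited papers), a surveillance-style LTL specification might read

\varphi = \Box \Diamond\, \pi_{\mathrm{base}} \wedge \Diamond\left(\pi_{\mathrm{load}} \wedge \Diamond\, \pi_{\mathrm{unload}}\right)

requiring the team to visit a base region infinitely often while eventually performing a load action followed later by an unload. TWTL would additionally attach explicit time windows to such sub-formulas.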

2. Centralized, Decentralized, and Hierarchical Planning Paradigms

Multi-agent task planning methods vary according to control architecture, coordination protocol, and granularity of decomposition.

Centralized Planning

Centralized approaches amalgamate all agent models and objectives into a global planning instance—typically by constructing a synchronous product transition system—and solve for a single joint plan, later extracting policies per agent (Tumova et al., 2016, Torreño et al., 2017). Centralized LTL planning entails:

  • Transition system product: form the synchronous product $\prod_i T_i$ of the agents' transition models $T_i$.
  • Automata-based synthesis: Cross with Büchi automata for LTL or NBA for sc-LTL, yielding exponential or doubly-exponential state space (Tumova et al., 2014, Tumova et al., 2016, Liu et al., 2023).

Though conceptually simple, this quickly becomes intractable as agent count or task complexity grows.
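The centralized pipeline above can be sketched as a breadth-first search over the synchronous product, with the automaton acceptance condition abstracted into a predicate on joint states. This is a toy illustration of the product construction, and it makes the source of intractability visible: the visited set ranges over the full joint state space, exponential in the number of agents.

```python
from itertools import product
from collections import deque

def product_search(transition_systems, initial, accepting):
    """BFS over the synchronous product of agent transition systems.

    transition_systems: list of dicts mapping state -> iterable of successors
    initial: tuple of initial states, one per agent
    accepting: predicate over joint states (stands in for the Buchi/DFA check)
    Returns a joint path reaching an accepting state, or None.
    """
    frontier = deque([(initial, [initial])])
    visited = {initial}
    while frontier:
        joint, path = frontier.popleft()
        if accepting(joint):
            return path
        # synchronous step: each agent advances one transition
        for succ in product(*(ts[s] for ts, s in zip(transition_systems, joint))):
            if succ not in visited:
                visited.add(succ)
                frontier.append((succ, path + [succ]))
    return None
```

In full LTL synthesis the `accepting` predicate would instead track Büchi automaton states alongside the joint system state, growing the product further.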

Decentralized and Decomposed Planning

To combat state-space explosion, decomposition-based paradigms split the planning process by:

  • Motion/task decomposition: Decouple independent (often local) motion constraints from global task/collaboration constraints (two-phase automata reduction) (Tumova et al., 2016).
  • Receding-horizon/iterative local synthesis: Solve for locally dependent subsets of agents/clusters over short horizons, execute, then repeat (Tumova et al., 2014, Peterson et al., 2020).
  • Formal poset-based decomposition: Represent system-wide LTL requirements as the union or product of partial orders over subtasks (R-posets), merging online as new requirements arrive (Liu et al., 2023).

Decentralized methods also leverage local communication, energy-based priorities, and neighborhood synchronization to guarantee global goal satisfaction while enabling real-time adaptation (Peterson et al., 2020, Nguyen et al., 2024).
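The receding-horizon idea above reduces to a simple loop: plan a short local horizon, commit only the first step, then replan. The toy below (our schematic, with agents moving on the integer line, not any specific cited algorithm) shows that loop structure:

```python
def receding_horizon(positions, goals, horizon=2, max_iters=100):
    """Receding-horizon execution: each agent synthesizes a short local
    plan toward its goal, executes only the first step, then replans.
    Toy setting: agents occupy integers and move by unit steps."""
    positions = list(positions)
    for _ in range(max_iters):
        if positions == list(goals):
            return positions
        for i, (p, g) in enumerate(zip(positions, goals)):
            # local plan: up to `horizon` unit steps toward the goal
            step = max(-1, min(1, g - p))
            plan = [p + step * k for k in range(1, horizon + 1)]
            positions[i] = plan[0]  # commit only the first step, then replan
    return positions
```

Because each iteration replans from the current state, the same loop tolerates disturbances or task changes injected between iterations, which is the source of the real-time adaptability noted above.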

Hierarchical and Hybrid Methods

Hierarchical frameworks further refine the architecture by explicitly separating symbolic mission/goal decomposition from low-level motion or execution planning, using formal assume-guarantee rules for correctness and safety (Silva et al., 2016, Faroni et al., 2023). The connection between symbolic plans and executable actions is established via abstraction mappings that incorporate durations, resource constraints, and obstacle information.

A single integrated TAMP (Task and Motion Planning) pipeline maintains bidirectional feedback, enabling high-level mission reconfiguration in response to geometric infeasibility or environmental changes detected during motion planning (Faroni et al., 2023).
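The bidirectional feedback described above can be sketched as a repair loop: the motion layer flags symbolically planned actions that are geometrically infeasible, and the task layer replans around them. In this schematic (ours, not the cited pipeline), `motion_feasible` stands in for a motion planner's feasibility check and `refine` for symbolic replanning:

```python
def tamp_loop(symbolic_plan, motion_feasible, refine, max_rounds=10):
    """Task-and-motion feedback loop: iterate until every symbolic action
    admits a motion realization, or give up after max_rounds."""
    plan = symbolic_plan
    for _ in range(max_rounds):
        infeasible = [a for a in plan if not motion_feasible(a)]
        if not infeasible:
            return plan  # every symbolic action has a motion realization
        plan = refine(plan, infeasible)  # high-level mission reconfiguration
    return None  # no geometrically realizable plan found within the budget
```

The same loop also covers runtime environmental changes: if `motion_feasible` starts rejecting a previously valid action, re-entering the loop triggers high-level reconfiguration.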

3. Temporal Logic, Partial Orders, and LLM-driven Decomposition

Temporal logic specifications (LTL, TWTL, sc-LTL) are increasingly standard for expressing multi-agent objectives involving timing, sequencing, and collaboration (Tumova et al., 2014, Liu et al., 2023, Liu et al., 2022, Peterson et al., 2020). Their algorithmic treatment involves several innovations:

  • Local and global LTL formulas: Local agent goals impose inter-agent dependencies via “service sets”; global LTL may require partial- or full-task automaton products (Tumova et al., 2014, Tumova et al., 2016).
  • Partial order reductions and R-poset products: Factoring LTL into subtasks and encoding order/conflict constraints as partially ordered sets (R-posets) supports polynomial-time online plan synthesis for large systems (Liu et al., 2023, Liu et al., 2022).
  • LLM-driven decomposition: Recent frameworks (e.g., TAPAS, LaMMA-P, SMART-LLM) use LLMs to decompose tasks, identify preconditions/effects, allocate agents, and generate PDDL or action scripts, often supplementing or bypassing classic symbolic planners (Babu et al., 24 Jun 2025, Zhang et al., 2024, Kannan et al., 2023, Bai et al., 2024).

Notably, the hybridization of LLM reasoning with symbolic planning enables zero-shot adaptation to new constraints, rapid prototyping, and efficient plan execution in realistic simulation and real-robot settings (Babu et al., 24 Jun 2025).
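A recurring building block in these LLM-driven pipelines is capability-based allocation of the decomposed subtasks. The sketch below is our own schematic of that step: the decomposition itself (which a framework would obtain from an LLM call) is stubbed as an input list, and the allocation is a simple load-balanced capability match:

```python
def allocate_subtasks(subtasks, agents):
    """Assign LLM-proposed subtasks to agents by capability matching.

    subtasks: list of (name, required_capabilities) pairs
    agents:   dict mapping agent name -> set of capabilities
    Returns a dict subtask -> agent; raises if some subtask is unservable.
    """
    assignment = {}
    for name, needed in subtasks:
        candidates = [a for a, caps in agents.items() if needed <= caps]
        if not candidates:
            raise ValueError(f"no agent can perform {name}")
        # break ties by current load to balance work across the team
        assignment[name] = min(
            candidates,
            key=lambda a: sum(1 for v in assignment.values() if v == a),
        )
    return assignment
```

In a full pipeline, the unservable-subtask failure case is exactly where correctness checks kick in, prompting the LLM to re-decompose or the planner to report required cooperation.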

4. Core Algorithmic Techniques and Complexity

Algorithmic solutions include:

  • Automata product and search: Cross agent transition systems with global task automata and search for accepting executions. Classic approaches suffer from exponential complexity (Tumova et al., 2014, Tumova et al., 2016).
  • Partial order/BnB assignment: Extract partial orderings from automata to relax synchronization and concurrency constraints, then assign subtasks to agents using assignment or branch-and-bound heuristics for makespan or cost minimization (Liu et al., 2022).
  • On-the-fly decomposition and assignment: R-poset methods build global task orders incrementally as new requirements arrive, and assign feasible agent coalitions via greedy or contract-net protocols, yielding polynomial-time first-valid plans (Liu et al., 2023).
  • Distributed MCTS and learning-based coordination: Decentralized (e.g., A-MCTS) methods use Monte Carlo tree search and regret matching, synchronizing agent policies via exchanged “intents” and adapting in real-time to agent failure or dynamic topology (Nguyen et al., 2024).
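The partial-order assignment idea in the list above can be illustrated with textbook greedy list scheduling: subtasks are released in topological order of the poset and assigned to the earliest-free agent, approximating makespan minimization. This is a generic sketch in the spirit of the cited methods, not their actual branch-and-bound or contract-net procedures:

```python
from collections import defaultdict, deque

def poset_schedule(durations, order, num_agents):
    """Greedy list scheduling of partially ordered subtasks onto agents.

    durations: dict mapping task -> duration
    order:     list of (before, after) precedence pairs (the poset edges)
    Returns (makespan, start_times).
    """
    succs, indeg = defaultdict(list), defaultdict(int)
    for a, b in order:
        succs[a].append(b)
        indeg[b] += 1
    ready = deque(t for t in durations if indeg[t] == 0)
    agent_free = [0.0] * num_agents     # when each agent becomes idle
    earliest = defaultdict(float)       # precedence-induced earliest start
    start = {}
    while ready:
        t = ready.popleft()
        i = min(range(num_agents), key=lambda k: agent_free[k])
        start[t] = max(agent_free[i], earliest[t])
        finish = start[t] + durations[t]
        agent_free[i] = finish
        for s in succs[t]:              # release successors as they unblock
            earliest[s] = max(earliest[s], finish)
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return max(start[t] + durations[t] for t in durations), start
```

The greedy choice yields a first-valid schedule in polynomial time; branch-and-bound refinements then trade extra search for tighter makespan or cost bounds.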

Theoretical properties established include completeness, soundness, and optimality guarantees under standard assumptions for each class of algorithm. For poset-based and receding-horizon planners, the complexity per iteration is polynomial in team size and task length, in contrast to the intractability of full-state product methods (Liu et al., 2023, Tumova et al., 2014). TAMP and hybrid symbolic–geometric planners often separate NP-hard combinatorial assignment from locally tractable motion graph search, enabling scalable real-world applications (Faroni et al., 2023).

5. Task Allocation, Synchronization, and Ordering Constraints

Task allocation in multi-agent planning must account for agent heterogeneity, capability, synchronization needs, and explicit resource constraints (Zhang et al., 2014, Torreño et al., 2017, Tziola et al., 2022).

  • Required cooperation analysis: Necessary and sufficient structural conditions for unavoidable cooperation have been formally characterized, including domain, variable, and capability heterogeneity as well as causal loops and state-space traversability (for homogeneous agents) (Zhang et al., 2014).
  • Goal decomposition and assignment mechanisms: Approaches include goal partitioning via relaxed reachability analysis (Torreño et al., 2017), explicit LLM-driven subgoal allocation (Babu et al., 24 Jun 2025, Zhang et al., 2024), contract-net protocols, or auction-based role assignment.
  • Synchronization and ordering: Some problems entail precedence constraints (PC-MAPF), temporal ordering (as in LTL or partial orders), or collaborative actions that require simultaneous agent attendance at certain states (Kedia et al., 2022, Liu et al., 2023, Liu et al., 2022).

Recent research extends the classical focus on concurrency and conflict avoidance to address strict ordering constraints on objective function values for individual paths—e.g., in security-aware planning where agent cost orderings are prioritized (Ye et al., 2024).

6. Scalability, Adaptability, and Empirical Performance

Leading multi-agent planning methods have demonstrated:

  • Scaling to fleets of hundreds of robots and task automata of hundreds of clauses via R-poset products and adaptive partial order decomposition (Liu et al., 2023, Liu et al., 2022).
  • Robust online adaptation to agent failures, uncertain timing, and dynamic task arrivals—via decentralized receding-horizon policies, event-based synchronization, and continual plan re-synthesis (Peterson et al., 2020, Liu et al., 2022, Nguyen et al., 2024).
  • High performance on real-world and simulated benchmarks: e.g., LaMMA-P achieves a 105% higher success rate than prior LLM-based planners on long-horizon MAT-THOR tasks (Zhang et al., 2024); TAPAS and SMART-LLM demonstrate strong solvability and adaptation in simulated and real robot teams even as constraints and skills change at runtime (Babu et al., 24 Jun 2025, Kannan et al., 2023).

Quantitative metrics such as success rate, makespan, plan cost, token consumption (for LLM methods), and robot utilization are used for systematic benchmarking (Ling et al., 29 Sep 2025, Zhang et al., 2024, Kannan et al., 2023). Ablation studies confirm the criticality of decomposition, correctness checks, dynamic reallocation, and learning components in maintaining system performance under complexity and uncertainty.

7. Open Challenges and Future Directions

The field continues to advance towards:

  • Dynamic, partially observable, and uncertain settings: Development of frameworks robust to agent attrition, network loss, and perception failures (Nguyen et al., 2024).
  • Integrating vision, language, and symbolic reasoning: Incorporating multi-modal input and learning world model representations for end-to-end task planning (Zhang et al., 2024, Babu et al., 24 Jun 2025).
  • Automated decomposition and learning: Learning cost-optimal decompositions, allocations, and synchronization schemes as part of adaptive pipelines (Li et al., 2024, Zhang et al., 2024).
  • Scalability and compositional formal guarantees: Pursuing formal correctness over high-dimensional, dynamic, or open-world domains using decomposable logic, compositional proofs, and scalable online adaptation (Liu et al., 2023, Silva et al., 2016).
  • Human–agent teaming and required cooperation diagnosis: Exploiting formal characterizations to inform human–robot collaboration, resize teams, or recommend replanning in failure cases (Zhang et al., 2014, Faroni et al., 2023).

Multi-agent task planning remains a rapidly evolving area interlinking algorithmic, formal, and empirical research, with growing impact across AI, robotics, and multi-agent systems.
