
Multi-Agent Pathfinding Algorithms

Updated 5 September 2025
  • Multi-Agent Pathfinding is the problem of computing collision-free, synchronized paths on a graph for multiple agents, ensuring safe concurrent movements.
  • It employs diverse methodologies including conflict-based search, continuous-time models, and learning techniques to optimize solution quality and scalability.
  • MAPF is widely applied in robotics, automated warehouses, and autonomous vehicles, addressing both theoretical challenges and real-world coordination.

Multi-Agent Pathfinding (MAPF) is the algorithmic problem of computing collision-free paths for multiple agents so that each agent reaches its assigned goal from its start location. The critical challenge is ensuring that all agents can execute their paths concurrently without colliding, under various assumptions on environment structure, agent constraints, time discretization, and objectives. MAPF is central in robotics, logistics, computer games, automated warehouses, autonomous vehicles, and many large-scale coordination domains.

1. Formal Foundations and Classical Problem Definition

Classical MAPF is defined as the tuple $\langle G, s, t \rangle$, where $G = (V, E)$ is a (typically undirected) graph, and $s: [1, \ldots, k] \rightarrow V$ and $t: [1, \ldots, k] \rightarrow V$ assign start and target vertices to each of the $k$ agents. Time is discretized. Agents move synchronously through $G$, choosing at each timestep either to wait at their current vertex or to advance along an adjacent edge. A plan for agent $i$ is a sequence $\pi_i$ of actions $a: V \rightarrow V$ such that $\pi_i[0] = s(i)$, $\pi_i[|\pi_i|] = t(i)$, and every $a(\cdot)$ is either the identity (wait) or moves along an edge.

The challenge emerges from enforcing non-collision constraints so that agents may execute these plans without interfering. Multiple conflict types have been formalized:

  • Vertex conflict: $\exists x: \pi_i[x] = \pi_j[x]$
  • Edge conflict: $\exists x: \pi_i[x] = \pi_j[x] \wedge \pi_i[x+1] = \pi_j[x+1]$
  • Following, cycle, and swapping conflicts: Variants involving agents following, cycling, or swapping vertices at the same time.
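The vertex and swapping conflict definitions above translate directly into code. The following is a minimal illustrative sketch (not from any cited paper), assuming agents stay at their targets so shorter plans are padded with waits:

```python
def pad(plan, horizon):
    # Under the "stay at target" assumption, an agent waits at its goal.
    return plan + [plan[-1]] * (horizon - len(plan))

def find_conflict(pi, pj):
    """Return the first vertex or swapping conflict between two plans, or None."""
    horizon = max(len(pi), len(pj))
    pi, pj = pad(pi, horizon), pad(pj, horizon)
    for x in range(horizon):
        if pi[x] == pj[x]:                              # vertex conflict
            return ("vertex", x, pi[x])
        if (x + 1 < horizon
                and pi[x] == pj[x + 1] and pi[x + 1] == pj[x]):
            return ("swap", x, (pi[x], pi[x + 1]))      # swapping conflict
    return None
```

Note that the edge conflict of the definition above (same edge, same direction, same time) is subsumed by the vertex check, since both agents would occupy the same vertex at timestep $x$.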

Additional modeling assumptions include whether agents "stay at target" or "disappear at target" upon arrival (with significant consequences for the cost objective), and whether time is uniformly discretized or continuous (Stern et al., 2019).

Solving MAPF optimally is NP-hard for standard objectives such as makespan and sum-of-costs, mandating sophisticated exact or bounded-suboptimal algorithms for real-world domains.

2. Objectives, Variants, and Extensions

MAPF research has formalized diverse objectives and extensions (Stern et al., 2019):

| Objective | Formula | Context |
|---|---|---|
| Makespan | $\max_{i=1}^{k} \lvert\pi_i\rvert$ | Time until the last agent reaches its target |
| Sum-of-costs | $\sum_{i=1}^{k} \lvert\pi_i\rvert$ | Collective efficiency |
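The two objectives differ only in how individual plan costs are aggregated. A minimal sketch, representing each plan as a vertex sequence that includes the start, so the cost $|\pi_i|$ is the number of actions, `len(p) - 1`:

```python
def makespan(plans):
    """Time until the last agent reaches its goal."""
    return max(len(p) - 1 for p in plans)

def sum_of_costs(plans):
    """Total number of timesteps consumed across all agents."""
    return sum(len(p) - 1 for p in plans)
```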

Classical MAPF can be extended in several directions:

  • Weighted Graphs: Edge traversal times are not uniform; e.g., diagonal moves cost $\sqrt{2}$.
  • Large Agents and Kinematics: Agents may have non-point geometry, volume, or complex dynamics (e.g., orientation, minimum-radius turns). This necessitates geometric and temporal conflict checking as in car-like MAPF (Wen et al., 2020) and large-agent MAPF (Dergachev et al., 2022).
  • Continuous-Time and Any-Angle MAPF: Time may be modeled as a continuous variable, with agents executing actions of arbitrary, non-uniform durations and traversing arbitrary straight-line (any-angle) paths (Andreychuk et al., 2019, Yakovlev et al., 25 Apr 2024).
  • Dynamic and Online MAPF: Agents, start/goal pairs, and obstacles may appear on-the-fly, with replanning and reallocation requirements (Tang et al., 2023).
  • Movable Obstacles ("Terraforming"): Some agents can manipulate the environment by moving obstacles to relieve congestion (Vainshtein et al., 2022).
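The weighted-grid variant above can be illustrated with a simple move-cost function for an 8-connected grid; this is an illustrative sketch only, and the cell-tuple representation and adjacency check are assumptions:

```python
import math

def move_cost(u, v):
    """Cost of a single 8-connected move between grid cells u and v.

    Cardinal moves cost 1, diagonal moves cost sqrt(2), waiting costs 0.
    """
    dx, dy = abs(u[0] - v[0]), abs(u[1] - v[1])
    if dx > 1 or dy > 1:
        raise ValueError("not an adjacent cell")
    return math.sqrt(2) if dx == 1 and dy == 1 else float(dx + dy)
```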

3. Algorithmic Methodologies

Conflict-Based Search (CBS) and Beyond:

CBS is a canonical optimal solver for classical MAPF. It performs a two-level search:

  • High-level: Maintains a constraint tree (CT), where each node corresponds to a set of forbidden agent-time vertex/edge occupancies.
  • Low-level: Computes single-agent shortest paths consistent with the accumulated constraints (typically via A* or variants).
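The two-level structure can be sketched compactly in Python. This is an illustrative toy, not any cited implementation: the low level uses space-time BFS rather than A*, constraints are forbidden (vertex, timestep) pairs, and swapping conflicts are handled by the simplification of forbidding each agent's destination vertex rather than a true edge constraint:

```python
import heapq
from itertools import count

def low_level(graph, start, goal, constraints, horizon=50):
    """Space-time BFS for one agent; `constraints` is a set of
    forbidden (vertex, timestep) pairs. Returns a vertex sequence or None."""
    frontier = [(0, start, [start])]
    seen = set()
    while frontier:
        t, v, path = frontier.pop(0)
        # Accept the goal only if the agent may stay there afterwards.
        if v == goal and all((goal, s) not in constraints
                             for s in range(t, horizon)):
            return path
        if t >= horizon or (v, t) in seen:
            continue
        seen.add((v, t))
        for nxt in [v] + graph[v]:                     # wait, then moves
            if (nxt, t + 1) not in constraints:
                frontier.append((t + 1, nxt, path + [nxt]))
    return None

def first_conflict(paths):
    """Return candidate constraints [(agent, vertex, timestep), ...] for
    the earliest vertex or swapping conflict, or None if conflict-free."""
    horizon = max(len(p) for p in paths)
    padded = [p + [p[-1]] * (horizon - len(p)) for p in paths]
    for x in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                if padded[i][x] == padded[j][x]:       # vertex conflict
                    return [(i, padded[i][x], x), (j, padded[j][x], x)]
                if (x + 1 < horizon
                        and padded[i][x] == padded[j][x + 1]
                        and padded[i][x + 1] == padded[j][x]):  # swap
                    return [(i, padded[i][x + 1], x + 1),
                            (j, padded[j][x + 1], x + 1)]
    return None

def cbs(graph, starts, goals):
    """High-level best-first search over the constraint tree."""
    tie = count()
    paths = [low_level(graph, s, g, set()) for s, g in zip(starts, goals)]
    if any(p is None for p in paths):
        return None
    open_list = [(sum(len(p) for p in paths), next(tie),
                  [set() for _ in starts], paths)]
    while open_list:
        _, _, cons, paths = heapq.heappop(open_list)
        conflict = first_conflict(paths)
        if conflict is None:
            return paths                               # conflict-free plan
        for i, v, x in conflict:                       # branch per agent
            child = [set(c) for c in cons]
            child[i].add((v, x))
            new_path = low_level(graph, starts[i], goals[i], child[i])
            if new_path is not None:
                new_paths = list(paths)
                new_paths[i] = new_path
                heapq.heappush(open_list,
                               (sum(len(p) for p in new_paths),
                                next(tie), child, new_paths))
    return None
```

For example, on a corridor `0-1-2` with a spur vertex `3` attached to `1`, two head-on agents force the solver to route one of them through the spur.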

CBS has been generalized:

  • Continuous-Time Conflict-Based Search (CCBS) (Andreychuk et al., 2019): Removes assumptions of discrete timesteps, unit-duration actions, and point agents. CCBS detects conflicts using geometry (disk agents), formulates them as $(a_i, t_i, a_j, t_j)$, and introduces unsafe intervals (continuous forbidden time ranges per action). The low-level is adapted Safe Interval Path Planning (SIPP): per-agent, per-location plans over maximal safe time intervals.
  • Car-like CBS (CL-CBS) (Wen et al., 2020): Introduces a body conflict tree, works with kinematic constraints (steering angle, velocity bounds), and uses a hybrid-state A* planner in $(t, x, y, \theta)$ space.
  • Decentralized and Distributed Methods (Thomas et al., 2021, Ma et al., 2021): Agents may plan in decentralized or distributed settings, negotiating plans with local resource managers (as "Routers"), or via multi-agent learning with local communication (using graph convolution).
  • Online MAPF with Sustainable Information (Tang et al., 2023): Proactively reuses prior search context to allow rapid incremental replanning as agents/obstacles are introduced.
  • MAPF with Movable Obstacles ("Terraforming") (Vainshtein et al., 2022): Extends CBS/PBS with mover agents and additional constraints to prevent premature access to locations still blocked by movable obstacles.
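The safe-interval decomposition underlying SIPP (used as CCBS's low level above) can be sketched as follows: given the forbidden time ranges at a location, compute the maximal intervals during which that location is safe to occupy. This is an illustrative sketch, not code from the cited papers:

```python
def safe_intervals(unsafe, horizon=float("inf")):
    """Compute maximal safe time intervals at a single location.

    `unsafe` is a list of (start, end) forbidden intervals; the result
    is the complement of their union within [0, horizon).
    """
    ivals = []
    t = 0.0
    for lo, hi in sorted(unsafe):
        if lo > t:
            ivals.append((t, lo))       # gap before this unsafe interval
        t = max(t, hi)                  # unsafe intervals may overlap
    if t < horizon:
        ivals.append((t, horizon))      # final open-ended safe interval
    return ivals
```

A SIPP-style planner then searches over (location, safe interval) pairs instead of (location, timestep) pairs, which is what makes continuous, non-uniform action durations tractable.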

For large or nonholonomic agents, reductions to "pebble motion" problems, edge-clearing routines, and kinematic simulation become central (Dergachev et al., 2022). Performance and completeness trade-offs become especially stark when dealing with complex geometry or large agent populations.

4. Learning-Based and Data-Driven Approaches

Recent work applies deep reinforcement learning, communication-aware neural planning, and imitation learning to MAPF:

  • Cooperative RL and Reward Shaping (Song et al., 15 Jul 2024): Independent Q-learning (IQL) is augmented with a cooperative reward term that reflects the expected benefit to neighbors through each agent's actions, decoupled via a local maximization, promoting cooperation despite decentralized execution.
  • Graph Transformers and Global Context (He et al., 2023, Liao et al., 10 Feb 2025): ALPHA uses graph transformers to fuse local grid observations with abstracted global graph features and short-term intention prediction. SIGMA introduces a sheaf-theoretic latent consensus mechanism, aligning latent representations ("stalks") of neighboring agents via self-supervised loss that enforces consistency, enabling decentralized but globally coordinated decision-making.
  • Pure Imitation Learning Foundation Models (Andreychuk et al., 29 Aug 2024, Andreychuk et al., 30 Jun 2025): MAPF-GPT is trained on millions of expert trajectories, tokenizing per-agent neighborhoods and context; the model operates non-autoregressively and demonstrates zero-shot generalization. MAPF-GPT-DDG introduces a delta-data generation mechanism for focused, active fine-tuning: problematic states with maximal solution cost increase along a trajectory are identified and relabeled with expert solutions, accelerating fine-tuning and improving both success rates and cost efficiency for massive scale (up to one million agents).

Empirically, such neural models rival or surpass earlier centralized planners for density and scale, especially when equipped with global context or consensus mechanisms.

5. Practical Benchmarks and Empirical Methodology

Benchmarking is facilitated by a suite of grid-based and domain-specific environments (Stern et al., 2019):

  • Open grids, maze-like, urban/city, warehouse, and game maps, with scenario generators for randomized source–target assignment and increasing agent counts.
  • Performance metrics: Fraction of instances solved (within a computational budget), average sum-of-costs (SOC), makespan, and computational time.
  • MAPF variants with large agents, movable obstacles, or dynamic/online conditions are evaluated for solution quality, throughput, adaptability, and computational scalability.
  • Distributed/cloud platforms: Decentralized and distributed algorithms exploit multi-core or multi-host execution, crucial for scalability in practical deployments.

Experiments repeatedly demonstrate trade-offs between solution quality, computational scalability, and adaptability: optimal solvers degrade sharply as agent counts grow, while bounded-suboptimal and learning-based methods scale further at some cost in solution quality.

6. Applications and Advanced Variants

MAPF is integral to domains such as:

  • Automated warehouses: Coordinated fleets of robots must avoid collisions in dense, obstacle-rich layouts (Stern et al., 2019, Vainshtein et al., 2022).
  • Autonomous driving and urban mobility: Coordination for nonholonomic, car-like agents with realistic physical constraints (Wen et al., 2020).
  • Large-scale logistics and rescue: Planning for hundreds to millions of agents in real-time, both in simulation and robotic field deployments (Andreychuk et al., 30 Jun 2025).
  • Exploration and adaptive sampling: Information-driven MAPF formulations optimize both coverage and information gain, framed as multi-agent POMDPs with mutual information-based heuristics and dynamic distributed planning based on communication proximity (Olkin et al., 19 Sep 2024).
  • Quantum-classical and diffusion-based methods: Quantum annealing subroutines address combinatorial bottlenecks in branch-and-cut-and-price hybrid methods (Gerlach et al., 24 Jan 2025), while projected diffusion models generate continuous-space, constraint-satisfying trajectories via a combination of sampling and augmented Lagrangian projection (Liang et al., 23 Dec 2024).

7. Open Problems, Challenges, and Future Directions

Key research directions and open challenges in MAPF include:

  • Optimality vs. Scalability: Achieving optimal or bounded suboptimal solutions on very large or continuous domains remains challenging; hybridizations (e.g., bounded-suboptimal algorithms, incremental replanning, multi-constraint pruning) help bridge the gap (Yakovlev et al., 25 Apr 2024).
  • Expressivity for Realistic Agents: Extending methods to nonholonomic, high-dimensional, continuous, or deformable agents, and integrating complex task constraints.
  • Integrating Learning and Combinatorial Search: Combining high-capacity learned policies with search-based or optimization-based planning for robustness and adaptability.
  • Information Gathering and Lifelong MAPF: Jointly optimizing for collision-free execution and multi-agent active sensing, especially under limited and dynamic communication (Olkin et al., 19 Sep 2024).
  • Distributed, Decentralized, and Explainable Solvers: Developing practical, scalable algorithms that operate with minimal centralized communication and provide explainable feedback and plan revision capabilities (Bogatarkan, 2021, Tang et al., 2023).

Comprehensive MAPF research now spans foundational combinatorial algorithms, learning and optimization-based techniques, and application-driven variants. Ongoing progress continues to drive both theoretical advancements and practical deployment in increasingly complex, real-world multi-agent environments.
