PO-MAPF: Multi-Agent Pathfinding Under Uncertainty

Updated 1 December 2025
  • PO-MAPF is a framework that redefines multi-agent pathfinding by incorporating limited, local observations, requiring decentralized decision-making and collision avoidance.
  • It employs algorithmic strategies such as reactive policies, belief-space planning, and learning-based methods to manage uncertainty and sensor limitations.
  • Challenges in PO-MAPF include high computational complexity, ensuring safety under uncertainty, and balancing scalability with solution quality in dynamic environments.

Partially Observable Multi-Agent Pathfinding (PO-MAPF) generalizes the classical Multi-Agent Pathfinding (MAPF) problem by introducing limitations on agent sensing and perception, resulting in agents that plan and act based on local, partial information about their environment and other agents’ states. In PO-MAPF, agents typically share a common workspace (e.g., grids, graphs, or continuous domains) and must navigate from respective start to goal positions without collisions, but have only limited, local, or delayed observations of the global environment and other agents. This paradigm is motivated by practical constraints in robotics, autonomous transport, and distributed AI, where full global knowledge is generally unavailable in real time.

1. Formal Definition and Distinction from MAPF

Formally, classical MAPF is defined over a tuple (G, A, S, T), where G = (V, E) is the workspace graph, A is the set of agents, and S, T ∈ V^|A| assign each agent its start and goal vertex. In classical MAPF, a central solver or the agents themselves assume global knowledge of G, the starts and goals, and (often) the precise locations of all agents at each timestep. Each agent selects actions synchronized over discrete time steps, subject to non-collision constraints (a minimal check for both is sketched after the list):

  • Vertex conflict: No two agents occupy the same vertex at the same timestep.
  • Edge conflict: No two agents traverse the same edge in opposite directions simultaneously.
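
A minimal, illustrative sketch of both checks in Python, assuming each agent’s plan is simply a list of vertices indexed by timestep (with agents waiting at their final vertex once their path ends); the function name and path encoding are assumptions for exposition, not a standard API:

```python
from itertools import combinations

def find_conflicts(paths):
    """Return all vertex and edge (swap) conflicts between agent paths.

    paths: dict mapping agent id -> list of vertices, one per timestep.
    Agents are assumed to wait at their final vertex after their path ends.
    """
    def at(p, t):
        return p[min(t, len(p) - 1)]  # implicit waiting at the final vertex

    conflicts = []
    horizon = max(len(p) for p in paths.values())
    for (i, pi), (j, pj) in combinations(paths.items(), 2):
        for t in range(horizon):
            vi, vj = at(pi, t), at(pj, t)
            if vi == vj:
                conflicts.append(("vertex", i, j, t, vi))
            if t + 1 < horizon:
                ni, nj = at(pi, t + 1), at(pj, t + 1)
                if vi == nj and vj == ni and vi != vj:  # positions swapped
                    conflicts.append(("edge", i, j, t, (vi, ni)))
    return conflicts
```

For example, find_conflicts({"i": ["u", "v"], "j": ["v", "u"]}) reports an edge conflict at t = 0, while find_conflicts({"i": ["u", "v"], "j": ["w", "v"]}) reports a vertex conflict at t = 1.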

In PO-MAPF, each agent i at time t receives a local observation o_t^i drawn from a limited subset of the full system state, e.g., agents may only perceive nearby vertices or other agents within a sensor range, with possibly noisy or delayed information. The agents’ planning must then operate over belief states or histories, often requiring explicit reasoning under uncertainty about the dynamic obstacles posed by other agents. This generates a decentralized, partially observable multi-agent decision process, markedly harder than standard MAPF.

2. Key Models of Partial Observability

PO-MAPF models admit a variety of formal observation structures. Salient cases include:

  • k-radius observability: Each agent observes state information within k hops or a fixed Euclidean distance of its current location (a minimal sketch follows this list).
  • Field-of-view/angle-constrained: Observability is limited not only by distance but also by a directional cone, modeling limited sensors in robotics.
  • Delayed/Asynchronous disclosure: Observed information is available only after a delay or at irregular time intervals, capturing communication or sensor update lags.
  • Noisy/inaccurate perception: Agent observations are stochastic functions of the true world state, inducing further uncertainty in belief space.
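
As an illustration of the first (k-radius) model, a minimal sketch of an egocentric observation function on a grid, using a square window as a stand-in for k-hop visibility; the two-channel encoding (static obstacles, other agents) is an assumption chosen for exposition:

```python
import numpy as np

def k_radius_observation(grid, agent_positions, self_id, k):
    """Egocentric (2k+1) x (2k+1) observation window around one agent.

    grid: 2D numpy array, 0 = free cell, 1 = static obstacle.
    agent_positions: dict mapping agent id -> (row, col).
    Out-of-map cells are padded as obstacles; other agents are visible
    only if they fall inside the window.
    """
    r, c = agent_positions[self_id]
    obstacles = np.ones((2 * k + 1, 2 * k + 1), dtype=int)  # obstacle padding
    agents = np.zeros_like(obstacles)
    for dr in range(-k, k + 1):
        for dc in range(-k, k + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < grid.shape[0] and 0 <= cc < grid.shape[1]:
                obstacles[dr + k, dc + k] = grid[rr, cc]
    for aid, (ar, ac) in agent_positions.items():
        if aid != self_id and abs(ar - r) <= k and abs(ac - c) <= k:
            agents[ar - r + k, ac - c + k] = 1
    return obstacles, agents  # two egocentric channels
```

Stacking such channels over a few recent timesteps is a common way to expose short-term dynamics to learned policies.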

Variations in observability model drastically affect the computational and information-theoretic complexity of PO-MAPF, as well as the design of decentralized coordination and collision-avoidance protocols.

3. Algorithmic Approaches

Algorithmic solutions to PO-MAPF range from decentralized heuristics to explicit multi-agent belief-space search. Representative classes include:

  • Reactive Decentralized Policies: Agents follow precomputed or learned local policies, typically involving local priorities, rule-based collision avoidance, or velocity obstacles tuned by sensory input. Such approaches scale well but may be incomplete or suboptimal in dense regimes (a minimal sketch of one such rule follows this list).
  • Belief-State or History-Based Planning: Agents maintain an explicit or approximate belief over the global state (e.g., particle filter or set-valued state trackers), using techniques from Dec-POMDPs or decentralized planning under uncertainty. Optimally solving even the two-agent case is NEXP-complete in theory; scalable approximations involve receding-horizon planners with bounded lookahead or explicit anticipation of other agents’ strategies.
  • Learning-Based Approaches: Recent work considers deep reinforcement learning or imitation learning for distributed policy synthesis under agent-centric partial observability. Agents are trained either in simulation or through self-play to map local observations (often structured as grid images or graph neighborhoods) to low-level actions, typically relying on convolutional or attention-based neural architectures.
  • Hybrid Methods: Some approaches interleave explicit communication or coordination phases (broadcasting local maps, intentions, or predicted trajectories) with decentralized execution, retaining the scalability of decentralized runtime while restoring global consistency only intermittently.
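
As a concrete instance of the reactive decentralized class above, a minimal sketch of a greedy local rule: the agent moves to an adjacent cell that strictly reduces Manhattan distance to its goal and that it does not currently observe as occupied, and otherwise waits. The grid encoding is an illustrative assumption; as noted in the list, such rules are incomplete and can deadlock:

```python
def reactive_step(pos, goal, observed_occupied):
    """One reactive move for a single agent under local observability.

    pos, goal: (row, col) cells on a 4-connected grid.
    observed_occupied: set of cells the agent currently observes as
    blocked (static obstacles plus other agents in sensor range).
    Returns the next cell, or pos (a wait) if no improving move is free.
    """
    if pos == goal:
        return pos

    def dist(p):  # Manhattan distance to the goal
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    moves = [(pos[0] + dr, pos[1] + dc)
             for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))]
    for nxt in sorted(moves, key=dist):
        if dist(nxt) < dist(pos) and nxt not in observed_occupied:
            return nxt
    return pos  # wait: every improving neighbor is blocked
```

Executed synchronously by all agents, such a rule exhibits exactly the oscillation and deadlock failure modes discussed in Section 4.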

4. Computational Complexity and Hardness

The introduction of partial observability in MAPF escalates complexity in two directions: planning and decision-making must occur over high-dimensional joint belief states, and decentralized coordination must be robust to mistaken or outdated assumptions about other agents’ intent and position. Even restricted variants (limited sensing horizons, bounded agent counts, small graphs) often remain PSPACE- or EXPTIME-hard, so only approximate or heuristic algorithms are feasible for larger domains. Most decentralized PO-MAPF algorithms are incomplete and may fail to avoid deadlock or livelock in adversarial configurations unless strong structural guarantees (e.g., acyclicity, strong mutual exclusion protocols) are present.
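
To make the belief-state view concrete, a minimal sketch of a set-valued tracker over another agent’s possible positions, assuming a known graph and that the tracked agent moves at most one edge per step; the function names and the visible-empty correction are assumptions for exposition:

```python
def update_position_belief(belief, neighbors, sighting=None, visible_empty=()):
    """One predict/correct step of a set-valued belief over one agent.

    belief: set of cells the tracked agent might currently occupy.
    neighbors: callable mapping a cell to its adjacent cells.
    sighting: the tracked agent's cell if currently observed, else None.
    visible_empty: cells currently in view and observed to be empty.
    """
    # Predict: the agent may have waited or moved to any neighbor.
    predicted = set(belief)
    for cell in belief:
        predicted.update(neighbors(cell))
    # Correct: a sighting collapses the set; cells seen empty are ruled out.
    if sighting is not None:
        return {sighting}
    return predicted - set(visible_empty)
```

Planning against the union of such sets yields conservatively collision-free behavior, at the cost of added conservatism and longer paths; this is one face of the tractability/quality tension discussed below.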

5. Empirical Evaluation and Application Domains

Benchmarking PO-MAPF is performed using gridworlds, real-world floor plans, and simulated multi-robot warehouses. Evaluations focus on performance metrics including makespan (total time to completion), sum-of-costs (total distance/time summed across agents), deadlock occurrence, and robustness to observation noise or communication latency; a minimal sketch of the two cost metrics follows the list below. Empirical results indicate that, in high-density or adversarial scenarios, lack of full observability leads to substantial performance degradation and increased collision/oscillation rates unless communication or richer local prediction is used. Real-world application domains include:

  • Swarm robotic assembly under limited-range sensing (e.g., only on-board LiDAR or proximity sensors).
  • Autonomous vehicle navigation under V2V-limited environments.
  • Drone fleets with occlusion-limited cameras or delayed wireless mesh connectivity.
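
For reference, a minimal sketch of the two cost metrics named above, assuming each executed path is a vertex sequence indexed by timestep and following the usual convention that trailing waits at the goal are not charged:

```python
def makespan(paths):
    """Latest arrival time over all agents (path length minus one)."""
    return max(len(p) - 1 for p in paths.values())

def sum_of_costs(paths, goals):
    """Sum of per-agent arrival times; trailing waits at the goal are free."""
    total = 0
    for aid, path in paths.items():
        t = len(path) - 1
        # Walk back over any final run of waits at the goal vertex.
        while t > 0 and path[t] == goals[aid] and path[t - 1] == goals[aid]:
            t -= 1
        total += t
    return total
```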

6. Relation to Broader Research in Multi-Agent Systems

The PO-MAPF setting connects tightly to Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs), distributed AI, and multi-robot motion planning under uncertainty. It is also related to distributed constraint optimization, invisible rendezvous/search, and collaborative exploration. Several algorithmic primitives (formation control, local priority fields, gossip-based map merging) have roots in adjacent fields. In practice, many PO-MAPF implementations hybridize model-based prediction and learning-driven execution to bridge the gap between completeness and real-time feasibility.

7. Key Limitations and Open Challenges

A central limitation of current PO-MAPF research is the tension between tractability and solution quality under partial observability, particularly as agent numbers and environment complexity grow. Open research questions include:

  • Scalability of explicit belief-space planners for large-scale, high-dimensional multi-agent systems.
  • Formal guarantees of safety (deadlock, collision avoidance) for reactive or learned decentralized policies.
  • Theoretical and empirical quantification of the “price of partial observability” versus communication (i.e., how much global coordination or communication bandwidth is required to achieve given solution quality).
  • Robustness to adversarial behavior or unmodeled environment disturbances.

Continued progress in PO-MAPF is closely tied to advances in multi-agent learning, distributed planning under uncertainty, and real-world deployment of large-scale multi-robot teams in partially observable, dynamic environments.
