Decentralized Assignment & Navigation (DAN)

Updated 22 May 2026

Decentralized Assignment and Navigation (DAN) is a framework for distributed task allocation and navigation that enables multi-agent systems to operate without a central authority.
It integrates methods like multi-agent reinforcement learning, stochastic optimization, and transformer-based coordination to achieve fair, efficient, and scalable performance.
DAN systems enhance robustness in dynamic environments by using local observations and decentralized communication to reduce travel time and avoid collisions.

Decentralized Assignment and Navigation (DAN) encompasses a family of frameworks and algorithms designed to jointly solve multi-agent task/goal assignment and navigation in the absence of a central authority, relying instead on distributed observation, communication, and coordination protocols. DAN has become a core paradigm for cooperative robot teams in spatially distributed and dynamic environments, addressing efficiency, fairness, robustness, and scalability in assignment and navigation, even under partial observability and agent heterogeneity.

1. Problem Formulation and Theoretical Foundations

The canonical DAN problem models a team of $N$ agents $A = \{1, \ldots, N\}$ deployed in a shared workspace, tasked with servicing or reaching a set of $M$ mission elements (tasks or goals), $T = \{1, \ldots, M\}$, potentially under various assignment constraints (e.g., one-to-one, many-to-one). Each task may have structured parameters: weight $w_j$, initial workload $W_j$, spatial location, and agent–task preference $pref_{ji}$ reflecting skill alignment, utility, or compatibility. Agents may be heterogeneous in their dynamics, sensing, and communication ranges.

Decentralized assignment and navigation is typically formalized as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), where:

The global state $s\in S$ encodes agents’ and tasks’ positions, velocities, workloads, and obstacles.
Each agent $i$ receives a partial observation $o^{(i)}$ and communicates locally within a network graph (neighborhood defined by range or connectivity).
Joint actions are selected as $a = (a^{(1)}, \ldots, a^{(N)})$, with underlying transition dynamics $P(s' \mid s, a)$.
The reward structure encodes objectives such as minimizing completion time, total travel cost, and/or enforcing balanced team performance and fairness.
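
As a rough illustration of the partial-observation component, the sketch below filters the global state down to what a single agent can sense. The function name `local_observation`, the dictionary layout, and the circular sensing model are assumptions made for this example, not details taken from any of the cited systems.

```python
import math

def local_observation(agent_id, agent_pos, task_pos, sense_radius):
    """Return ids of other agents and of tasks within sense_radius of agent_id."""
    ax, ay = agent_pos[agent_id]

    def near(p):
        # Circular sensing footprint (an assumption of this sketch).
        return math.hypot(p[0] - ax, p[1] - ay) <= sense_radius

    return {
        "agents": sorted(i for i, p in agent_pos.items() if i != agent_id and near(p)),
        "tasks": sorted(j for j, p in task_pos.items() if near(p)),
    }

# Hypothetical 2-D workspace with three agents and two tasks.
agents = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 5.0)}
tasks = {0: (0.5, 0.5), 1: (6.0, 5.0)}
obs = local_observation(0, agents, tasks, sense_radius=2.0)
```

Here agent 0 sees only agent 1 and task 0; everything else in the global state is hidden from it, which is the situation the Dec-POMDP formalism is meant to capture.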

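For fairness-aware assignment, the Eisenberg–Gale (EG) equilibrium program serves as the guiding principle. In its standard form, with tasks acting as weighted buyers and agents as divisible goods, the centralized EG convex program is:

$\max_{x \ge 0} \; \sum_{j \in T} w_j \log u_j \quad \text{s.t.} \quad u_j = \sum_{i \in A} v_{ji}\, x_{ji} \;\; \forall j \in T, \qquad \sum_{j \in T} x_{ji} \le 1 \;\; \forall i \in A$

where $x_{ji}$ is the share of agent $i$ allocated to task $j$, $u_j$ is the utility attained by task $j$, and the valuation $v_{ji}$ incorporates both agent–task compatibility ($pref_{ji}$) and travel cost via an exponential distance discount.

This formulation is Pareto-efficient and envy-free (weighted by $w_j$), providing a rigorous theoretical benchmark for both centralized and decentralized assignment mechanisms (Liu et al., 22 Nov 2025).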
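
To make the EG benchmark concrete, the following sketch approximates an EG allocation with proportional response dynamics, a simple iterative scheme known to converge to the EG equilibrium when utilities are linear. It is a swapped-in illustrative solver, not the method used in the cited work. The names `eg_proportional_response` and `discounted_value`, the discount `rate`, and the example valuations are all assumptions of this example.

```python
import math

def discounted_value(pref, distance, rate=0.5):
    """Hypothetical valuation: preference scaled by an exponential distance discount."""
    return pref * math.exp(-rate * distance)

def _allocate(bids):
    """Each agent is split among tasks in proportion to the bids placed on it."""
    n_tasks, n_agents = len(bids), len(bids[0])
    prices = [sum(bids[j][i] for j in range(n_tasks)) for i in range(n_agents)]
    return [[bids[j][i] / prices[i] if prices[i] > 0 else 0.0
             for i in range(n_agents)] for j in range(n_tasks)]

def eg_proportional_response(weights, values, iters=500):
    """Approximate the EG allocation x[j][i] (share of agent i given to task j).

    weights[j]   : weight (budget) of task j
    values[j][i] : non-negative valuation of agent i for task j,
                   with at least one positive entry per task
    """
    n_tasks, n_agents = len(values), len(values[0])
    # Start by splitting each task's budget in proportion to its valuations.
    bids = [[weights[j] * values[j][i] / sum(values[j]) for i in range(n_agents)]
            for j in range(n_tasks)]
    for _ in range(iters):
        alloc = _allocate(bids)
        for j in range(n_tasks):
            gains = [values[j][i] * alloc[j][i] for i in range(n_agents)]
            u_j = sum(gains)
            if u_j > 0:
                # Re-bid in proportion to the utility each agent contributed.
                bids[j] = [weights[j] * g / u_j for g in gains]
    return _allocate(bids)

# Two equally weighted tasks, each preferring a different agent.
x = eg_proportional_response([1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]])
```

In this small instance the allocation converges toward giving each task the agent it values most. With identical valuations and unequal weights, attained utilities end up proportional to the weights, which is the weighted fairness property the EG program encodes.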

2. Algorithmic Architectures

DAN systems instantiate diverse algorithmic pipelines, commonly structured into higher-level assignment and lower-level navigation layers:

Multi-Agent RL Approaches:

DISPATCH/EG-MARL uses a centralized-training, decentralized-execution (CTDE) architecture in which decentralized agent policies are GNN-based and trained with actor–critic methods under EG-based reward shaping (Liu et al., 22 Nov 2025). Each agent’s policy $\pi^{(i)}(a^{(i)} \mid o^{(i)})$ consumes local egocentric graphs and aims to maximize long-horizon team reward.

Stochastic Online Optimization:

DISPATCH employs an exploration–assignment loop, maintaining sets of discovered but unassigned tasks and assigning agents via local EG maximization over small, fixed-size subsets, offering a scalable real-time approximation to the full EG matching (Liu et al., 22 Nov 2025).

Hierarchical RL Coupling:

DC-MRTA (Agrawal et al., 2022) adopts a two-level system, solving the assignment via episodic RL over an MDP with state space comprising both robot positions/busy-times and a dynamic task pool. Rewards directly reflect lower-level decentralized navigation cost (e.g., total travel delay), as obtained by ORCA-based collision-avoidance trajectories.

Local Exchange and Swapping:

In decentralized unlabeled multi-agent navigation, agents individually select and periodically exchange goals, using criteria such as minimizing global path cost or resolving conflicts via priority-based or cost-reducing swaps (Dergachev et al., 2024). Collision avoidance is enforced via decentralized velocity-obstacle-based planners (e.g., ORCA).
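
A cost-reducing swap rule of this kind can be sketched as follows. This is a simplified illustration, not the procedure of the cited paper: it uses straight-line distance as the path cost and lets any pair of agents swap, whereas a decentralized system would restrict swaps to agents within communication range. The function name `swap_goals_until_stable` is hypothetical.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def swap_goals_until_stable(positions, goals):
    """Swap goals between pairs of agents whenever that lowers their summed distance.

    Terminates because every accepted swap strictly lowers the total cost
    and there are finitely many assignments.
    """
    goals = list(goals)
    improved = True
    while improved:
        improved = False
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                keep = dist(positions[a], goals[a]) + dist(positions[b], goals[b])
                swap = dist(positions[a], goals[b]) + dist(positions[b], goals[a])
                if swap < keep - 1e-9:  # small tolerance avoids swap cycles on ties
                    goals[a], goals[b] = goals[b], goals[a]
                    improved = True
    return goals

# Two agents initially holding each other's nearest goal.
new_goals = swap_goals_until_stable([(0, 0), (10, 0)], [(10, 1), (0, 1)])
```

After the swap each agent heads to the goal nearest to it, which also removes the crossing paths that the initial assignment would have produced.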

Transformer-Based Neural Coordination:

For MinMax multi-agent routing, a decentralized attention neural network (DAN) architecture parameterizes agent policies by encoding the evolving assignment state, agent configurations, and dynamic city context, using self- and cross-attention mechanisms to enable implicit agent–agent coordination without explicit communication (Cao et al., 2021).

3. Communication and Decentralization Mechanisms

Strict decentralization in DAN prohibits global map aggregation or synchronized shared state at runtime:

Local Observations: Each agent perceives entities within its radius, with sensory and semantic maps updated via onboard perception.
Sparse Messaging: Communication is typically limited to local, relay-based neighbor messaging, encoding positions, current assignments, or intents.
Ad-hoc Exchanges and Intent Broadcasting: Agents broadcast navigation or task-seeking intents, and in some systems (e.g., DM$^3$-Nav (Kashiri et al., 23 Apr 2026)) immediately prune or reprioritize their own plans upon discovering collisions or redundant intents from neighbors.
Map Fusion: In semantic navigation, partial local maps may be merged opportunistically via visual/semantic keypoint matching, but only on a pairwise, unsynchronized basis.

This reliance on local information and opportunistic communication confines each robot’s computational and bandwidth requirements to its neighborhood, which improves robustness and scalability and eliminates single points of failure.
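
The intent-broadcasting idea can be illustrated with a minimal sketch in which each agent drops its claimed task when a neighbor within communication range claims the same task at lower cost. The rule, the tie-break on agent id, and the name `resolve_intents` are assumptions made for illustration and are not taken from any of the cited systems.

```python
import math

def resolve_intents(positions, intents, task_pos, comm_range):
    """Return each agent's intent after local conflict pruning (None if dropped)."""
    kept = {}
    for i, task in intents.items():
        my_cost = math.dist(positions[i], task_pos[task])
        wins = True
        for k, other in intents.items():
            if k == i or other != task:
                continue
            if math.dist(positions[i], positions[k]) > comm_range:
                continue  # out of range: agent i never hears this rival's intent
            rival_cost = math.dist(positions[k], task_pos[task])
            if (rival_cost, k) < (my_cost, i):  # lower cost wins, ties go to lower id
                wins = False
                break
        kept[i] = task if wins else None
    return kept

# Three agents all intend to reach task "A"; agent 2 is far from the others.
positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (100.0, 0.0)}
result = resolve_intents(positions, {0: "A", 1: "A", 2: "A"}, {"A": (2.0, 0.0)}, comm_range=5.0)
```

Agent 0 yields to the closer agent 1, while agent 2 keeps its redundant intent because it cannot hear the others. This is the kind of temporary redundancy that purely local messaging tolerates.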

4. Fairness, Efficiency, and Performance Trade-offs

DAN frameworks explicitly quantify the trade-off among fairness (e.g., equitable service or resource allocation), efficiency (e.g., total distance, makespan), and the computational or communication cost of decentralization.

Fairness Metrics: Distributional fairness is measured via the coefficient of variation of per-task attained utility, $\mathrm{CV} = \sigma_u / \mu_u$ (standard deviation divided by mean), and Jain’s index.
Efficiency Metrics: Standard metrics include total task completion time, total travel distance, flowtime, and makespan.
Empirical Regret: Regret relative to the centralized equilibrium compares the achieved utility to the EG optimum (Liu et al., 22 Nov 2025).
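
The two fairness metrics have standard closed forms and can be computed as in the sketch below. The function names are chosen for this example, and the population (rather than sample) standard deviation is assumed for the coefficient of variation.

```python
import math

def coefficient_of_variation(utilities):
    """Standard deviation divided by mean; 0 means perfectly even utilities."""
    mean = sum(utilities) / len(utilities)
    var = sum((u - mean) ** 2 for u in utilities) / len(utilities)
    return math.sqrt(var) / mean

def jain_index(utilities):
    """Jain's fairness index: (sum u)^2 / (n * sum u^2), ranging from 1/n to 1."""
    total = sum(utilities)
    return total * total / (len(utilities) * sum(u * u for u in utilities))

even = [2.0, 2.0, 2.0]
skewed = [1.0, 3.0]
```

Perfectly even utilities give a coefficient of variation of 0 and a Jain index of 1; when a single task receives all the utility, the Jain index falls to its minimum of 1/n.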

In controlled evaluations:

Agents	EG-MARL Regret	Online Regret
3	0.08	2.22
7	3.99	0.97
10	6.20	3.70

EG-MARL yields near-centralized performance for small $N$, but as $N$ increases the online stochastic method tracks the centralized benchmark more closely, with lower regret at 7 and 10 agents. Notably, EG-based DAN methods dominate the fairness–efficiency Pareto frontier over utility-only (Hungarian) or cost-only (Min–Max) baselines, confirming that enforcing log-concave utility in assignment supports both objectives robustly (Liu et al., 22 Nov 2025).

5. Integration with Low-Level Navigation

Low-level navigation is integrated tightly within decentralized assignment routines:

End-to-End Learned Navigation: In EG-MARL, agents learn end-to-end acceleration policies for collision-free goal pursuit within the RL framework itself, shaped by distance-to-goal and progress signals, without recourse to explicit classical planners (Liu et al., 22 Nov 2025).
Decentralized Velocity Obstacle Planners: ORCA is the default decentralized path planner, guaranteeing mutual collision avoidance over short horizons, and is employed both as a deployable controller and as a “navigation oracle” for assigning costs or rewards during assignment (Agrawal et al., 2022, Dergachev et al., 2024).
Implicit Routing: In semantic DAN (e.g., DM$^3$-Nav (Kashiri et al., 23 Apr 2026)), navigation combines frontier-based exploration, instance memory, and intent coordination processes, with each robot managing its own planning modules and map fusion as needed.
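
Velocity-obstacle planners reason about whether the current relative velocity of two agents leads to contact within a time horizon. The sketch below is not ORCA itself; it shows only the underlying geometric test, the time to collision of two disc-shaped agents moving at constant velocity. The function name and argument layout are assumptions of this example.

```python
import math

def time_to_collision(pa, va, pb, vb, combined_radius):
    """Earliest t >= 0 at which two constant-velocity discs touch, or None if never."""
    px, py = pb[0] - pa[0], pb[1] - pa[1]   # relative position
    vx, vy = vb[0] - va[0], vb[1] - va[1]   # relative velocity
    # Solve |p + v t|^2 = R^2, a quadratic in t.
    a = vx * vx + vy * vy
    b = 2.0 * (px * vx + py * vy)
    c = px * px + py * py - combined_radius ** 2
    if c <= 0:
        return 0.0  # discs already overlap
    if a == 0:
        return None  # no relative motion, the gap never closes
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # closest approach stays outside the combined radius
    t = (-b - math.sqrt(disc)) / (2 * a)
    return t if t >= 0 else None

# Head-on approach: unit-radius agents 10 apart, closing at a relative speed of 2.
ttc = time_to_collision((0, 0), (1, 0), (10, 0), (-1, 0), combined_radius=2.0)
```

A planner in this family would treat velocities whose time to collision falls below its horizon as forbidden and choose the admissible velocity closest to the preferred one. A learned policy or an assignment layer could use the same quantity as a cost or reward signal.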

6. Extensions, Heterogeneity, and Robustness

Modern DAN systems support a broad class of extensions:

Heterogeneous Capabilities: Algorithms integrate agent-specific skill matrices (e.g., $pref_{ji}$, manipulation/sensing abilities) directly into assignment utilities or through flexible skill-aware protocol generation (as in LLM-generated policies (Rajvanshi et al., 19 May 2025)).
Open-World, Multi-Goal Missions: Frameworks such as DM$^3$-Nav and SayCoNav handle multi-object, multimodal, and open-vocabulary goal formulations, leveraging learned semantic matching, language/image embeddings, and adaptive replanning (Kashiri et al., 23 Apr 2026, Rajvanshi et al., 19 May 2025). Strategy adaptation occurs online in response to failures or dynamic changes via replanning or dynamic prompt conditions.
Completeness Guarantees: Decentralized goal assignment procedures provide formal guarantees of termination and consistency under minimal progress assumptions, ensuring all agents eventually attain unique goals or detect infeasibility (Dergachev et al., 2024).
Robustness and Failure Modes: DAN systems operate without global consensus; local perception and communication failures may cause temporary redundant exploration or suboptimal allocation but do not catastrophically degrade global task coverage (Kashiri et al., 23 Apr 2026).

7. Empirical Evaluation and Comparative Results

DAN approaches have been subjected to rigorous evaluation across simulation and real-world scenarios (Liu et al., 22 Nov 2025, Agrawal et al., 2022, Dergachev et al., 2024, Kashiri et al., 23 Apr 2026, Rajvanshi et al., 19 May 2025, Cao et al., 2021):

DISPATCH/EG-MARL achieves near-Pareto optimum in both fairness and travel time up to moderate team sizes, while stochastic online DAN approaches retain scalability and competitive fairness for larger teams (Liu et al., 22 Nov 2025).
DC-MRTA reduces completion time by up to 14% and collisions by 40% compared to standard baselines in warehouse maps, remaining scalable up to 1000 robots (Agrawal et al., 2022).
Decentralized Unlabeled Navigation closes the gap with centralized algorithms in both success rate and path efficiency across random and structured maps. In the reported settings it solves 80–95% of instances, well above purely decentralized baselines (Dergachev et al., 2024).
DM$^3$-Nav matches or exceeds centralized semantic navigation approaches in multi-object, open-vocabulary settings, maintaining positive team SPL and MSPL metrics and robust real-world performance (Kashiri et al., 23 Apr 2026).
SayCoNav demonstrates up to a 44% reduction in search time and robust adaptive collaboration in the presence of agent failures by leveraging LLM-synthesized decentralized strategies and dynamic local planning (Rajvanshi et al., 19 May 2025).
DAN for MinMax mTSP approaches or outperforms centralized solvers and transformer-based approaches in both solution quality and planning speed for large-scale routing problems with up to 1000 nodes (Cao et al., 2021).

References

(Liu et al., 22 Nov 2025) "DISPATCH -- Decentralized Informed Spatial Planning and Assignment of Tasks for Cooperative Heterogeneous Agents"
(Agrawal et al., 2022) "DC-MRTA: Decentralized Multi-Robot Task Allocation and Navigation in Complex Environments"
(Dergachev et al., 2024) "Decentralized Unlabeled Multi-Agent Navigation in Continuous Space"
(Kashiri et al., 23 Apr 2026) "DM$^3$-Nav: Decentralized Multi-Agent Multimodal Multi-Object Semantic Navigation"
(Rajvanshi et al., 19 May 2025) "SayCoNav: Utilizing LLMs for Adaptive Collaboration in Decentralized Multi-Robot Navigation"
(Cao et al., 2021) "DAN: Decentralized Attention-based Neural Network for the MinMax Multiple Traveling Salesman Problem"