
MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs (1207.1359v1)

Published 4 Jul 2012 in cs.AI

Abstract: We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multirobot coordination, network traffic control, or distributed resource allocation. Solving such problems effectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite horizon problems.

Citations (217)

Summary

  • The paper presents a novel heuristic search algorithm, MAA*, that computes optimal solutions for finite-horizon DEC-POMDPs.
  • The methodology adapts the classical A* approach to evaluate complete sets of agent policies using state-dependent upper bound estimations.
  • Experimental results on benchmark problems show that MAA* reduces memory requirements and scales better than dynamic programming approaches.

An Evaluation of MAA*: A Heuristic Search Algorithm for DEC-POMDPs

In their paper, Szer, Charpillet, and Zilberstein introduce Multi-Agent A* (MAA*), a heuristic search algorithm designed to compute optimal solutions for Decentralized Partially Observable Markov Decision Processes (DEC-POMDPs) with finite horizons. MAA* marks an advance in solving cooperative multi-agent planning problems in domains where agents operate in a stochastic environment with only limited, partial information about the system state. Since finite-horizon DEC-POMDPs are NEXP-complete, MAA* confronts the computational difficulty of exact solution through a combination of classical heuristic search strategies and decentralized control theory.

Methodology and Algorithm Details

MAA* is the first complete and optimal heuristic search algorithm for DEC-POMDPs, obtained by adapting the classical A* algorithm to the multi-agent setting. The difficulty in DEC-POMDPs lies in coordinating separate agents that each act on different local information, which rules out the straightforward belief-state value assignments that make single-agent POMDPs tractable. MAA* instead evaluates complete vectors of agent policies, using domain-independent heuristic functions to prune unpromising policy vectors and reduce computational overhead. The heuristics are state-dependent upper-bound estimates derived from the underlying centralized MDP or POMDP.
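To make the MDP-based upper bound concrete, the sketch below runs finite-horizon backward induction on a centralized MDP: the resulting optimal centralized values can never be exceeded by any decentralized policy, so they serve as an overestimating (admissible) heuristic. The toy model and all names here are illustrative, not taken from the paper.

```python
# Sketch: finite-horizon backward induction on the underlying centralized MDP,
# whose optimal values upper-bound any decentralized policy's value.
# The toy transition/reward numbers below are made up for illustration.

def mdp_upper_bound(T, R, horizon):
    """T[a][s][s2] = P(s2 | s, a); R[s][a] = immediate reward.
    Returns V, where V[t][s] is the optimal centralized value
    achievable from state s with t steps remaining."""
    n_states = len(R)
    n_actions = len(R[0])
    V = [[0.0] * n_states]  # zero value with 0 steps to go
    for _ in range(horizon):
        prev = V[-1]
        V.append([
            max(R[s][a] + sum(T[a][s][s2] * prev[s2]
                              for s2 in range(n_states))
                for a in range(n_actions))
            for s in range(n_states)
        ])
    return V

# Tiny 2-state, 2-action example (hypothetical numbers).
T = [[[0.9, 0.1], [0.2, 0.8]],   # action 0
     [[0.5, 0.5], [0.4, 0.6]]]   # action 1
R = [[1.0, 0.0], [0.0, 2.0]]
V = mdp_upper_bound(T, R, horizon=3)
```

In MAA*, a node representing a partial policy vector of depth t would be scored by its exact value so far plus such an upper bound on the remaining horizon-minus-t steps.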

The algorithm follows the standard A* scheme: nodes in the search tree correspond to joint policy vectors, and a node's depth corresponds to the number of decision steps already specified. The evaluation function combines the exact value of the partial policy vector with an overestimating heuristic for the remaining steps, which allows large parts of the doubly exponential policy space typical of DEC-POMDP solutions to be pruned.
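The search loop above can be sketched as a best-first search with a priority queue. This is a minimal, generic illustration under assumed interfaces (`children`, `value`, `heuristic` are hypothetical callbacks, and the toy problem is degenerate), not the paper's implementation:

```python
# Sketch of an MAA*-style best-first loop over partial policy vectors.
# children(node): depth t+1 extensions of node; value(node): exact value of
# the partial joint policy; heuristic(node): overestimate of what remains.
import heapq
from itertools import count

def maa_star(horizon, children, value, heuristic):
    tie = count()  # tie-breaker so heapq never compares node objects
    root = ()      # a node is a tuple of joint decisions; len(node) = depth
    open_list = [(-(value(root) + heuristic(root)), next(tie), root)]
    best, best_val = None, float("-inf")
    while open_list:
        neg_f, _, node = heapq.heappop(open_list)
        if -neg_f <= best_val:
            break  # every remaining node's upper bound is beaten: optimal
        if len(node) == horizon:          # complete policy vector (leaf)
            best, best_val = node, -neg_f  # heuristic is 0 here, F is exact
            continue
        for child in children(node):      # expand one level deeper
            f = value(child) + heuristic(child)
            if f > best_val:              # prune children that cannot win
                heapq.heappush(open_list, (-f, next(tie), child))
    return best, best_val

# Degenerate toy: two joint decisions worth 1 and 3 per step; the heuristic
# grants 3 per remaining step, so it never underestimates.
rewards = {"a": 1.0, "b": 3.0}
H = 2
best, val = maa_star(
    H,
    children=lambda n: (n + (a,) for a in rewards),
    value=lambda n: sum(rewards[a] for a in n),
    heuristic=lambda n: 3.0 * (H - len(n)),
)
```

Because the heuristic never underestimates the achievable value, the first time the incumbent's value matches the best remaining upper bound, the search can stop with a certificate of optimality.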

Experimental Results and Analysis

Experimental validation was conducted on benchmark problems, including variants of the multi-agent tiger problem and a multi-access broadcast channel problem. Compared with a dynamic programming approach, MAA* showed clear advantages, particularly lower memory requirements and better scaling to larger problems, making it a potentially more viable solution for real-world applications. The MAA* implementation evaluated far fewer policy pairs than a brute-force enumeration would require. This efficiency depends on the choice of heuristic: both the MDP-based and the recursive heuristics were tested, with the recursive variant yielding the most extensive pruning.

Implications and Future Prospects

MAA* contributes significantly to the control strategies in cooperative systems by accommodating the challenges inherent to DEC-POMDPs, such as state-action exploration under uncertainty and multiparty policy alignment. Its application may extend beyond the current experimental problems to scenarios in network traffic management, robotic task coordination, and distributed resource allocation—a reflection of its flexible and domain-transferable methodology.

One theoretical implication is the demonstrated viability of heuristic-based search for decentralized optimization, laying groundwork for tackling infinite-horizon DEC-POMDPs, potentially with stochastic policies. The authors call for further research on extending the approach to finite state controllers and on broader applications in other decentralized settings.

In summary, while MAA* has demonstrated its strength on finite-horizon problems within the DEC-POMDP framework, further development could expand practical coordination techniques in multi-agent systems, improving decision-making efficacy across diverse and complex domains.