- The paper introduces the DESPOT algorithm, which uses sparse scenario sampling to approximate belief trees and reduce computational complexity.
- The paper demonstrates near-optimal policy performance by incorporating regularization to balance value estimation with policy size, mitigating overfitting.
- The algorithm’s anytime design enables real-time POMDP planning, demonstrating its utility in applications such as autonomous driving and robotics.
Overview of DESPOT: Online POMDP Planning with Regularization
The paper presents the Determinized Sparse Partially Observable Tree (DESPOT) algorithm, a novel approach to online planning under uncertainty with Partially Observable Markov Decision Processes (POMDPs). The method addresses the two principal computational challenges of POMDPs: the "curse of dimensionality" (the state space grows exponentially with the number of state variables) and the "curse of history" (the number of action–observation histories grows exponentially with the planning horizon).
Key Contributions
- Sparse Approximation: DESPOT approximates the standard belief tree with a sparse tree built from a set of randomly sampled scenarios. Each scenario determinizes the outcomes of actions and observations, so the tree contains only the branches reachable under the sampled scenarios and is dramatically smaller than the full belief tree.
- Near-Optimal Policies: The paper proves that the policy derived from a DESPOT is near-optimal, with a regret bound that depends on the size of the optimal policy's representation.
- Regularization Technique: Regularizing the objective function balances the policy's estimated value against the policy size, helping to avoid overfitting to the sampled scenarios.
- Anytime Algorithm: The proposed anytime algorithm iteratively searches for a near-optimal policy within the DESPOT framework, allowing real-time implementations and yielding strong experimental results compared with existing online POMDP algorithms.
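The two core ideas above — determinized scenario sampling and a size-penalized objective — can be illustrated with a minimal sketch. This is not the authors' implementation; all names (`sample_scenarios`, `rollout_value`, `regularized_score`) and the generic `step_fn` simulator interface are hypothetical, and the real algorithm searches a tree rather than scoring a fixed policy.

```python
import random

def sample_scenarios(belief_particles, k, seed=0):
    """Sample K scenarios: each pairs a start state drawn from the
    belief with a private random stream that fixes ("determinizes")
    all future action/observation outcomes for that scenario."""
    rng = random.Random(seed)
    return [(rng.choice(belief_particles), random.Random(rng.random()))
            for _ in range(k)]

def rollout_value(scenario, policy, step_fn, depth, gamma=0.95):
    """Simulate one scenario under `policy`; given the scenario's
    RNG, the whole trajectory is deterministic."""
    state, rng = scenario
    total, discount, obs = 0.0, 1.0, None
    for _ in range(depth):
        action = policy(obs)
        state, obs, reward = step_fn(state, action, rng)
        total += discount * reward
        discount *= gamma
    return total

def regularized_score(scenarios, policy, policy_size, step_fn,
                      depth=10, lam=0.1):
    """Average discounted return over the scenarios, minus
    lambda * policy size: the regularized objective that
    discourages overfitting to the sampled scenarios."""
    v_hat = sum(rollout_value(s, policy, step_fn, depth)
                for s in scenarios) / len(scenarios)
    return v_hat - lam * policy_size
```

Comparing candidate policies by `regularized_score` rather than by raw estimated value is what keeps a policy from exploiting quirks of the particular K scenarios that were drawn.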
Theoretical Implications
DESPOT combines sampling with heuristic search to navigate the potentially vast search space of POMDPs efficiently. The paper presents a rigorous theoretical analysis showing that DESPOT achieves near-optimal results even with a relatively small number of sampled scenarios, provided the POMDP admits a compact near-optimal policy.
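In symbols, the regularized objective at the heart of this analysis can be sketched as follows (notation hedged to the paper's general form; $\lambda$ is a regularization constant trading estimated value against policy size):

```latex
% Regularized empirical objective over K sampled scenarios \Phi:
% \hat{V}_\pi(b_0) is the average discounted return of policy \pi
% under the scenarios, and |\pi| is the size of the policy tree.
\hat{\nu}(\pi) \;=\; \hat{V}_\pi(b_0) \;-\; \lambda\,\lvert\pi\rvert,
\qquad
\hat{V}_\pi(b_0) \;=\; \frac{1}{K}\sum_{\phi \in \Phi} V_{\pi,\phi}
```

Maximizing $\hat{\nu}$ rather than $\hat{V}$ alone penalizes large policies, which is what controls overfitting when $K$ is small and underlies the regret bound's dependence on the optimal policy's representation size.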
Practical Applications
DESPOT has been successfully integrated into an autonomous driving system for real-time vehicle control and applied to other domains such as robot mine detection. Its ability to handle POMDPs with very large state spaces and complex dynamics highlights its scalability and versatility.
Experimental Results
DESPOT demonstrates strong performance on several benchmarks, balancing computational feasibility against solution quality more effectively than algorithms such as SARSOP, AEMS2, and POMCP. In domains with large observation spaces in particular, DESPOT's regularization mitigates overfitting to the sampled scenarios, making it more robust than techniques that lack such a safeguard.
Future Directions
The work suggests future exploration into techniques such as hierarchical observation structuring and importance sampling to further refine the algorithm's capability in environments with high-dimensional observation spaces. Additionally, leveraging learning approaches to optimize the default policies used within the DESPOT framework could provide further performance gains.
By introducing the DESPOT algorithm, the paper contributes significantly to the field of AI planning under uncertainty. It provides a practical yet theoretically grounded approach to overcoming long-standing computational obstacles in POMDPs, particularly pertinent for large-scale, real-time applications.