Partially Observable Monte-Carlo Graph Search
- POMCGS is an offline, sampling-based algorithm that incrementally constructs folded policy graphs for POMDPs by merging similar beliefs during simulations.
- It employs an upper confidence bound strategy along with progressive widening and observation clustering to effectively manage continuous action and observation spaces.
- It achieves near-real-time deployment in time- or computation-constrained domains while delivering competitive policy quality through compact finite-state controllers.
Partially Observable Monte-Carlo Graph Search (POMCGS) is an offline, sampling-based algorithm for synthesizing policies in large partially observable Markov decision processes (POMDPs). Unlike conventional online POMDP solvers that construct and search a tree at each decision epoch, POMCGS incrementally constructs a folded policy graph—specifically a finite-state controller (FSC)—that merges similar beliefs encountered along distinct simulation paths. This approach enables near-real-time deployment in domains with stringent time or computation constraints, while providing competitive policy quality on large and continuous-state POMDPs (You et al., 28 Jul 2025).
1. Core Algorithmic Principles
POMCGS operates by simulating trajectories from the initial belief and expanding a graph (rather than a tree), compactly representing overlapping or redundant policy branches. At each node $n$ in the policy graph, the algorithm selects an action using an upper confidence bound (UCB) rule: $a^* = \arg\max_a \big[\, Q(n,a) + c \sqrt{\log N(n) / N(n,a)} \,\big]$, where $Q(n,a)$ denotes the current value estimate for action $a$ at node $n$, $N(n)$ is the visit count for $n$, $N(n,a)$ is the visit count for the $(n,a)$ pair, and $c$ is an exploration constant.
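A minimal Python sketch of this UCB selection step is given below; the node fields, exploration constant `c_ucb`, and tie-breaking behavior are illustrative assumptions rather than details taken from the paper.

```python
import math
from dataclasses import dataclass, field

@dataclass
class GraphNode:
    """Policy-graph node holding per-action statistics (illustrative layout)."""
    q: dict = field(default_factory=dict)    # Q(n, a): value estimate per action
    n_a: dict = field(default_factory=dict)  # N(n, a): visit count per (node, action)
    n: int = 0                               # N(n): total visit count of the node

def ucb_select(node: GraphNode, c_ucb: float = 1.0):
    """Return the action maximizing Q(n,a) + c * sqrt(log N(n) / N(n,a))."""
    best_a, best_score = None, float("-inf")
    for a, q in node.q.items():
        visits = node.n_a.get(a, 0)
        if visits == 0:
            return a  # untried actions are explored before scored ones
        score = q + c_ucb * math.sqrt(math.log(max(node.n, 1)) / visits)
        if score > best_score:
            best_a, best_score = a, score
    return best_a
```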
A defining mechanism of POMCGS is belief merging: upon generating a new belief estimate $b'$ during a simulation, the algorithm searches for an existing node $n$ (with associated belief $b_n$) satisfying $d(b', b_n) \le \epsilon$, and merges $b'$ into $n$ if the condition is satisfied (where $d$ is a distance between beliefs and $\epsilon$ is a configurable threshold). Otherwise, a new graph node is instantiated. This on-the-fly "folding" achieves substantial computational and memory savings while preserving solution quality.
2. Policy Graph Construction via Belief Folding
The FSC (policy graph) resulting from POMCGS is constructed by identifying and merging nodes corresponding to similar beliefs along different histories. When a simulation step yields a belief $b'$ following an action and observation, the algorithm queries a specialized data structure (e.g., a cover tree) for an existing node $n$ with $d(b', b_n) \le \epsilon$. If found, that node is reused as the successor. If not, a new node is created.
This strategy maintains a compact representation of the reachable belief space and considerably limits the combinatorial growth of the policy as compared to traditional tree-based approaches. Because the FSC is pre-computed, the entire policy can be analyzed and validated offline before deployment in real time.
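As a concrete illustration of the lookup-or-create step, the sketch below compares a new particle-based belief against existing nodes with an L1 distance and a linear scan; the dictionary-based node layout and the linear scan are simplifying assumptions (the paper suggests a structure such as a cover tree for efficient nearest-belief queries).

```python
from collections import Counter

def belief_distance(b1: Counter, b2: Counter) -> float:
    """L1 distance between two discretized belief distributions (illustrative)."""
    states = set(b1) | set(b2)
    return sum(abs(b1.get(s, 0.0) - b2.get(s, 0.0)) for s in states)

def find_or_create_node(graph_nodes: list, belief: Counter, epsilon: float) -> dict:
    """Reuse an existing node whose belief is within epsilon, else create one."""
    for node in graph_nodes:
        if belief_distance(node["belief"], belief) <= epsilon:
            return node  # fold: merge the new belief into the existing node
    new_node = {"belief": belief, "q": {}, "n_a": {}, "n": 0}
    graph_nodes.append(new_node)
    return new_node
```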
3. Handling Continuous Spaces: Progressive Widening and Observation Clustering
POMCGS directly addresses scalability to continuous or high-cardinality action and observation spaces via two key subroutines:
a) Action Progressive Widening (APW): For continuous or large action sets, only a subset $A(n)$ of actions is considered at each node $n$, and $A(n)$ is expanded according to:
- If $\lvert A(n)\rvert \le k_a\,N(n)^{\alpha_a}$ (with widening parameters $k_a$ and $\alpha_a$), sample a new action from the action space and add it to $A(n)$.
- Otherwise, select the best among existing actions using UCB.
This incremental expansion ensures focused search along promising actions while maintaining sufficient exploration.
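A minimal sketch of the widening test under the standard progressive-widening criterion $\lvert A(n)\rvert \le k_a\,N(n)^{\alpha_a}$ follows; the parameter names `k_a` and `alpha_a` and the external action sampler are assumptions made for illustration, not values from the paper.

```python
def maybe_widen_actions(node: dict, sample_action, k_a: float = 4.0,
                        alpha_a: float = 0.5) -> bool:
    """Sample and add a new action while |A(n)| <= k_a * N(n)^alpha_a.

    Returns True when a fresh action was added (widening), False when the
    existing action set should instead be searched with the UCB rule.
    """
    actions = node.setdefault("actions", [])
    if len(actions) <= k_a * max(node.get("n", 0), 1) ** alpha_a:
        actions.append(sample_action())  # draw a new candidate action
        return True
    return False
```

For example, calling `maybe_widen_actions(node, lambda: random.uniform(-1.0, 1.0))` would gradually grow the sampled action set for a one-dimensional continuous action space.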
b) Observation Clustering: For continuous observation models, POMCGS gathers sampled observations from rollouts and performs $k$-means clustering to discretize the observations into a fixed number of clusters per action. Each cluster defines a branch in the policy graph and a subsequent belief estimate. This step is essential to managing the exponential branching induced by continuous observations.
| Method | Purpose | Implementation |
|---|---|---|
| Action Progressive Widening | Adaptive expansion of the action set in large/continuous action spaces | Grow $A(n)$ while $\lvert A(n)\rvert \le k_a\,N(n)^{\alpha_a}$ |
| Observation Clustering | Discretization of the continuous observation space | $k$-means on sampled observations |
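The observation-clustering step can be sketched as follows using scikit-learn's `KMeans`; stacking observations as a 2-D array and the specific cluster count are assumptions for illustration, and in POMCGS the clustering is performed separately for each action.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_observations(observations: np.ndarray, n_clusters: int):
    """Discretize continuous observations sampled after a given action;
    each cluster label indexes one observation branch of the policy graph."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(observations)  # cluster id for every sample
    return km.cluster_centers_, labels

# Illustrative usage: 500 two-dimensional observations gathered for one action.
obs = np.random.default_rng(0).normal(size=(500, 2))
centers, labels = cluster_observations(obs, n_clusters=4)
```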
4. Empirical Evaluation and Performance
POMCGS was evaluated on POMDP benchmarks using the POMDPs.jl framework, with performance compared to state-of-the-art offline (SARSOP, MCVI) and online (DESPOT, POMCPOW, AdaOPS) planners. Key results include:
- On small and moderate domains (e.g., Rock Sample and Light Dark), POMCGS achieves near-optimal policy values matching SARSOP and remains competitive against strong online planners.
- In large or continuous settings (e.g., large Rock Sample instances and Lidar Roomba), POMCGS is the only offline planner capable of returning feasible policies, and these remain competitive with online methods despite having no recourse to real-time planning.
- High-dimensional observation spaces pose challenges for belief merging and clustering, with performance degradation observed in environments such as Laser Tag (8D observation).
Policy graph construction uses a fixed budget of simulations per FSC update and a separate budget for evaluating the final policy, terminating once the value bounds converge.
5. Applications and Implications
POMCGS is tailored to scenarios requiring deployment of an offline, pre-validated policy:
- Embedded robotic agents with real-time or safety-critical execution needs
- Autonomous vehicles or aerial robots operating under strict computation/energy budgets
- Applications where analyzing and certifying a policy prior to deployment is mandated
Unlike online search-based planners, POMCGS’ offline policy synthesis is well suited to environments where execution-time planning is prohibitive, unreliable, or unsafe.
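To make this concrete, the sketch below shows how a precomputed FSC might be executed at runtime using only table lookups; the node/edge dictionary layout and the `env_step` interface are assumptions, not the paper's data structures.

```python
def run_fsc(fsc: dict, env_step, start_node: int, horizon: int = 100) -> float:
    """Execute a precomputed finite-state controller without online planning.

    fsc maps node id -> {"action": a, "next": {observation_cluster: node id}};
    env_step(action) returns (observation_cluster, reward, done).
    """
    node, total_reward = start_node, 0.0
    for _ in range(horizon):
        action = fsc[node]["action"]
        obs_cluster, reward, done = env_step(action)
        total_reward += reward
        if done:
            break
        # Stay at the current node if the observation branch was never expanded.
        node = fsc[node]["next"].get(obs_cluster, node)
    return total_reward
```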
6. Limitations and Future Directions
POMCGS, while a significant advance for offline POMDP planning, has several practical limitations:
- High-dimensional observations: The effectiveness of $k$-means clustering diminishes as observation dimensionality increases, with performance bottlenecks observed in laser-based domains.
- Sensitivity to parameters: The choice of the belief-merging threshold $\epsilon$, the number of observation clusters $k$, and the number of particles per belief estimate can strongly affect both solution quality and policy graph size.
- Lack of formal convergence proofs: While empirical convergence is compelling, guaranteeing optimality or approximation rates for the offline construction in the presence of belief merging remains open.
- Adaptive techniques: Potential research directions include adaptive selection of belief-similarity metrics, grid-free clustering of observations, and alternatives to norm-based distance metrics (e.g., the Wasserstein distance).
7. Relationship to Prior Work and Theoretical Foundations
POMCGS is distinguished from typical online MCTS-based POMDP solvers by its policy graph "folding" and offline synthesis. It shares ancestry with sample-based FSC construction and leverages bandit UCB action selection for balancing exploration and exploitation, but departs from classical algorithms (POMCP, DESPOT) by intentionally merging nodes to curb tree growth. Its approach to continuous spaces via APW and observation clustering builds upon progressive widening and discretization techniques from recent online planners, but applies them to the policy graph setting. The compactness of the resulting FSC and pre-computation of all action/observation contingencies fundamentally shift the computational burden to the planning phase, delivering a ready-to-execute controller for complex POMDP domains.
In summary, POMCGS enables the production of compact, offline finite-state controllers for large or continuous POMDPs, addressing central challenges in scalability, execution-time efficiency, and offline policy analysis. Its methodology, experimental validation, and identified limitations chart a path forward for both robust offline planning and further technical refinements in policy graph-based POMDP solutions (You et al., 28 Jul 2025).