- The paper proposes a novel framework leveraging Large Language Models (LLMs) to guide probabilistic program induction for estimating interpretable Partially Observable Markov Decision Process (POMDP) models directly from data.
- Experiments across simulated and real-world robotics domains demonstrate superior performance and enhanced sample efficiency compared to conventional tabular methods and direct LLM execution baselines.
- This research highlights the potential of integrating LLM code generation capabilities with probabilistic programming to discover complex world models, addressing challenges in AI-driven decision-making under partial observability.
LLM-Guided Probabilistic Program Induction for POMDP Model Estimation
The paper "LLM-Guided Probabilistic Program Induction for POMDP Model Estimation" presents an innovative approach to model-based reinforcement learning in domains characterized by uncertainty and partial observability, structured as Partially Observable Markov Decision Processes (POMDPs). This research addresses the significant challenge of learning interpretable, low-complexity POMDP models directly from data, leveraging LLMs as a novel facilitation tool.
Approach
The authors propose a framework that integrates probabilistic programming with LLM-guided program induction to model the components of a POMDP: the observation function, reward function, transition dynamics, and initial state distribution, each encapsulated as a probabilistic graphical model within a short probabilistic program. The core strategy is to use LLMs to generate candidate models, test them against empirical data, and refine them iteratively.
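To make this concrete, here is a minimal sketch of what one such component, an observation function for a noisy wall sensor in a grid world, might look like as a short probabilistic program. The scenario, names, and noise model are illustrative assumptions, not the paper's actual code.

```python
import random

def observation_fn(state, action, noise=0.1):
    """Sample an observation given the latent state.

    `state` is assumed to be a dict holding the agent's position,
    heading, and the set of wall cells; the sensor reports whether a
    wall is directly ahead, flipping the reading with probability
    `noise`.
    """
    ahead = (state["pos"][0] + state["dir"][0],
             state["pos"][1] + state["dir"][1])
    wall_ahead = ahead in state["walls"]
    if random.random() < noise:
        wall_ahead = not wall_ahead  # simulate sensor error
    return {"wall_ahead": wall_ahead}

# Illustrative usage: agent at (1, 1) facing north, with a wall ahead.
state = {"pos": (1, 1), "dir": (0, 1), "walls": {(1, 2)}}
print(observation_fn(state, action="sense"))
```

Because the program is short and symbolic, a human can read off the model's assumptions (here, a single flip-noise parameter) directly from the code.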
This methodology contrasts with conventional approaches such as direct planning with LLMs, behavior cloning, and tabular POMDP learning; the authors posit that the symbolic representation and abstraction afforded by probabilistic programs yield greater efficiency, scalability, and accuracy.
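The generate-test-refine loop itself might look roughly as follows. This is a hedged sketch, not the paper's implementation; `llm_propose`, `score_on_data`, and `make_feedback` are hypothetical callables supplied by the caller (an LLM API wrapper, a data-fit metric such as held-out log-likelihood, and an error summarizer, respectively).

```python
def induce_component(llm_propose, score_on_data, make_feedback,
                     spec, trajectories, n_rounds=5):
    """Iteratively ask an LLM for candidate programs; keep the best.

    All three callables are hypothetical stand-ins: `llm_propose(spec,
    feedback)` returns candidate program text, `score_on_data(code,
    trajectories)` measures fit to empirical data, and `make_feedback`
    turns a scored candidate into hints for the next prompt.
    """
    best_code, best_score = None, float("-inf")
    feedback = ""
    for _ in range(n_rounds):
        code = llm_propose(spec, feedback)
        try:
            score = score_on_data(code, trajectories)
        except Exception as err:  # a candidate may fail to run at all
            feedback = f"Candidate program raised: {err!r}"
            continue
        if score > best_score:
            best_code, best_score = code, score
        feedback = make_feedback(code, score, trajectories)
    return best_code
```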
Experiments and Results
The empirical evaluation comprises simulated experiments on classical POMDP problems and MiniGrid tasks, as well as real-world robotics domains involving a Boston Dynamics Spot robot. The results show that the proposed approach outperforms several baseline methods, achieving higher expected discounted rewards with better sample efficiency.
Specifically, POMDP Coder, a component of their framework, outperformed both traditional tabular reinforcement learning methods and direct LLM execution strategies, which frequently fell into infinite loops or covered the state space inadequately. The research also emphasizes the advantage of symbolic model representations over behavior cloning, particularly in scenarios requiring generalization beyond the training examples.
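For reference, the expected discounted reward on which these comparisons rest is the per-episode discounted return averaged over rollouts; a minimal illustration follows (the discount factor here is an arbitrary choice, not taken from the paper).

```python
def discounted_return(rewards, gamma=0.95):
    """Discounted return: sum over t of gamma**t * rewards[t]."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def expected_discounted_reward(episodes, gamma=0.95):
    """Average the discounted return across sampled episodes."""
    return sum(discounted_return(ep, gamma) for ep in episodes) / len(episodes)
```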
Implications and Future Directions
The implications of this work are threefold. First, it shows how LLMs can be harnessed not merely to generate sequences but to guide the discovery and refinement of complex models in probabilistic programming. Second, it establishes a promising avenue for addressing the bottlenecks posed by the high dimensionality of belief spaces in POMDPs. Third, it suggests that the code-generation capabilities of LLMs can be productively combined with traditional probabilistic techniques for world modeling and decision-making in partially observable settings.
Future research may extend this framework to continuous state and action spaces, which the current implementation does not cover. Enhancing the scalability of belief updating, for example with factored particle filters, could further expand the approach's capacity to handle larger and more complex domains; for context, a minimal sketch of the standard (non-factored) update follows.
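This is a hedged illustration of a bootstrap particle-filter belief update, where the belief is a set of sampled states; the factored variant the authors mention would instead maintain separate particle sets over state components. The function signatures are assumptions for illustration, with the transition and observation models supplied by the induced POMDP program.

```python
import random

def particle_filter_update(particles, action, obs,
                           transition_sample, obs_likelihood):
    """One bootstrap particle-filter belief update.

    `particles`: list of sampled latent states representing the belief.
    `transition_sample(s, a)` samples a next state from the transition
    model; `obs_likelihood(o, s_next, a)` returns the probability of
    the observation in that next state. Both are assumed to come from
    the induced POMDP model.
    """
    # Propagate each particle through the transition model, then weight
    # each by how well it explains the received observation.
    proposed = [transition_sample(s, action) for s in particles]
    weights = [obs_likelihood(obs, s, action) for s in proposed]
    if sum(weights) == 0:   # observation inconsistent with all particles
        return proposed     # degenerate case: keep the unweighted set
    # Resample proportionally to the weights to form the new belief.
    return random.choices(proposed, weights=weights, k=len(particles))
```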
In conclusion, the paper delineates a novel intersection of LLMs and probabilistic programming, emphasizing the potential for substantial advances in AI-driven decision-making under uncertainty. By leveraging structured representations for model induction, the approach charts a promising course for the future of data-driven robotics and autonomous systems.