A Probabilistic Framework for Hierarchical Goal Recognition

Published 24 Apr 2026 in cs.SC and cs.AI | (2604.22256v1)

Abstract: Goal recognition aims to infer an agent's goal from observations of its behaviour. In realistic settings, recognition can benefit from exploiting hierarchical task structure and reasoning under uncertainty. Planning-based goal recognition has made substantial progress over the past decade, but to the best of our knowledge no existing approach jointly integrates hierarchical task structure with probabilistic inference. In this paper, we introduce the first planning-based probabilistic framework for hierarchical goal recognition over Hierarchical Task Networks (HTNs). We instantiate the framework by exploiting an HTN planner with a three-stage generative model for likelihood estimation, yielding posterior distributions over goal hypotheses. Empirical results show improved recognition performance over the existing HTN-based recognizer on HTN benchmarks. Overall, the framework lays a foundation for probabilistic goal recognition grounded in hierarchical planning structure, moving goal recognition toward more practical settings.


Authors (5)

Summary

The paper introduces a novel probabilistic HTN framework that employs Bayesian inference to rank goal hypotheses based on observed actions.
It outlines a three-stage generative process that integrates hierarchical decomposition, executable plan sampling, and observation generation to handle noise and exogenous actions.
Empirical evaluations on benchmark domains show substantially higher early-recognition accuracy and greater robustness than the deterministic baseline.


Introduction and Motivation

The paper "A Probabilistic Framework for Hierarchical Goal Recognition" (2604.22256) addresses the challenge of inferring an agent's high-level intention from partial and potentially noisy observations, leveraging structured knowledge about hierarchical task composition. Unlike previous approaches that either consider only flat action models in Bayesian inference or are limited to deterministic acceptance in hierarchical settings, this work presents the first framework that formulates hierarchical goal recognition over Hierarchical Task Networks (HTNs) as a principled Bayesian inference problem. The integration of probabilistic reasoning with HTN decomposition yields a framework capable of ranking competing goal hypotheses, handling arbitrary observation noise, and tolerating exogenous, goal-irrelevant actions, thereby enhancing both expressivity and robustness.

HTN-Based Goal Recognition and Its Limitations

Goal recognition based on classical and HTN planning has matured, with sophisticated techniques that reduce observation matching to planning, as exemplified by Höller et al. (2018). The standard pipeline constructs a planning instance for each candidate goal and tests whether a plan exists that is consistent with the observations. This feasibility-based approach is limited: it cannot compare hypotheses or maintain belief under observation noise. The existing HTN-based recognizer is strictly deterministic: it returns binary accept/reject outcomes and fails outright in the presence of exogenous actions, e.g., when observed sequences contain noise or user errors not covered by the HTN model. Because it lacks a graded scoring mechanism, it also cannot express a preference for "less surprising" explanations among equally consistent hypotheses.

Bayesian Inference via a Generative HTN Model

The core advancement of the paper is the probabilistic formalization termed Probabilistic Hierarchical Goal Recognition (PHGR). Given an HTN domain, a set of candidate goals (task networks), and a prior, Bayesian inference is performed to update posterior probabilities over goal hypotheses conditioned on a sequence of observed primitive actions. The likelihood $P(\hat{o}|N^g, s_0)$ is central; its estimation is grounded in a generative three-stage process:

Hierarchical Decomposition: Sample a sequence of method applications reducing a goal network $N^g$ to a primitive action network $N$. Choices are made stochastically via a Boltzmann distribution over method costs, implementing a soft rationality assumption.
Executable Plan Sampling: From $N$, generate an executable linearization $\pi$ consistent with task dependencies and the initial state, sampling uniformly over applicable actions.
Observation Generation: Model observation of a (possibly partial and noisy) prefix/subsequence of $\pi$, including support for unaligned exogenous actions via task insertion semantics.
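
The soft-rationality choice in the first stage can be illustrated with a minimal sketch. It assumes each applicable method has a scalar cost and that the selection probability is proportional to $\exp(-\beta \cdot \text{cost})$; the inverse temperature `beta`, the function names, and the method names are hypothetical and are not taken from the paper.

```python
import math
import random


def boltzmann_probs(costs, beta=1.0):
    """Return probabilities proportional to exp(-beta * cost) for each cost."""
    # Shift by the minimum cost before exponentiating for numerical stability.
    c_min = min(costs)
    weights = [math.exp(-beta * (c - c_min)) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def sample_method(methods, costs, beta=1.0, rng=random):
    """Sample one decomposition method according to the Boltzmann weights."""
    probs = boltzmann_probs(costs, beta)
    return rng.choices(methods, weights=probs, k=1)[0]


if __name__ == "__main__":
    methods = ["boil", "fry", "bake"]  # hypothetical methods for one compound task
    costs = [2.0, 3.0, 5.0]            # hypothetical method costs
    print(boltzmann_probs(costs, beta=1.0))
    print(sample_method(methods, costs, beta=1.0, rng=random.Random(0)))
```

With `beta = 0` every method is equally likely, and larger values of `beta` concentrate the probability mass on the cheapest method, which is one way to read "soft rationality".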

This process operationalizes the intuition that inference should favor hypotheses whose observation-consistent executions are not substantially less probable under the generative model than their best unconstrained plans. The posterior is therefore sensitive to how "surprising" the observations are under each hypothesis, and it penalizes hypotheses that can explain them only through suboptimal behavior.
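
The Bayesian update itself is standard and can be sketched as follows. The sketch assumes that a log-likelihood estimate is already available for each goal hypothesis; the function name and the goal labels are hypothetical, and the likelihood estimator of the paper is not reproduced here.

```python
import math


def posterior(log_likelihoods, priors):
    """Return P(goal | observations) from per-goal log-likelihoods and priors."""
    # Combine prior and likelihood in log space; skip zero-probability goals.
    log_joint = {
        g: math.log(priors[g]) + ll
        for g, ll in log_likelihoods.items()
        if priors[g] > 0.0 and ll > -math.inf
    }
    if not log_joint:
        raise ValueError("every hypothesis has zero prior or zero likelihood")
    # Log-sum-exp normalization avoids underflow for very small likelihoods.
    peak = max(log_joint.values())
    z = sum(math.exp(v - peak) for v in log_joint.values())
    return {
        g: (math.exp(log_joint[g] - peak) / z if g in log_joint else 0.0)
        for g in log_likelihoods
    }


if __name__ == "__main__":
    # Hypothetical goals and log-likelihood values, for illustration only.
    log_liks = {"make_pasta": -2.0, "make_salad": -5.0, "make_soup": -math.inf}
    uniform = {g: 1.0 / len(log_liks) for g in log_liks}
    print(posterior(log_liks, uniform))
```

A hypothesis with zero likelihood (log-likelihood of negative infinity) receives zero posterior mass, which mirrors the accept/reject behavior of a purely feasibility-based recognizer; graded likelihoods are what allow ranking among the remaining hypotheses.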

Figure 1: Top-3 accuracy as a function of observation ratio, demonstrating the improved early-recognition performance of the three-stage probabilistic estimator over the HTN goal recognition baseline.

Likelihood Approximation and Top-k Hypotheses Selection

Exact marginalization over all possible execution-refinement-observation alignments is intractable, so the paper adopts a pragmatic max-product approximation: for each candidate goal, the ratio of the most probable explanation for the given observation to the most probable unconstrained execution (as per the generative model) is used as the likelihood estimate. This approach subsumes the plan cost-based heuristics used in prior flat planning goal recognition work but supports the richer HTN hierarchy and uncertainty modeling.
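
One way to write down the approximation described above, reusing the likelihood notation introduced earlier, is the following. The sets $\mathcal{E}(N^g, \hat{o})$ (explanations of $N^g$ consistent with the observations) and $\mathcal{E}(N^g)$ (all executions of $N^g$), and the symbol $e$, are notation assumed here for illustration and may differ from the paper's own formulation.

$$
P(N^g \mid \hat{o}, s_0) \;\propto\; P(\hat{o} \mid N^g, s_0)\, P(N^g),
\qquad
P(\hat{o} \mid N^g, s_0) \;\approx\; \frac{\max_{e \in \mathcal{E}(N^g, \hat{o})} P(e \mid N^g, s_0)}{\max_{e \in \mathcal{E}(N^g)} P(e \mid N^g, s_0)}
$$

Under this reading, a hypothesis whose best observation-consistent explanation is as probable as its best unconstrained execution receives a ratio close to 1, and the ratio shrinks as the observations force less probable behavior.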

To handle the scalability bottleneck of evaluating all goals, the paper formalizes a top-$k$ hypothesis selection protocol. Candidate hypotheses whose observationally constrained plans are among the $k$ cheapest are selected (using plan cost as a proxy for the generative-model likelihood), and full posterior inference is performed within this set. While not globally optimal due to planner incompleteness and possible divergence between cost and likelihood, this approach is computationally viable and empirically effective.
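
The selection step can be sketched in a few lines. The sketch assumes that a planner has already produced one observation-constrained plan cost per candidate goal, with `None` marking goals for which no plan was found; the function name, the data layout, and the goal labels are hypothetical.

```python
import heapq


def select_top_k(constrained_costs, k):
    """Return the k goals whose observation-constrained plans are cheapest.

    constrained_costs maps a goal id to a plan cost, or to None when the
    planner found no plan consistent with the observations.
    """
    solved = [(cost, goal) for goal, cost in constrained_costs.items()
              if cost is not None]
    # nsmallest returns results in ascending order of (cost, goal).
    return [goal for _, goal in heapq.nsmallest(k, solved)]


if __name__ == "__main__":
    # Hypothetical goals and costs, for illustration only.
    costs = {"g1": 12.0, "g2": 7.0, "g3": None, "g4": 9.0}
    print(select_top_k(costs, k=2))
```

Full posterior inference would then be run only over the returned goals, so the cost of the expensive likelihood estimate grows with $k$ rather than with the total number of hypotheses.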

Handling Exogenous Actions: Task Insertion Semantics

A key technical innovation is the integration of exogenous action handling via task insertion. Exogenous actions—steps present in the observations but not licensed by HTN decomposition—are modeled in the observation generation step by permitting insertions into the plan's executable sequence. Theoretical results are given: when using a complete task-insertion-enabled HTN planner, the model provides nonzero posterior support for any hypothesis capable of explaining the observations up to exogenous insertions. Furthermore, it is proven that each additional exogenous action monotonically decreases the likelihood for a hypothesis, matching intuitive expectations.

When only planners without task insertion support are available, recognition is conservative: only hypotheses whose unconstrained and observationally constrained explanations do not require exogenous actions can be considered.
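
The monotonicity property stated above can be illustrated with a toy observation model. The sketch assumes that each inserted exogenous action contributes an independent factor `p_insert` strictly between 0 and 1 to the likelihood; this factorization, the parameter, and the function name are assumptions made for illustration, not the paper's model.

```python
import math


def log_likelihood_with_insertions(base_log_lik, n_exogenous, p_insert=0.05):
    """Penalize an explanation by one factor of p_insert per exogenous action."""
    if not 0.0 < p_insert < 1.0:
        raise ValueError("p_insert must lie strictly between 0 and 1")
    if n_exogenous < 0:
        raise ValueError("n_exogenous must be non-negative")
    # log(p_insert) is negative, so every insertion lowers the log-likelihood.
    return base_log_lik + n_exogenous * math.log(p_insert)


if __name__ == "__main__":
    for n in range(4):
        print(n, log_likelihood_with_insertions(-1.5, n))
```

Because the penalty is finite for any finite number of insertions, the likelihood stays nonzero, which is consistent with the stated guarantee that hypotheses explaining the observations up to exogenous insertions keep some posterior support.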

Empirical Evaluation

The framework is benchmarked on the Kitchen and Monroe domains, covering multi-course meal preparation and a standard plan-recognition testbed, respectively. In contrast to flat plan recognition settings, these domains feature hundreds to thousands of overlapping hierarchical goal hypotheses, long prefixes, and extensive partial ordering. The metrics are top-$k$ accuracy for $k \in \{1, 3, 5\}$, reflecting practical recognition utility.
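
For reference, top-$k$ accuracy can be computed as in the sketch below. It assumes that each test instance yields a list of goal hypotheses ranked from most to least probable; the function name and the toy data are hypothetical.

```python
def top_k_accuracy(ranked_predictions, true_goals, k):
    """Fraction of instances whose true goal is among the first k ranked goals."""
    if len(ranked_predictions) != len(true_goals) or not true_goals:
        raise ValueError("need one non-empty ranking per true goal")
    hits = sum(
        1
        for ranking, truth in zip(ranked_predictions, true_goals)
        if truth in ranking[:k]
    )
    return hits / len(true_goals)


if __name__ == "__main__":
    # Toy rankings for three instances whose true goal is always "a".
    rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
    truths = ["a", "a", "a"]
    for k in (1, 2, 3):
        print(k, top_k_accuracy(rankings, truths, k))
```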

Numerical results indicate that the proposed framework achieves significant improvements, especially in early prefix settings (e.g., after observing only 20–40% of the behavior):

In the Kitchen domain, top-3 accuracy at 40% observation improves from 32.6% (baseline) to 71.8% for the three-stage estimator. Performance converges at higher observation ratios.
Robust improvements are also observed under partial observability (randomly removed actions), underscoring tolerance to observation noise and suboptimality.
On Monroe, the approach attains 65% top-3 accuracy at 10% observation (versus 30% baseline), reaching full accuracy soon after, consistent with goal divergence.

Critically, when exogenous actions are introduced (e.g., an irrelevant "add milk" action), the deterministic baseline either misidentifies the intention (by inferring unmotivated tasks) or fails outright; the probabilistic framework, by contrast, maintains a nonzero posterior over the ground-truth hypothesis.

Theoretical and Practical Implications

The proposed framework advances the state of the art by providing the first unified, planning-based method that:

Integrates hierarchical structure in a generative and probabilistic manner, supporting nuanced hypothesis ranking and robust inference.
Handles exogenous and noisy observations without exhaustive pre-compilation or plan-library enumeration.
Maintains theoretical guarantees on posterior support, likelihood monotonicity, and rational sensitivity to "explanation surprise".

Practically, it opens the door to applying hierarchical goal recognition in real-world settings where agent behavior is only partially observed, noisy, or even adversarial.

Theoretically, the framework connects lines of work in plan recognition as planning, probabilistic plan-library methods, and cognitive models of human goal inference, providing a modular template for future extension.

Future Perspectives

Three promising avenues for future inquiry are articulated: (i) the development of HTN planners that explicitly optimize generative-model likelihood rather than plan cost; (ii) scalable, general-purpose support for task insertion to enable robust handling of observation noise; and (iii) further validation and modeling inspired by empirical studies of human hierarchical intention inference.

Conclusion

This work establishes the first framework for planning-based probabilistic goal recognition on HTNs, fusing hierarchical decomposition with Bayesian inference and observation noise robustness. Empirical and theoretical analyses demonstrate improved performance, expressivity, and resilience compared to the deterministic baseline, marking a significant advance in the practical and methodological toolkit for intention recognition in structured domains.
