
Hierarchical LLM-Based Planning Architectures

Updated 25 October 2025
  • Hierarchical LLM-based planning architectures are integrated frameworks that translate natural language tasks into multi-level representations using semantic scene graphs and formal heuristics.
  • They leverage LLMs to convert natural language into co-safe LTL specifications and provide semantic guidance to bias planning in complex environments.
  • Experimental validations show faster computation, fewer node expansions, and smaller optimality gaps in realistic robotic navigation tasks.

A hierarchical LLM-based planning architecture is an integrated computational framework in which tasks specified in natural or formal language are decomposed into multi-level representations, typically spanning from abstract intentions to fine-grained executable actions, through the coordinated use of scene or knowledge graphs, automata, and semantic abstraction. LLMs serve two crucial roles, translating instructions into formal task specifications and providing high-level semantic guidance, while the overall planning loop incorporates rigorous guarantees via formal heuristics, multi-resolution domain decomposition, and semantic attribute grounding, as developed in "Optimal Scene Graph Planning with LLM Guidance" (Dai et al., 2023).

1. Environment Representation and Semantic Abstraction

Hierarchical LLM-based planning operates on a semantic scene representation structured as a scene graph $G = (V, E, \{A_k\}_{k=1}^K)$, where $V$ is the set of nodes (each representing, e.g., spatial regions or free space), $E \subseteq V \times V$ captures connectivity, and $\{A_k\}_{k=1}^K$ is a collection of hierarchical attribute sets, each $A_k$ defining semantic classes such as objects, rooms, and floors. Each attribute $a \in A_k$ is associated with a subset $V_a \subset V$. For every node $s \in V$, atomic propositions encode semantic properties, and a labeling function $\ell: V \rightarrow 2^{\mathcal{P}}$ maps nodes to the propositions they satisfy.
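As a concrete illustration, here is a minimal Python sketch of such a structure. The class name, field layout, and toy instance are assumptions for exposition, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class SceneGraph:
    """Minimal container mirroring G = (V, E, {A_k}_{k=1..K})."""
    V: set[str]                              # nodes: spatial regions / free space
    E: set[tuple[str, str]]                  # connectivity, E ⊆ V × V
    attributes: list[dict[str, set[str]]]    # attributes[k]: attribute a -> V_a ⊂ V
    labels: dict[str, frozenset[str]]        # node -> atomic propositions

    def ell(self, s: str) -> frozenset[str]:
        """Labeling function ℓ(s): propositions that hold at node s."""
        return self.labels.get(s, frozenset())

# A toy instance: one floor whose kitchen region contains an oven node.
g = SceneGraph(
    V={"floor1_free", "kitchen_free", "oven_node"},
    E={("floor1_free", "kitchen_free"), ("kitchen_free", "oven_node")},
    attributes=[
        {"oven": {"oven_node"}},                     # A_1: objects
        {"kitchen": {"kitchen_free", "oven_node"}},  # A_2: rooms
    ],
    labels={"oven_node": frozenset({"oven", "kitchen"})},
)
```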

This explicit structural hierarchy is further reduced for LLM consumption: the full graph is converted to an "attribute hierarchy" in a nested YAML format, exposing only high-level semantic entities and their inclusion relationships (e.g., rooms within floors, objects within rooms), each tagged with unique IDs to facilitate unambiguous reasoning and reduce prompt overload.
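For instance, the attribute hierarchy for a small scene might be serialized as follows. This is a hypothetical example: the exact keys and ID scheme are assumptions, and PyYAML is used only to render the nesting:

```python
import yaml  # PyYAML, used here purely to illustrate the nested format

# Hypothetical attribute hierarchy: objects within rooms within floors,
# each entity tagged with a unique ID for unambiguous reasoning.
attribute_hierarchy = {
    "floor_1 (id: f1)": {
        "kitchen (id: r3)": ["oven (id: o7)", "sink (id: o8)"],
        "bathroom (id: r5)": ["sink (id: o9)"],
    },
}
print(yaml.safe_dump(attribute_hierarchy, sort_keys=False))
# floor_1 (id: f1):
#   kitchen (id: r3):
#   - oven (id: o7)
#   - sink (id: o8)
#   bathroom (id: r5):
#   - sink (id: o9)
```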

2. LLM-Guided Task Formalization and Semantic Heuristics

LLMs fulfill two distinct, critical functions:

A. Natural Language to Formal Specification

LLMs receive the attribute hierarchy and the user's mission (e.g., "Reach the oven in the kitchen") and, through prompt engineering (with regex-based extraction and translation examples in prefix LTL notation), transduce it into a co-safe Linear Temporal Logic (LTL) formula $\varphi_\mu$. The formula is checked syntactically and for co-safety (negation normal form; only the $\bigcirc$ (next) and $\mathcal{U}$ (until) temporal operators) and is then determinized into a finite automaton $M_\varphi$ that tracks task progress.
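A minimal sketch of the extraction-and-verification step might look as follows. The response delimiter, regex, and allowed operator set are assumptions; the paper's actual pipeline may differ:

```python
import re

# Prefix-notation tokens assumed allowed in the co-safe fragment:
# F (eventually), U (until), X (next), & (and), | (or),
# ! (negation, applied to propositions only), plus proposition IDs.
ALLOWED_OPS = {"F", "U", "X", "&", "|", "!"}

def extract_formula(llm_response: str) -> str:
    """Pull the formula out of a delimited line in the LLM's reply."""
    match = re.search(r"LTL:\s*(.+)", llm_response)  # hypothetical delimiter
    if match is None:
        raise ValueError("no LTL formula found in LLM response")
    return match.group(1).strip()

def check_cosafe(formula: str) -> list[str]:
    """Syntactic sanity check: every token is an allowed operator or a
    proposition ID, and negation only prefixes propositions (NNF)."""
    tokens = formula.split()
    for i, tok in enumerate(tokens):
        if tok in ALLOWED_OPS:
            if tok == "!" and (i + 1 >= len(tokens) or tokens[i + 1] in ALLOWED_OPS):
                raise ValueError("negation must apply directly to a proposition")
        elif not re.fullmatch(r"[a-z]\w*", tok):
            raise ValueError(f"unknown token: {tok}")
    return tokens

# "Eventually (kitchen and eventually oven)", in prefix notation.
tokens = check_cosafe(extract_formula("LTL: F & kitchen F oven"))
```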

B. Semantic Heuristic Guidance

During search, the LLM is further prompted with the current automaton state, scene context, and motion-function signatures; it emits high-level semantic guidance as function-call sequences (e.g., $\text{move}(a_i, a_j)$) with associated user-defined costs. These sequences instantiate a heuristic $h_{\mathrm{LLM}}$ that is injected into the planner to bias it toward transitions deemed semantically promising, beyond what metric or geometric information alone could reveal (see the parsing sketch below).
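The guidance-parsing step could be sketched as follows; the function names, regex, and cost table are hypothetical stand-ins:

```python
import re

# User-defined cost per motion-function name (hypothetical values).
MOVE_COSTS = {"move": 1.0, "open_door": 2.0}

def parse_guidance(llm_reply: str) -> list[tuple[str, str, str, float]]:
    """Extract calls like move(r3, o7) from the LLM reply, attaching costs."""
    calls = re.findall(r"(\w+)\(\s*(\w+)\s*,\s*(\w+)\s*\)", llm_reply)
    return [(f, a_i, a_j, MOVE_COSTS.get(f, 1.0)) for f, a_i, a_j in calls]

plan = parse_guidance("move(f1, r3) move(r3, o7)")
h_llm_bias = sum(cost for *_, cost in plan)  # aggregated later as h_LLM
```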

3. Hierarchical Planning Domain Construction

The planning state space is structured across multiple resolution levels. For each attribute layer $k$, the planner defines

$$X_k = V_k \times Q, \qquad V_k = \bigcup_{a \in A_k} V_a,$$

where $Q$ is the automaton state set. A transition $(x_i, x_j)$ is permitted if:

  1. $s_j$ crosses from a region $V_a$ into the boundary of another region $V_b$ ($a \neq b$, non-overlapping interiors),
  2. the automaton transition is respected: $q_j = T(q_i, \ell(s_i))$,
  3. $s_j$ minimizes the edge/path cost $d(s_i, s)$ among candidate boundary nodes.

The lowest-level "anchor" domain $X_0 = V \times Q$ unifies all abstraction layers and connects them via the original scene graph transitions, as sketched below.
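Under these rules, successor generation at level $k$ might be sketched as follows; the `scene` helpers `regions_at_level`, `nearest_boundary_node`, and `label` are hypothetical interfaces standing in for scene-graph queries:

```python
def successors(x, k, scene, T):
    """Yield permitted level-k transitions from product state x = (s, q)."""
    s, q = x
    for b, V_b in scene.regions_at_level(k):  # candidate target regions V_b
        if s in V_b:                          # rule 1: must cross into a != b
            continue
        t, cost = scene.nearest_boundary_node(s, V_b)  # rule 3: minimize d(s, .)
        q_next = T(q, scene.label(s))         # rule 2: automaton transition
        yield (t, q_next), cost

# At level 0 (the anchor domain X_0 = V × Q), successors are simply the
# original scene-graph edges paired with the corresponding automaton steps.
```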

4. Multi-Heuristic Anytime Search and Theoretical Guarantees

Planning uses AMRA* (Anytime Multi-Resolution Multi-Heuristic A*), in which two heuristics interact:

  • LTL Heuristic $h_{\mathrm{LTL}}$:

This heuristic is provably consistent (and thus preserves optimality). For a planning state $(s, q)$ it computes

$$h_{\mathrm{LTL}}(s, q) = \min_{t \in V} \left[ c(s, t) + g(\ell(t), T(q, \ell(t))) \right],$$

where $g(\ell, q)$ is defined recursively via a Bellman/Dijkstra optimization over labels and automaton states, and $c(s, t)$ is the cost between nodes.

  • LLM Semantic Heuristic $h_{\mathrm{LLM}}$:

The sequence of moves returned by the LLM, $f_i(a_j, a_k)$, each weighted by its user-defined cost, is aggregated as

$$h_{\mathrm{LLM}}(s, q) = \sum_{i=0}^{N} f_i(a_j, a_k).$$

Although potentially inadmissible, $h_{\mathrm{LLM}}$ accelerates search, while $h_{\mathrm{LTL}}$ maintains optimality guarantees. A minimal sketch of the $h_{\mathrm{LTL}}$ computation follows.
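The sketch below assumes an automaton transition function `T`, a table `label_cost` of minimum inter-label travel costs, and a node-to-node cost estimate `c`, all hypothetical interfaces; for simplicity it treats each node's label as a single semantic class:

```python
import heapq
from collections import defaultdict

def compute_g(labels, states, T, accepting, label_cost):
    """Cost-to-go g[(label, q)] over the label/automaton product, via a
    multi-source Dijkstra run backwards from accepting automaton states."""
    # Reversed product edges: (l1, q1) -> (l2, T(q1, l2)) costs label_cost[l1, l2].
    rev = defaultdict(list)
    for l1 in labels:
        for q1 in states:
            for l2 in labels:
                rev[(l2, T(q1, l2))].append(((l1, q1), label_cost[l1, l2]))
    g = {}
    pq = [(0.0, (l, q)) for l in labels for q in states if q in accepting]
    heapq.heapify(pq)
    while pq:
        d, v = heapq.heappop(pq)
        if v in g:          # already settled with a smaller cost
            continue
        g[v] = d
        for u, w in rev[v]:
            if u not in g:
                heapq.heappush(pq, (d + w, u))
    return g

def h_ltl(s, q, V, ell, c, T, g):
    """h_LTL(s, q) = min over t of [ c(s, t) + g(ell(t), T(q, ell(t))) ]."""
    inf = float("inf")
    return min((c(s, t) + g.get((ell(t), T(q, ell(t))), inf) for t in V),
               default=inf)
```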

5. Experimental Validation and Performance Analysis

The system is evaluated in real-world-inspired virtual environments (e.g., Allensville, Benevolence, Collierville) on complex missions such as "Visit bathroom and avoid the sink, then go to the dining room and sit on a chair." Configuration variants include a baseline A*, the full hierarchy with and without LLM guidance, and selective application of the LLM heuristic at certain levels.

Key results:

  • Computation Time: The "ALL" LLM-heuristic configuration yields the fastest computation of a feasible path and the fastest convergence to optimality.
  • Optimality Gap: The cost of the first-found path is closer to optimal than that of the baselines.
  • Node Expansions: Substantial reduction in nodes expanded per iteration when using LLM guidance; search efficiency is directly improved by hierarchical semantic heuristics.

These metrics demonstrate that integrating semantic LLM heuristics with hierarchical, multi-resolution planning domains yields marked improvements in both speed and optimality of complex natural language task execution.

6. Applications and System Implications

The described framework enables:

  • Autonomous Mobile Robots: Robust execution of complex, compositional tasks specified in natural language, leveraging high-level scene abstractions for navigation and manipulation in semantically rich environments.
  • Symbol Grounding: LLMs serve as a crucial link between linguistic commands and their corresponding propositional representations over the scene graph, enabling efficient symbol grounding in previously ambiguous or “free-form” settings.
  • Scalability: Multi-resolution abstraction dramatically reduces search complexity in large-scale, multi-layer environments; LLM heuristics scale with semantic richness rather than just raw graph cardinality.
  • Extension to Dynamic/Real Scenarios: The planner's modular construction allows systematic extension, integration with live perceptual updates, and adaptation to safety-critical operations where formal optimality and correctness are paramount.

7. Summary and Theoretical Significance

This hierarchical LLM-based planning architecture exemplifies a synergistic integration of deep semantic scene abstractions, formal task logic, and both rigorous (LTL) and informed (LLM) heuristic guidance. The method achieves:

  • Explicit representation of environments as hierarchical scene graphs with atomic propositions.
  • Formal natural language task translation to co-safe LTL via LLMs.
  • Hierarchical domain factorization reflecting geometric, semantic, and automaton structure.
  • Dual-heuristic anytime planning, marrying search efficiency with guaranteed optimality.
  • Empirically demonstrated gains in speed and plan quality.

As a result, it provides a technically grounded paradigm for deploying natural language-driven robotic planning in environments of significant scale and structural complexity, while maintaining guarantees required for real-world autonomy (Dai et al., 2023).

References

  • Dai et al. (2023). "Optimal Scene Graph Planning with LLM Guidance."
