
Hierarchical LLM-Based Planning Architectures

Updated 25 October 2025
  • Hierarchical LLM-based planning architectures are integrated frameworks that translate natural language tasks into multi-level representations using semantic scene graphs and formal heuristics.
  • They leverage LLMs to convert natural language into co-safe LTL specifications and provide semantic guidance to bias planning in complex environments.
  • Experimental validations show faster computation, fewer node expansions, and smaller optimality gaps in realistic simulated robotic navigation tasks.

A hierarchical LLM-based planning architecture is an integrated computational framework in which tasks specified in natural or formal language are decomposed into multi-level representations, typically spanning from abstract intentions to fine-grained executable actions, through the coordinated use of scene or knowledge graphs, automata, and semantic abstraction. LLMs play two crucial roles, translating instructions into formal task specifications and providing high-level semantic guidance, while the overall planning loop obtains rigorous guarantees via formal heuristics, multi-resolution domain decomposition, and semantic attribute grounding, as demonstrated in "Optimal Scene Graph Planning with LLM Guidance" (Dai et al., 2023).

1. Environment Representation and Semantic Abstraction

Hierarchical LLM-based planning operates on a semantic scene representation structured as a scene graph $G = (V, E, \{A_k\}_{k=1}^K)$, where $V$ is the set of nodes (each representing, e.g., spatial regions or free space), $E \subseteq V \times V$ captures connectivity, and $\{A_k\}_{k=1}^K$ is a collection of hierarchical attribute sets, each $A_k$ defining semantic classes such as objects, rooms, and floors. Each attribute $a \in A_k$ is associated with a subset $V_a \subset V$. For every node $s \in V$, atomic propositions encode semantic properties, and a labeling function $\ell: V \rightarrow 2^{\mathcal{P}}$ maps nodes to the propositions they satisfy.
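
To make this representation concrete, the following is a minimal Python sketch of such a scene graph; the class and field names are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """Minimal sketch of the scene graph G = (V, E, {A_k}), hypothetical API."""
    nodes: set[str] = field(default_factory=set)                        # V
    edges: dict[tuple[str, str], float] = field(default_factory=dict)   # E, with costs d(s_i, s_j)
    # attribute_layers[k] maps each attribute a in A_k to its node subset V_a
    attribute_layers: list[dict[str, set[str]]] = field(default_factory=list)
    labels: dict[str, frozenset[str]] = field(default_factory=dict)     # labeling l: V -> 2^P

    def label(self, s: str) -> frozenset[str]:
        """Atomic propositions that hold at node s."""
        return self.labels.get(s, frozenset())
```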

This explicit structural hierarchy is further reduced for LLM consumption: the full graph is converted to an "attribute hierarchy" in a nested YAML format, exposing only high-level semantic entities and their inclusion relationships (e.g., rooms within floors, objects within rooms), each tagged with unique IDs to facilitate unambiguous reasoning and reduce prompt overload.
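
For instance, the nested YAML exposed to the LLM might look like the following, here serialized with PyYAML; the exact keys, entity names, and IDs are assumptions for illustration.

```python
import yaml  # pip install pyyaml

# Hypothetical attribute hierarchy: only semantic entities, their inclusion
# relations, and unique IDs are exposed to the LLM (no raw graph nodes).
attribute_hierarchy = {
    "floor_1": {
        "id": "f1",
        "rooms": {
            "kitchen": {"id": "r3", "objects": {"oven": {"id": "o7"}, "sink": {"id": "o8"}}},
            "dining_room": {"id": "r4", "objects": {"chair": {"id": "o9"}}},
        },
    }
}

print(yaml.safe_dump(attribute_hierarchy, sort_keys=False))
```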

2. LLM-Guided Task Formalization and Semantic Heuristics

LLMs fulfill two distinct, critical functions:

A. Natural Language to Formal Specification

LLMs receive the attribute hierarchy and the user's mission (e.g., "Reach the oven in the kitchen") and, through prompt engineering (with regex-based extraction and translation examples in prefix LTL notation), transduce it into a co-safe Linear Temporal Logic (LTL) formula $\varphi_\mu$. The formula is verified for syntactic correctness and co-safety (negation normal form; only the $\bigcirc$ (next) and $\mathcal{U}$ (until) temporal operators) and is then determinized into a finite automaton $M_\varphi$ that tracks task progress.
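
A minimal sketch of the regex-based extraction step, assuming the prompt instructs the model to wrap its prefix-notation formula in <ltl> tags; the delimiter, proposition names, and example formula are assumptions, not the paper's exact prompt protocol.

```python
import re

def extract_ltl(llm_output: str) -> str:
    """Pull the prefix-notation LTL formula out of an LLM response (hypothetical delimiter)."""
    match = re.search(r"<ltl>(.*?)</ltl>", llm_output, re.DOTALL)
    if match is None:
        raise ValueError("no LTL formula found in LLM response")
    return match.group(1).strip()

# e.g., "Reach the oven in the kitchen" might yield "F & kitchen_r3 oven_o7"
# (eventually, kitchen AND oven), which is then checked for co-safety and
# determinized to the finite automaton M_phi by an external tool.
formula = extract_ltl("... <ltl>F & kitchen_r3 oven_o7</ltl> ...")
print(formula)
```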

B. Semantic Heuristic Guidance

During search, the LLM is further prompted with the current automaton state, scene context, and motion function signatures; it emits high-level semantic guidance in the form of function call sequences (e.g., $\text{move}(a_i, a_j)$) with associated user-defined costs. These sequences instantiate a heuristic $h_{\mathrm{LLM}}$, injected into the planner to bias it toward transitions deemed semantically promising beyond what metric/geometric information alone could reveal.
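
The guidance can be parsed into weighted function calls roughly as follows; the call syntax and the user-defined cost table are assumptions for illustration.

```python
import re

MOVE_COSTS = {"move": 1.0}  # user-defined per-function costs (assumed values)

def parse_guidance(llm_output: str) -> list[tuple[str, str, str, float]]:
    """Extract function calls like move(a_i, a_j) and attach their costs."""
    calls = re.findall(r"(\w+)\(\s*(\w+)\s*,\s*(\w+)\s*\)", llm_output)
    return [(fn, a, b, MOVE_COSTS.get(fn, 1.0)) for fn, a, b in calls]

plan = parse_guidance("move(r3, r4)\nmove(r4, o9)")
h_llm_total = sum(cost for *_, cost in plan)  # aggregated into h_LLM
```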

3. Hierarchical Planning Domain Construction

The planning state space is structured across multiple resolution levels. For each attribute layer $k$, the planner defines $X_k = V_k \times Q$, where $V_k = \bigcup_{a \in A_k} V_a$ and $Q$ is the automaton state set. Transitions $(x_i, x_j)$ are permitted if:

  1. $s_j$ crosses from a region $V_a$ to the boundary of another region $V_b$ ($a \neq b$, non-overlapping interiors),
  2. the automaton transition is respected: $q_j = T(q_i, \ell(s_i))$,
  3. $s_j$ minimizes the edge/path cost $d(s_i, s)$ among candidate boundary nodes $s$.

The lowest-level "anchor" domain $X_0 = V \times Q$ unifies all abstraction layers and connects them via the original scene graph transitions.
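
A sketch of successor generation in one such layer, reusing the SceneGraph sketched earlier; the `automaton.step` interface and the region-crossing test are simplifying assumptions.

```python
def successors(graph, automaton, layer_k, state):
    """Enumerate candidate transitions in X_k = V_k x Q (illustrative sketch).

    `graph` is the SceneGraph sketch above; `automaton.step(q, label)` stands in
    for T(q, l(s)). Both interfaces are assumptions, not the paper's API.
    """
    s_i, q_i = state
    regions = graph.attribute_layers[layer_k]  # attribute a -> node subset V_a
    for (u, v), cost in graph.edges.items():
        if u != s_i:
            continue
        # condition 1: the move must cross between distinct regions of layer k
        src = {a for a, nodes in regions.items() if s_i in nodes}
        dst = {a for a, nodes in regions.items() if v in nodes}
        if src == dst:
            continue
        # condition 2: the automaton transition must be respected
        q_j = automaton.step(q_i, graph.label(s_i))
        # condition 3 (min-cost boundary node) is enforced by the search itself
        yield (v, q_j), cost
```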

4. Multi-Heuristic Any-Time Search and Theoretical Guarantees

Planning uses AMRA* (Anytime Multi-Resolution Multi-Heuristic A*). Two heuristics interact:

  • LTL Heuristic $h_{\mathrm{LTL}}$:

This heuristic is provably consistent (and thus preserves optimality). For a planning state $(s, q)$ it computes $h_{\mathrm{LTL}}(s, q) = \min_{t \in V} \left[ c(s, t) + g(\ell(t), T(q, \ell(t))) \right]$, where $g(\ell, q)$ is defined recursively via a Bellman/Dijkstra optimization over labels and automaton states, and $c(s, t)$ is the cost between nodes.

  • LLM Semantic Heuristic $h_{\mathrm{LLM}}$:

The sequence of moves from the LLM, $f_i(a_j, a_k)$, each weighted by its cost, is aggregated as $h_{\mathrm{LLM}}(s, q) = \sum_{i=0}^{N} f_i(a_j, a_k)$. Although potentially inadmissible, $h_{\mathrm{LLM}}$ accelerates search, while $h_{\mathrm{LTL}}$ maintains optimality guarantees; a simplified sketch of how the two heuristics cooperate is given below.
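
A greatly simplified sketch of how the two queues might interact in a dual-heuristic best-first search; the real AMRA* additionally interleaves resolution levels, re-opens states, and tracks anytime suboptimality bounds, and the round-robin queue discipline here is an assumption.

```python
import heapq
from itertools import count

def dual_heuristic_search(start, goal_test, successors, h_ltl, h_llm, w=2.0):
    """Toy dual-queue search: h_ltl (consistent) anchors optimality,
    h_llm (possibly inadmissible) pulls the search toward semantically
    promising states first. Not the paper's full AMRA* algorithm.
    """
    tie = count()  # tie-breaker so the heap never compares states directly
    g = {start: 0.0}
    anchor = [(h_ltl(start), next(tie), start)]        # admissible queue
    semantic = [(w * h_llm(start), next(tie), start)]  # LLM-informed queue
    closed = set()
    while anchor or semantic:
        for queue in (semantic, anchor):  # round-robin expansion
            if not queue:
                continue
            _, _, s = heapq.heappop(queue)
            if s in closed:
                continue
            closed.add(s)
            if goal_test(s):
                return g[s]
            for t, cost in successors(s):
                if g[s] + cost < g.get(t, float("inf")):
                    g[t] = g[s] + cost
                    heapq.heappush(anchor, (g[t] + h_ltl(t), next(tie), t))
                    heapq.heappush(semantic, (g[t] + w * h_llm(t), next(tie), t))
    return None
```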

5. Experimental Validation and Performance Analysis

The system is evaluated in virtual environments modeled on real buildings (e.g., Allensville, Benevolence, Collierville) with complex missions, e.g., "Visit the bathroom and avoid the sink, then go to the dining room and sit on a chair." Configuration variants include baseline A*, the full hierarchy with and without LLM guidance, and variants applying the LLM heuristic only at selected levels.

Key results:

  • Computation Time: The "ALL" LLM-heuristic configuration yields the fastest feasible path computation and fastest convergence to optimality.
  • Optimality Gap: The cost of the first-found path is closer to optimal than that of the baselines.
  • Node Expansions: Substantial reduction in nodes expanded per iteration when using LLM guidance; search efficiency is directly improved by hierarchical semantic heuristics.

These metrics demonstrate that integrating semantic LLM heuristics with hierarchical, multi-resolution planning domains yields marked improvements in both speed and optimality of complex natural language task execution.

6. Applications and System Implications

The described framework enables:

  • Autonomous Mobile Robots: Robust execution of complex, compositional tasks specified in natural language, leveraging high-level scene abstractions for navigation and manipulation in semantically rich environments.
  • Symbol Grounding: LLMs serve as a crucial link between linguistic commands and their corresponding propositional representations over the scene graph, enabling efficient symbol grounding in previously ambiguous or “free-form” settings.
  • Scalability: Multi-resolution abstraction dramatically reduces search complexity in large-scale, multi-layer environments; LLM heuristics scale with semantic richness rather than just raw graph cardinality.
  • Extension to Dynamic/Real Scenarios: The planner’s modular construction allows systematic extension, integration with live perceptual updates, and adaptation to safety-critical operations where formal optimality and correctness are paramount.

7. Summary and Theoretical Significance

This hierarchical LLM-based planning architecture exemplifies a synergistic integration of deep semantic scene abstractions, formal task logic, and both rigorous (LTL) and informed (LLM) heuristic guidance. The method achieves:

  • Explicit representation of environments as hierarchical scene graphs with atomic propositions.
  • Formal natural language task translation to co-safe LTL via LLMs.
  • Hierarchical domain factorization reflecting geometric, semantic, and automaton structure.
  • Dual-heuristic anytime planning, marrying search efficiency with guaranteed optimality.
  • Empirically demonstrated gains in speed and plan quality.

As a result, it provides a technically grounded paradigm for deploying natural language-driven robotic planning in environments of significant scale and structural complexity, while maintaining guarantees required for real-world autonomy (Dai et al., 2023).

References

  • Dai et al., 2023. "Optimal Scene Graph Planning with LLM Guidance."
