
LLM-Planner: Efficient Hierarchical Planning

Updated 22 August 2025
  • LLM-Planner is a hierarchical planning framework that uses pre-trained LLMs to decompose natural language goals into actionable subgoals with minimal demonstrations.
  • It employs dynamic, closed-loop re-planning by integrating perception-driven feedback, allowing the system to adapt to unexpected changes in complex environments.
  • Empirical results on benchmarks like ALFRED demonstrate its sample efficiency and competitive performance compared to fully supervised approaches.

LLM-based planners, abbreviated here as "LLM-Planner," constitute a family of algorithms and systems that exploit the few-shot reasoning, abstraction, and linguistic-grounding abilities of contemporary LLMs to deliver high-level, sample-efficient planning for embodied agents in complex, partially observable environments. Unlike traditional end-to-end supervised methods, which require extensive datasets of labeled instruction–trajectory pairs, LLM-Planner architectures leverage pre-trained LLMs to decompose natural language goals into actionable subgoals from only a handful of demonstrations, and then adapt these plans to the current sensory context via dynamic, perception-driven re-planning.

1. Motivation and Problem Setting

LLM-Planner is designed to address several key limitations in existing robotic and embodied planning frameworks:

  • High data cost and sample inefficiency in end-to-end supervised imitation learning pipelines.
  • Lack of flexible planning: conventional methods typically generate a one-off action sequence at initialization, risking catastrophic failure when faced with unexpected environmental changes or out-of-distribution states.
  • Absence of environmental grounding: LLMs generate plausible subgoals based on commonsense reasoning, but are not natively aware of which objects or constraints are valid in the current scene.
  • Need for a planning component capable of operating solely from language instructions, without requiring enumeration of the entire skill set or explicit symbolic modeling of the environment.

LLM-Planner, introduced by Song et al. (2022), leverages in-context, few-shot prompting of LLMs such as GPT-3 to guide embodied agents through long-horizon, natural language tasks with minimal paired data (often fewer than 0.5% of the full training corpus), achieving competitive or superior performance relative to fully supervised baselines.

2. Hierarchical Planning and Physical Grounding

The LLM-Planner architecture employs a two-level hierarchy:

  • High-level planner (LLM): Given a natural language instruction and a list of observed objects, the LLM generates a sequence of high-level subgoals, typically as (action, object) tuples (e.g., (navigate, fridge), (open, drawer)). These are produced by prompt engineering: carefully designed prompts inject task explanations, current scene inventory, and in-context demonstration pairs retrieved through k-nearest neighbor (kNN) selection using BERT embeddings.
  • Low-level planner: Each subgoal from the LLM is mapped to a trajectory of primitive environment actions, as implemented by a downstream controller or policy.
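As a concrete illustration, the interface between the two levels can be pictured as follows; the instruction, subgoal names, and expansion are illustrative (ALFRED-style), not taken verbatim from the paper:

# Instruction: "Bring an apple from the fridge to the table."
high_level_plan = [
    ("Navigation", "fridge"), ("OpenObject", "fridge"),
    ("PickupObject", "apple"), ("CloseObject", "fridge"),
    ("Navigation", "table"), ("PutObject", "table"),
]
# The low-level planner expands each subgoal into primitive actions, e.g.
# ("Navigation", "fridge") -> ["MoveAhead", "RotateRight", "MoveAhead", ...]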

A key innovation is dynamic grounding via closed-loop re-planning. If the agent encounters a bottleneck (e.g., failing to find a required object or exceeding step budgets for a subgoal), the LLM is re-invoked with updated environmental observations and the list of already completed subgoals, resulting in a re-generated plan that better reflects the true, partially observed state. The process iteratively alternates between high-level plan generation and execution, with physical feedback triggering plan adaptation. This closed-loop procedure is formalized in Algorithm 1:

Algorithm 1: Dynamic Grounded Re-Planning
Input: Instruction I; Observed objects O; Completed subgoals G ← ∅

S ← LLM-Planner(I, O, G)            // generate high-level plan
k ← 0; s ← S[k]
while k < len(S):
    a ← LowLevelPlanner(s)          // next primitive action for subgoal s
    execute a
    update O from perception
    if s fails or its step budget is exceeded:
        S ← LLM-Planner(I, O, G)    // re-plan against updated observations
        k ← 0; s ← S[k]
    elif s is completed:
        G ← G ∪ {s}; k ← k + 1
        if k < len(S): s ← S[k]
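For concreteness, the control loop can be sketched in Python. Everything below is a stub standing in for the real components (the GPT-3 call, the low-level controller, the simulator); the names, signatures, and the trivial success check are assumptions made for illustration, not the paper's implementation:

from typing import List, Set, Tuple

Subgoal = Tuple[str, str]  # (high-level action, object argument)

def llm_plan(instruction: str, observed: Set[str],
             done: List[Subgoal]) -> List[Subgoal]:
    # Stub for the few-shot LLM call; a real version assembles the prompt
    # from the instruction, observed objects, completed subgoals, and
    # kNN-retrieved in-context examples.
    return [("Navigation", "fridge"), ("OpenObject", "fridge")]

def low_level_action(subgoal: Subgoal) -> str:
    # Stub for the controller that expands a subgoal into primitives.
    return "MoveAhead"

def execute(action: str, observed: Set[str]) -> None:
    # Stub for environment stepping; a real agent would run its object
    # detector here and merge new detections into `observed`.
    observed.add("fridge")

def subgoal_done(subgoal: Subgoal, observed: Set[str]) -> bool:
    # Trivial success check so the sketch terminates immediately.
    return True

def run(instruction: str, budget: int = 20) -> List[Subgoal]:
    observed: Set[str] = set()
    done: List[Subgoal] = []
    plan = llm_plan(instruction, observed, done)
    k = steps = 0
    while k < len(plan):
        execute(low_level_action(plan[k]), observed)
        steps += 1
        if steps > budget:                     # bottleneck: grounded re-plan
            plan = llm_plan(instruction, observed, done)
            k = steps = 0
        elif subgoal_done(plan[k], observed):
            done.append(plan[k])
            k, steps = k + 1, 0
    return done

run("bring an apple from the fridge to the table")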

3. Prompt Engineering and Example Selection

The success of LLM-Planner is contingent on effective prompt design and intelligent demonstration selection. The prompt incorporates:

  • Task explanation and a vocabulary of admissible high-level actions.
  • The instruction, which may be either the original goal statement or a decomposed, step-wise statement.
  • A dynamically injected list of objects observed in the current environment.
  • Several in-context examples (instruction → plan pairs) retrieved by a BERT-based k-nearest neighbor over the instruction embedding space, ensuring contextual relevance.
  • Logit biasing at inference to prefer tokens representing objects known to be present, further mitigating hallucination.

Careful composition of these inputs is critical for transferring the in-context learning ability of the LLM to grounded planning tasks with minimal overfitting or reliance on extensive data.
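A minimal sketch of the retrieval and prompt-assembly steps, assuming instruction embeddings have already been computed with a frozen BERT-style encoder; the helper names, the default k, and the prompt wording are illustrative rather than the paper's exact template:

import numpy as np

def knn_examples(query_emb: np.ndarray, example_embs: np.ndarray,
                 examples: list, k: int = 9) -> list:
    # Select the k training (instruction, plan) pairs whose instruction
    # embeddings are closest to the query under cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    E = example_embs / np.linalg.norm(example_embs, axis=1, keepdims=True)
    top = np.argsort(-(E @ q))[:k]
    return [examples[i] for i in top]

def build_prompt(task_explanation: str, actions: list, instruction: str,
                 objects: set, demos: list) -> str:
    lines = [task_explanation, "Allowed actions: " + ", ".join(actions)]
    for demo_instruction, demo_plan in demos:   # in-context examples
        lines += ["Instruction: " + demo_instruction, "Plan: " + demo_plan]
    lines += ["Instruction: " + instruction,
              "Visible objects: " + ", ".join(sorted(objects)),
              "Plan:"]
    return "\n".join(lines)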

4. Empirical Evaluation and Performance

LLM-Planner was evaluated primarily on the ALFRED benchmark, which requires vision-language navigation and manipulation in AI2-THOR—a simulated domestic environment with 207 rooms, 115 object types, and long-horizon tasks averaging 50 actions. The experimental protocol enforced a stringent few-shot regime: only 100 paired (instruction, high-level plan) examples (<0.5% of 21,023 total). Results are summarized as follows:

Setting                  Success Rate (SR)                 LLM Calls/Task   Training / Grounding
Few-shot LLM-Planner     Competitive with full-data SOTA   ~7               100 paired examples (<0.5%)
Few-shot HLSM baseline   <1% (near-total failure)          n/a              Same 100 examples
SayCan                   Lower than LLM-Planner            ~22              Oracle environment information

Dynamic grounded re-planning yielded a ~1.8% SR improvement over static variants on unseen test environments. Moreover, ablations confirmed that both kNN-based example selection and logit biasing are essential for high-level plan accuracy.
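The biasing step itself is simple to state: before sampling each token, the decoder's logits for tokens that name currently visible objects are shifted upward. A schematic version operating on raw logits (hosted-API bias mechanisms differ in detail, so this is a sketch of the idea rather than the paper's exact mechanism):

import numpy as np

def bias_logits(logits: np.ndarray, vocab: list,
                visible_objects: set, bias: float = 5.0) -> np.ndarray:
    # Add a constant bonus to tokens naming detected objects, steering
    # generated subgoals toward objects that actually exist in the scene.
    biased = logits.copy()
    for i, token in enumerate(vocab):
        if token in visible_objects:
            biased[i] += bias
    return biased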

5. Theoretical and Algorithmic Underpinnings

LLM-Planner can be viewed as a particular instantiation of hierarchical, perception-informed planning, where, for an instruction I and environment E:

  • The LLM high-level planner produces L_h = [g_0, ..., g_T], where each g_i is a subgoal expressed as an (action, object) pair.
  • The low-level planner produces L_ℓ = [a_0, ...] such that P(L_ℓ | I, L_h, E) = P(L_ℓ | L_h, E); that is, the primitive action sequence depends on the instruction only through the high-level plan.

The use of dynamic re-planning effectively aligns the sequence [g_0, ..., g_T] with ever-changing perceptions, forming a feedback loop between the agent and its environment, in contrast to static open-loop plans that fail under uncertainty.
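Written out, this conditional independence gives the hierarchical factorization exploited by the two-level design (stated here for clarity, using the notation above):

P(L_ℓ, L_h | I, E) = P(L_h | I, E) · P(L_ℓ | L_h, E)

Dynamic re-planning amounts to repeatedly re-drawing L_h from the first factor as the observed portion of E changes.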

6. Sample Efficiency, Robustness, and Future Research

The LLM-Planner paradigm demonstrates that leveraging pre-trained LLMs with a thoughtful in-context and grounding strategy enables the construction of embodied agents with dramatic improvements in sample efficiency. Full-data baselines require orders of magnitude more labeled pairs for similar performance, and most alternative methods are unable to complete tasks with only a few demonstrations.

Current limitations include the potential for LLM hallucination in highly out-of-distribution environments and sensitivity to prompt design. Future research priorities include:

  • Exploring larger, code-capable LLM backends (e.g., Codex), more sophisticated prompt engineering, and advanced in-context selection algorithms.
  • Tightening the perception–planner integration, potentially by unifying object detection and symbolic scene understanding in the feedback loop.
  • Scaling the methodology to real-world sensor data and deploying to real robotic platforms.
  • Extending to open-set skill learning without reliance on a pre-specified action vocabulary or high-level action types.

7. Implications and Significance

LLM-Planner marks a transition from passive, static plan dispatching to actively grounded, closed-loop planning in embodied AI. By reducing data annotation constraints and tying planning directly to current sensory feedback, this architecture opens the possibility for more versatile, rapidly adaptable autonomous agents able to quickly learn and execute complex user instructions in changing environments using only modest demonstration resources. As such, it is likely to influence both the design of future language-based planning systems and the development of robust, efficient embodied agents for real-world deployment (Song et al., 2022).

References

Song, C. H., Wu, J., Washington, C., Sadler, B. M., Chao, W.-L., & Su, Y. (2022). LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. arXiv:2212.04088.