LLM-DP: Dynamic Planning with LLMs

Updated 22 October 2025
  • LLM-DP is a planning architecture that leverages large language models to generate, refine, and realign action sequences based on high-level goals and real-time environmental feedback.
  • It employs hierarchical planning with few-shot in-context learning and dynamic re-planning, significantly reducing training data dependencies while ensuring robust performance.
  • Integration with perceptual modules enables responsive, grounded execution by incorporating sensor data and environmental observations for real-world task adaptation.

An LLM Dynamic Planner (LLM-DP) refers to a family of planning architectures that leverage LLMs for dynamic plan generation, grounded execution, real-time adaptation, and sample-efficient task completion in complex, partially observable, or dynamic environments. These systems instantiate the LLM not merely as a passive instruction follower but as an active planning module: generating, refining, and grounding action sequences based on high-level goals, environmental feedback, and real-time observations. LLM-DPs typically employ mechanisms such as hierarchical planning, few-shot in-context prompting, dynamic re-planning, and tight integration with perceptual modules, bridging the gap between abstract instruction following and robust, embodied task execution.

1. Hierarchical LLM-Based Planning Architectures

LLM-DPs generally adopt a hierarchical planning paradigm. At the high level, an LLM (such as GPT-3) is prompted with a natural language instruction and a small number of paired examples (instruction–plan pairs), typically selected using a BERT-based kNN retriever, to generate a sequence of subgoals or high-level actions, formalized as:

L_h = [g_0, g_1, \ldots, g_T]

where each g_i is a tuple (high-level action, object) (e.g., {PickupObject, potato}).

At the low level, a controller (often adapted from state-of-the-art baselines such as HLSM) takes each subgoal g_i and produces a sequence of primitive action steps conditioned on the current environment state, thus realizing the high-level plan within the physical or simulated domain. This low-level plan L_\ell satisfies:

P(L_\ell \mid I, L_h, E) = P(L_\ell \mid L_h, E)

for instruction I and environment E.

This two-level approach enables LLM-DPs to decompose long-horizon tasks into tractable steps, accommodate variable contexts, and facilitate sample-efficient transfer to new domains with minimal paired demonstration data.
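
As a concrete illustration of this decomposition, the sketch below represents subgoals as (action, object) pairs and uses a stub controller that expands each subgoal into primitive steps conditioned only on the subgoal and environment state, mirroring the independence assumption above. The class and function names are illustrative assumptions, not the paper's code.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Subgoal:
    """One high-level step g_i = (action, object), e.g. ("PickupObject", "potato")."""
    action: str
    obj: str

def low_level_controller(subgoal: Subgoal, env_state: Dict) -> List[str]:
    """Stub controller: expands a single subgoal into primitive actions,
    conditioned on the subgoal and the current environment state only."""
    if subgoal.action == "PickupObject":
        return [f"Navigate<{subgoal.obj}>", f"Pickup<{subgoal.obj}>"]
    if subgoal.action == "OpenObject":
        return [f"Navigate<{subgoal.obj}>", f"Open<{subgoal.obj}>"]
    return [f"Navigate<{subgoal.obj}>"]

if __name__ == "__main__":
    # L_h = [g_0, ..., g_T]: a two-step high-level plan
    high_level_plan = [Subgoal("OpenObject", "fridge"), Subgoal("PickupObject", "potato")]
    for g in high_level_plan:
        print((g.action, g.obj), "->", low_level_controller(g, env_state={}))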

2. Few-Shot In-Context and Dynamic Planning

A distinguishing feature of modern LLM-DPs, exemplified by LLM-Planner (Song et al., 2022), is their reliance on few-shot in-context learning. Rather than retraining or fine-tuning, the LLM is guided during inference with a handful of diverse task-plan pairs, dynamically selected to match the current task. These examples function as “planning primitives,” exploiting the LLM’s embedded commonsense and compositional priors for rapid generalization.
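
A minimal sketch of this example selection is shown below, assuming a general-purpose sentence-embedding model as a stand-in for the BERT-based kNN retriever and a small hypothetical exemplar pool (placeholders, not ALFRED data):

from sentence_transformers import SentenceTransformer, util

# Hypothetical pool of instruction-plan exemplars.
EXAMPLE_POOL = [
    ("Put a chilled apple on the table",
     "OpenObject(fridge), PickupObject(apple), CloseObject(fridge), PutObject(apple, table)"),
    ("Heat a potato and put it in the sink",
     "PickupObject(potato), HeatObject(potato), PutObject(potato, sink)"),
    ("Examine a pencil under the lamp",
     "PickupObject(pencil), ToggleObject(lamp)"),
]

def retrieve_exemplars(instruction: str, k: int = 2):
    """Return the k exemplars whose instructions are most similar to the query."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model
    query_emb = model.encode(instruction, convert_to_tensor=True)
    pool_emb = model.encode([inst for inst, _ in EXAMPLE_POOL], convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_emb)[0]   # cosine similarity to each exemplar
    top = scores.argsort(descending=True)[:k]       # indices of the k best matches
    return [EXAMPLE_POOL[int(i)] for i in top]

if __name__ == "__main__":
    for inst, plan in retrieve_exemplars("Cook a potato slice and put it on the counter"):
        print(inst, "->", plan)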

This approach sharply contrasts with traditional planners, which require extensive paired trajectories or heavily engineered symbolic planners. The LLM, operating in the few-shot regime, thus demonstrates significant sample efficiency: competitive performance is maintained on complex multi-step tasks (tested on benchmarks such as ALFRED) using less than 0.5% of the original training data.

Furthermore, LLM-DP frameworks often incorporate dynamic re-planning, wherein the LLM receives real-time environmental feedback during execution. If the agent stalls, fails, or encounters an unexpected scenario, the current state (including detected objects, history of completed goals, etc.) is injected into the prompt and the high-level plan is regenerated or updated. This enables continual plan refinement and robust adaptation to dynamic, partially observable, or error-prone settings.
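
The sketch below shows one way such state injection could be assembled into a re-planning prompt; the template wording and helper names are illustrative assumptions rather than the paper's exact prompt, and the resulting string would be passed to whatever completion API is in use.

from typing import List, Tuple

def build_replan_prompt(
    instruction: str,
    exemplars: List[Tuple[str, str]],
    visible_objects: List[str],
    completed_subgoals: List[str],
) -> str:
    """Assemble a few-shot prompt that injects the agent's current state."""
    parts = ["Create a high-level plan for completing a household task."]
    for ex_instruction, ex_plan in exemplars:   # dynamically retrieved exemplars
        parts.append(f"Task: {ex_instruction}\nPlan: {ex_plan}")
    parts.append(f"Task: {instruction}")
    parts.append("Visible objects: " + (", ".join(visible_objects) or "none"))
    parts.append("Completed subgoals: " + (", ".join(completed_subgoals) or "none"))
    parts.append("Plan:")
    return "\n\n".join(parts)

if __name__ == "__main__":
    prompt = build_replan_prompt(
        instruction="Put a cooked potato on the table",
        exemplars=[("Heat a mug of water", "PickupObject(mug), HeatObject(mug)")],
        visible_objects=["fridge", "counter"],
        completed_subgoals=[],
    )
    print(prompt)  # this string would be sent to the LLM for a regenerated plan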

3. Physical Grounding and Environmental Feedback

Grounding high-level plans in the physical world is a fundamental challenge for embodied LLM-DPs. This is addressed via dynamic feedback loops that close the gap between planned intent and executed action. Key techniques include:

  • Grounded Re-Planning: The agent monitors progress and, upon subgoal failure or exceeding action thresholds, re-prompts the LLM with updated lists of observed or visible objects, thus ensuring the plan remains contextually relevant.
  • Context Injection: Environmental state is encoded as structured input (e.g., a list of visible objects produced by an onboard object detector) and included in the next LLM prompt, e.g., if “Pickup potato” is required but only a fridge is seen, the LLM replans to navigate to and open the fridge.
  • Closed-Loop Execution: Execution history and sensor outputs are iteratively processed, and observations are fed forward into subsequent planning steps, enabling online plan correction.

This tight integration of perception and generation is critical for achieving robust, real-world performance, especially under partial observability and object discovery.
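
A minimal sketch of this feedback step, assuming a set-based object memory and a simple per-subgoal step budget (both hypothetical simplifications), is:

from typing import Iterable, Set

def update_observations(observed: Set[str], detections: Iterable[str]) -> Set[str]:
    """Merge objects detected in the current view into the agent's running memory."""
    return observed | set(detections)

def needs_grounded_replan(
    target_object: str,
    observed: Set[str],
    subgoal_failed: bool,
    steps_on_subgoal: int,
    max_steps: int = 20,
) -> bool:
    """Trigger re-planning if the subgoal failed, stalled past its step budget,
    or its target object has not been observed yet (e.g. the potato is still
    inside an unopened fridge)."""
    return subgoal_failed or steps_on_subgoal > max_steps or target_object not in observed

if __name__ == "__main__":
    memory = update_observations(set(), ["fridge", "counter"])
    print(needs_grounded_replan("potato", memory, subgoal_failed=False, steps_on_subgoal=3))  # True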

4. Comparative Performance and Efficiency

Extensive evaluation on challenging benchmarks (notably ALFRED) demonstrates that LLM-DPs reach or closely approach the performance of extensively trained baselines with orders of magnitude less labeled data. For example, LLM-Planner, augmented with dynamic re-planning and a strong low-level controller, is competitive with the success rates of FILM and HLSM, which are trained on the full dataset, while baselines trained with the same limited data are largely ineffective.

Moreover, dynamic re-planning yields tangible efficiency and robustness gains (e.g., +1–2% success rate improvement on unseen splits). LLM-DPs also surpass approaches that rely on ranking predefined skills (SayCan) or policies that merely mimic prompts, particularly in partially observable, incrementally revealed environments.

The following table summarizes reported comparative results:

System               | Training Data Used | Success Rate (Few-Shot Setting)
LLM-Planner + HLSM   | <0.5% paired       | Competitive with full-data FILM/HLSM
Baseline (retrained) | 100 examples       | Near zero
SayCan               | Full env. details  | Lower than LLM-Planner

5. Broader Implications and Limitations

The integration of LLMs as sample-efficient, dynamic planners marks a paradigm shift for embodied planning systems. The LLM-DP approach demonstrates:

  • Versatility: A single LLM-driven planner can generalize across a space of tasks with little task-specific engineering.
  • Adaptivity: Dynamic, grounded re-planning enables resilience in real-world, partially observable, and error-prone settings.
  • Low Data Regime Competitiveness: Near state-of-the-art results are achievable in low-data scenarios, reducing the need for expensive paired supervision.

However, several practical limitations remain:

  • Dependence on Perceptual Modules: Success is bounded by the reliability of upstream perception (object detectors, semantic mapping).
  • Prompt Sensitivity: Prompt design and in-context example selection strongly influence outcomes; small variations can lead to divergent plans.
  • Latency and Cost: Dynamic re-prompting, particularly in real-time or resource-constrained environments, may introduce overhead.
  • Low-Level Integration: There is a continual need to improve the handshake between high-level LLM plans and robust low-level execution modules, particularly in complex robotics or manipulation tasks.

Future directions include refined prompt engineering, exploration of alternative LLM architectures (e.g., Codex or domain-specific models), and improved low-level controller design.

6. Formalization and Algorithmic Summary

The LLM-DP methodology can be formally captured as follows:

High-Level Plan Generation

Given instruction I and environment E, the LLM generates:

L_h = [g_0, g_1, \ldots, g_T], \quad g_i = (\text{action}, \text{object})

Grounded Re-Planning Algorithm

The dynamic planning loop, adapted here from the original pseudocode, is:

Input: Instruction I, observed object list O, completed subgoals G

S = LLM_Planner(I, O, G)            # generate the initial high-level plan
t = 0                               # global action-step counter
k = 0                               # index of the current subgoal

while k < len(S):
    s = S[k]                        # current subgoal
    a_t = Low_Level_Planner(s)      # primitive action for the current subgoal
    execute(a_t)
    O_t = Object_Detector(current_view)   # objects visible after acting
    O = update(O, O_t)                    # merge new detections into memory
    if subgoal_failed(s) or t > threshold:
        S = LLM_Planner(I, O, G)    # re-plan, grounded on observed objects
        k = 0                       # restart execution with the new plan
    elif subgoal_succeeded(s):
        G = G + [s]                 # record the completed subgoal
        k += 1                      # advance to the next subgoal
    t += 1

This process ensures that the planner dynamically revises its course as the environment evolves, maintaining coherence with both high-level intent and physical constraints.
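
For illustration only, the following minimal harness stubs out every component (planner, controller, detector) so the control flow above can be run end to end; none of these stubs reflect the actual models used by LLM-DP systems.

import random

def LLM_Planner(I, O, G):
    """Stub: returns a fixed plan; a real system would prompt an LLM here."""
    return [("OpenObject", "fridge"), ("PickupObject", "potato"), ("PutObject", "table")]

def Low_Level_Planner(subgoal):
    action, obj = subgoal
    return f"{action}<{obj}>"

def Object_Detector(view):
    """Stub detector: returns a random subset of scene objects."""
    return random.sample(["fridge", "potato", "table", "counter"], k=2)

def execute(primitive_action):
    print("executing", primitive_action)

I = "Put a potato on the table"
O, G = set(), []
S, t, k = LLM_Planner(I, O, G), 0, 0
while k < len(S):
    s = S[k]
    execute(Low_Level_Planner(s))
    O |= set(Object_Detector("current_view"))  # closed-loop observation update
    G.append(s)   # stub: assume every subgoal succeeds, so the re-plan branch never fires
    k += 1
    t += 1
print("completed subgoals:", G)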


In summary, the LLM Dynamic Planner operationalizes the strengths of modern LLMs within hierarchical, sample-efficient, and environmentally grounded planning architectures, providing a robust framework for embodied, adaptive agents capable of complex real-world task execution (Song et al., 2022).

References

Song, C. H., Wu, J., Washington, C., Sadler, B. M., Chao, W.-L., & Su, Y. (2022). LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models. arXiv:2212.04088.