Hybrid LLM+PDDL Planning

Updated 21 February 2026
  • Hybrid LLM+PDDL planning combines the linguistic reasoning of LLMs with symbolic PDDL planning to improve task synthesis and validation.
  • Systems use modular architectures such as sequential NL-to-PDDL translation, agentic loops, and retrieval-augmented generation to decompose and refine complex planning tasks.
  • Reported evaluations show plan success rates rising from as low as 15% to as high as 100%, and average plan length reduced by 45% in some settings.

Hybrid LLM+PDDL Planning refers to frameworks and algorithms that integrate LLMs with symbolic task planning based on the Planning Domain Definition Language (PDDL). Such systems use the linguistic and semantic reasoning capacity of LLMs for task formalization and modeling, while delegating structured, long-horizon plan synthesis and verification to classical PDDL planners. The aim is to combine the generalization and convenience of LLMs with the rigor, correctness, and optimality guarantees of symbolic planning. The sections below summarize key methodologies, architectures, evaluation results, and open challenges in this line of research.

1. Architectural Principles and System Design

Hybrid LLM+PDDL planning systems characteristically decompose the planning pipeline into a sequence of modular stages, allowing LLMs to perform semantically complex, language-intensive tasks and utilizing symbolic planners for search, validation, and plan extraction. Architectures may adopt sequential, agentic, or cascaded models, such as:

  • Sequential NL→PDDL→Plan: The LLM translates English task descriptions to PDDL domain/problem files; a classical planner computes a plan, which is optionally translated back to natural language (Liu et al., 2023, Gestrin et al., 2024, Benyamin et al., 16 Sep 2025).
  • Agentic Loops: Iterative self-refinement or orchestrator-directed looping involves multiple LLM "agents" (translators, validators, ambiguity resolvers) and tight integration with external verifiers and planners. Plan outputs are verified, and failure feedback is routed back for repair and further NL-to-PDDL translation (Malfa et al., 10 Dec 2025).
  • RAG+CoT Pipelines: Retrieval-augmented generation (RAG) retrieves contextually relevant examples; chain-of-thought (CoT) reasoning steps are embedded in prompts to decompose semantics before symbolic generation and validation (Huang et al., 17 Sep 2025).
  • Neurosymbolic Feedback/Refinement Loops: Simulation or environment interaction (exploration, partial execution) is used to diagnose and correct LLM-generated symbolic models via measured feedback signals (Mahdavi et al., 2024, Gong et al., 19 May 2025).
  • Hierarchical Decomposition: For multi-robot and multi-agent environments, architectures may first produce team-level symbolic plans and then resolve task-to-agent assignment via combinatorial optimization or knowledge graph guidance (Shek et al., 4 Feb 2026, Shi et al., 26 Oct 2025).

This modularity provides fail-safes: invalid or underspecified symbolic models are detected and then repaired, either automatically by validation modules or by human-in-the-loop correctors.
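The translate-validate-repair cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the method of any cited system: `refine_loop`, `fake_translate`, and `fake_validate` are hypothetical names, and the two `fake_*` functions are stand-ins for an LLM call and an external validator.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    pddl: str            # candidate PDDL text produced by the translator
    errors: List[str]    # validator feedback; empty when the model is accepted


def refine_loop(
    task: str,
    translate: Callable[[str, List[str]], str],
    validate: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Optional[Attempt]:
    """Translate, validate, and feed errors back until accepted or out of budget."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        pddl = translate(task, feedback)
        errors = validate(pddl)
        if not errors:
            return Attempt(pddl, [])
        feedback = errors  # route failure feedback back to the translator
    return None  # explicit failure; a human corrector could take over here


# Hypothetical stand-ins for the LLM translator and the validator.
def fake_translate(task: str, feedback: List[str]) -> str:
    # The first try omits the goal; the retry adds it once feedback arrives.
    goal = "(:goal (on a b))" if feedback else ""
    return f"(define (problem p) (:domain blocks) {goal})"


def fake_validate(pddl: str) -> List[str]:
    return [] if "(:goal" in pddl else ["missing :goal section"]


result = refine_loop("stack a on b", fake_translate, fake_validate)
print(result.pddl if result else "failed")
```

Returning `None` when the round budget runs out keeps failure explicit, which is where a human-in-the-loop step would naturally attach.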

2. LLM-to-PDDL Translation Methodologies

LLM-driven PDDL synthesis involves tightly constructed prompt templates, multi-stage reasoning, and often in-context retrieval:

  • Prompting Strategies:
  • Validation Feedback Loops:
  • Environment-Grounded Refinement:
  • Formal PDDL Model Structure:
    • Generated PDDL files must conform to classical STRIPS or numeric planning paradigms:
    • Domain $D = \langle T, \mathcal{P}, \mathcal{A} \rangle$: types, predicates, actions
    • Problem $P = \langle O, I, G \rangle$: objects, initial state, goal formula (Gestrin et al., 2024, Huang et al., 17 Sep 2025).
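To make the domain and problem tuples above concrete, the following sketch mirrors them as Python dataclasses and renders them to PDDL text. The class names, field names, and the toy `lamps` domain are assumptions chosen for illustration; none of them come from the cited papers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Action:
    name: str
    params: List[Tuple[str, str]]   # (variable, type), e.g. ("?l", "lamp")
    precondition: str               # PDDL formula kept as text
    effect: str


@dataclass
class Domain:                       # D = <T, P, A>
    name: str
    types: List[str]
    predicates: List[str]
    actions: List[Action]


@dataclass
class Problem:                      # P = <O, I, G>
    name: str
    domain: str
    objects: Dict[str, str]         # object name -> type
    init: List[str]
    goal: str


def render_domain(d: Domain) -> str:
    acts = []
    for a in d.actions:
        params = " ".join(f"{v} - {t}" for v, t in a.params)
        acts.append(
            f"  (:action {a.name}\n"
            f"    :parameters ({params})\n"
            f"    :precondition {a.precondition}\n"
            f"    :effect {a.effect})"
        )
    return (
        f"(define (domain {d.name})\n"
        f"  (:requirements :strips :typing)\n"
        f"  (:types {' '.join(d.types)})\n"
        f"  (:predicates {' '.join(d.predicates)})\n"
        + "\n".join(acts) + ")"
    )


def render_problem(p: Problem) -> str:
    objs = " ".join(f"{o} - {t}" for o, t in p.objects.items())
    return (
        f"(define (problem {p.name})\n"
        f"  (:domain {p.domain})\n"
        f"  (:objects {objs})\n"
        f"  (:init {' '.join(p.init)})\n"
        f"  (:goal {p.goal}))"
    )


turn_on = Action("turn-on", [("?l", "lamp")], "(off ?l)",
                 "(and (on ?l) (not (off ?l)))")
domain = Domain("lamps", ["lamp"],
                ["(off ?l - lamp)", "(on ?l - lamp)"], [turn_on])
problem = Problem("p1", "lamps", {"l1": "lamp"}, ["(off l1)"], "(on l1)")

domain_text = render_domain(domain)
problem_text = render_problem(problem)
print(domain_text)
print(problem_text)
```

Holding the model in typed structures before rendering makes it easier to check, for instance, that every object has a declared type before any planner is invoked.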

3. Planning and Validation Integration

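Once PDDL models are synthesized, classical planners are invoked for plan generation. Any plan returned is therefore formally correct with respect to the synthesized model, although not necessarily with respect to the user's original intent if the translation was faulty.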
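As an illustration of what plan generation and validation involve, the sketch below runs breadth-first search over ground STRIPS actions and then replays the plan to check it. It is a toy stand-in, assuming a hand-grounded two-block problem; real pipelines call an off-the-shelf planner and validator rather than code like this, and all names here are hypothetical.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class GroundAction(NamedTuple):
    name: str
    pre: FrozenSet[str]      # facts that must hold before the action
    add: FrozenSet[str]      # facts made true
    delete: FrozenSet[str]   # facts made false


def apply(state: FrozenSet[str], a: GroundAction) -> FrozenSet[str]:
    return (state - a.delete) | a.add


def bfs_plan(init: Iterable[str], goal: FrozenSet[str],
             actions: List[GroundAction]) -> Optional[List[str]]:
    """Return a shortest plan (in number of actions), or None if unsolvable."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for a in actions:
            if a.pre <= state:
                nxt = apply(state, a)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [a.name]))
    return None


def validate(init: Iterable[str], goal: FrozenSet[str],
             actions: List[GroundAction], plan: List[str]) -> bool:
    """Replay the plan step by step, checking preconditions and the goal."""
    by_name = {a.name: a for a in actions}
    state = frozenset(init)
    for step in plan:
        a = by_name.get(step)
        if a is None or not a.pre <= state:
            return False
        state = apply(state, a)
    return goal <= state


fs = frozenset
actions = [
    GroundAction("pick-a", fs({"clear a", "ontable a", "handempty"}),
                 fs({"holding a"}), fs({"clear a", "ontable a", "handempty"})),
    GroundAction("stack-a-b", fs({"holding a", "clear b"}),
                 fs({"on a b", "clear a", "handempty"}),
                 fs({"holding a", "clear b"})),
    GroundAction("pick-b", fs({"clear b", "ontable b", "handempty"}),
                 fs({"holding b"}), fs({"clear b", "ontable b", "handempty"})),
]
init = {"clear a", "clear b", "ontable a", "ontable b", "handempty"}
goal = fs({"on a b"})

plan = bfs_plan(init, goal, actions)
print(plan, validate(init, goal, actions, plan))
```

Because the search is exhaustive, an unsolvable model yields `None` rather than a plausible-looking but invalid plan, which is the property hybrid systems rely on when they report explicit failure.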

4. Advanced Forms: Multi-Robot, Online, and Generalized Planning

Hybrid LLM+PDDL approaches have been extended to higher-complexity or real-world planning scenarios:

  • Multi-Robot Planning:
    • Symbolic planning is performed at the team level, with PDDL plans mapped to task-dependency graphs; integer programming is used for robot-level task assignment, achieving improved utilization and efficiency over prior LLM-based methods (Shi et al., 26 Oct 2025).
    • Knowledge graph-guided frameworks maintain a dynamically updated memory encoding object relations, robot capabilities, and spatial reachability; failures trigger replanning and KG refinement via LLMs, dramatically improving task completion rates in heterogeneous agent settings (Shek et al., 4 Feb 2026).
  • Partial Observability and Environment-Driven Modeling:
    • PDDL formalization and planning are executed in online or partially observable environments, using dual feedback loops from both symbolic solvers and environmental simulation to iteratively grow and refine the domain and problem representation (Gong et al., 19 May 2025, Mahdavi et al., 2024).
  • Generalized Plan Synthesis:
    • LLMs produce domain-generalized strategies as pseudocode or Python programs, which are automatically debugged and reflected upon via validation feedback, improving generalized plan coverage across PDDL instances (Stein et al., 19 Aug 2025).
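The robot-level task assignment step mentioned under Multi-Robot Planning can be illustrated with a small sketch. It swaps in brute-force enumeration for the integer programming used in the cited work, ignores task dependencies, and minimizes the largest per-robot workload; the task names, durations, and capability sets are invented for the example.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def assign_tasks(
    durations: Dict[str, float],
    capable: Dict[str, List[str]],
) -> Optional[Tuple[Dict[str, str], float]]:
    """Pick one capable robot per task so the largest per-robot load is minimal.

    Exhaustive enumeration: a stand-in for an integer program, usable only
    for very small instances. Returns None if some task has no capable robot.
    """
    tasks = list(durations)
    best: Optional[Tuple[Dict[str, str], float]] = None
    for choice in product(*(capable[t] for t in tasks)):
        load: Dict[str, float] = {}
        for t, r in zip(tasks, choice):
            load[r] = load.get(r, 0.0) + durations[t]
        makespan = max(load.values(), default=0.0)
        if best is None or makespan < best[1]:
            best = (dict(zip(tasks, choice)), makespan)
    return best


# Hypothetical instance: three tasks, two robots, one capability restriction.
durations = {"fetch_cup": 4, "open_door": 2, "wipe_table": 3}
capable = {
    "fetch_cup": ["r1", "r2"],
    "open_door": ["r1"],          # only r1 can open doors
    "wipe_table": ["r1", "r2"],
}

assignment, makespan = assign_tasks(durations, capable)
print(assignment, makespan)
```

An integer-programming formulation would express the same capability constraints and objective declaratively and scale to larger teams; the enumeration here only makes the objective easy to inspect.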

5. Quantitative Evaluation and Empirical Results

Empirical studies across domains consistently demonstrate that hybrid LLM+PDDL systems outperform pure LLM plan synthesis, both in terms of task success rate and plan optimality:

| Framework | Domain(s) | Baseline Success | Hybrid Success | Notable Gains |
|---|---|---|---|---|
| LLM+P (Liu et al., 2023) | 7 PDDL domains | ≤15% | ≥85–100% | Guarantees optimality with correct PDDL encoding |
| NL2Plan (Gestrin et al., 2024) | Blocksworld, ISR, etc. | 2/15 | 10/15 | Reports explicit failure if unsolvable |
| SPAR (Huang et al., 17 Sep 2025) | UAV multi-domain | 81% (Format) | 95.2% (Ours) | High executability, feasibility, and interpretability |
| KGLAMP (Shek et al., 4 Feb 2026) | MAT-THOR (multi-robot) | 51% (prev. SOTA) | 73% | +25.5 pp over best prior, with robust replanning |
| PIP-LLM (Shi et al., 26 Oct 2025) | AI2-THOR, Gazebo | 0–93% | 70–100% | Scalable to large teams and object sets |
| Hive (Vyas et al., 2024) | Multi-modal MuSE | 73% (HuggingGPT) | 92% | Perfect constraint adherence; best model selection |

Plan cost is also frequently reduced: for example, average plan length is cut by 45% with optimal search (Malfa et al., 10 Dec 2025). Formalization additionally improves constraint adherence and explainability.

6. Limitations, Challenges, and Future Directions

Several open challenges are prominent:

7. Broader Impact and Application Domains

Hybrid LLM+PDDL systems make programmatic, explainable task planning available to users who lack symbolic modeling expertise, with applications in robotics, logistics, multi-modal workflow orchestration, and real-world agentic AI. By using LLMs for translation and model construction, and symbolic planners for formal reasoning and executability, these systems offer a rigorous and transparent approach to automated planning (Liu et al., 2023, Malfa et al., 10 Dec 2025, Gestrin et al., 2024).
