PDDL-Instruct: Logical CoT Framework
- PDDL-Instruct is a framework that enables LLMs to perform symbolic planning using logical chain-of-thought reasoning and explicit validation steps.
- It employs a two-stage instruction tuning protocol that combines annotated planning sequences with iterative feedback from external validators to refine plan accuracy.
- The framework has demonstrated marked improvements in planning benchmarks, significantly raising valid plan rates in complex domains.
The PDDL-Instruct framework is a modern approach to instruction tuning LLMs for robust, explainable, and logically precise symbolic planning with representations such as the Planning Domain Definition Language (PDDL). Grounded in logical chain-of-thought (CoT) reasoning, the framework aims to bridge the gap between the general, statistical language capabilities of LLMs and the formal rigor required in automated planning. It systematically guides models to verify action applicability, state transitions, and invariant satisfaction through explicit logical reasoning, enabling iterative self-correction and yielding strong empirical improvements across a diverse set of planning benchmarks (Verma et al., 14 Sep 2025).
1. Core Methodology and Architecture
PDDL-Instruct consists of a two-stage instruction tuning protocol:
- Initial Instruction Tuning: Pre-trained LLMs are exposed to planning problems and corresponding solutions (plans), with detailed natural language explanations provided for each step—explaining why actions are applicable or inapplicable, how preconditions are satisfied, effects applied, and invariants preserved or violated. Both valid and invalid planning sequences are included for contrastive supervision.
- Logical Chain-of-Thought (CoT) Instruction Tuning: Models are prompted to autonomously generate explicit CoT reasoning chains that decompose the planning process into consecutive symbolic states and actions:
At each step, the LLM justifies action applicability (precondition satisfaction), traces effects, and ensures that invariants are maintained. Each CoT step is externally validated by a plan validator (VAL), which flags both local reasoning errors and global plan invalidity. The resulting feedback—whether binary (valid/invalid) or fine-grained (specific error types)—is iteratively integrated into the prompt, enabling the model to refine its reasoning and planning outputs.
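A minimal sketch of this generate-validate-refine loop is given below. The `llm_generate` and `run_val` callables, the prompt layout, and the error format are illustrative assumptions, not the actual PDDL-Instruct implementation or the real VAL interface.

```python
# Hypothetical sketch of the iterative CoT + external-validator loop.
def plan_with_feedback(domain_pddl: str, problem_pddl: str,
                       llm_generate, run_val, max_iters: int = 10) -> str | None:
    prompt = (
        "Produce a plan as an explicit chain of <state, action, state> steps, "
        "justifying each action's preconditions, effects, and invariants.\n"
        f"Domain:\n{domain_pddl}\nProblem:\n{problem_pddl}\n"
    )
    for _ in range(max_iters):
        cot_trace = llm_generate(prompt)                  # state-action-state chain + justifications
        verdict = run_val(domain_pddl, problem_pddl, cot_trace)  # external plan validator (assumed wrapper)
        if verdict["valid"]:
            return cot_trace                              # validated plan found
        # Fold the validator's error report back into the next prompt.
        prompt += ("\nYour previous plan was invalid:\n"
                   f"{verdict['errors']}\nRevise the reasoning chain and try again.\n")
    return None                                           # budget exhausted without a valid plan
```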
Formally, each reasoning step is a tuple $\langle s_i, r_i, c_i \rangle$, with $s_i$ the current symbolic state, $r_i$ the logical justification of the step, and $c_i$ an (optional) uncertainty estimate. The chain satisfies logical progression: for every applied action $a_i$, the preconditions hold in the current state, $s_i \models \mathrm{pre}(a_i)$, and the successor state follows from the action's effects, $s_{i+1} = (s_i \setminus \mathrm{del}(a_i)) \cup \mathrm{add}(a_i)$.
The CoT tuning objective is divided into:
- Reasoning Chain Optimization: $\theta_{t+1} = \theta_t - \alpha_r \nabla_{\theta}\,\mathcal{L}_{\text{reason}}(\theta_t)$
- Plan Output Optimization: $\theta_{t+1} = \theta_t - \alpha_p \nabla_{\theta}\,\mathcal{L}_{\text{plan}}(\theta_t)$

with $\theta_t$ the model parameters at iteration $t$ and $\alpha_r$, $\alpha_p$ the respective learning rates.
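As a rough illustration of applying the two objectives with separate learning rates, the PyTorch-style sketch below alternates the two updates on a stand-in model; the loss functions, learning rates, and dummy batch are placeholders, not the paper's training code.

```python
import torch

# Minimal sketch (not the paper's code): alternating updates for the two
# PDDL-Instruct objectives with separate learning rates alpha_r and alpha_p.
model = torch.nn.Linear(16, 16)                              # stand-in for the LLM being tuned
opt_reason = torch.optim.AdamW(model.parameters(), lr=1e-5)  # alpha_r
opt_plan = torch.optim.AdamW(model.parameters(), lr=5e-6)    # alpha_p

def reasoning_loss(out, tgt):   # placeholder for the CoT-trace objective
    return torch.nn.functional.mse_loss(out, tgt)

def plan_loss(out, tgt):        # placeholder for the final-plan objective
    return torch.nn.functional.mse_loss(out, tgt)

x, y = torch.randn(4, 16), torch.randn(4, 16)                # dummy batch

# Reasoning-chain optimization step.
loss_r = reasoning_loss(model(x), y)
opt_reason.zero_grad()
loss_r.backward()
opt_reason.step()

# Plan-output optimization step.
loss_p = plan_loss(model(x), y)
opt_plan.zero_grad()
loss_p.backward()
opt_plan.step()
```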
2. Logical Reasoning and Verification
PDDL-Instruct explicitly enforces logical fidelity at each planning step (a minimal sketch of these checks follows the list). For every action, the model must:
- Check (and explain) that all preconditions are satisfied in the current symbolic state
- Apply the action’s effects, obtaining a new symbolic state
- Validate that invariants (such as mutual exclusion, resource limits, or goal progression) are not violated
- Justify every transition with a clear, auditable logical argument
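The following sketch illustrates these per-step checks using standard STRIPS-style set semantics over ground facts; the `check_and_apply` helper, the Blocksworld `unstack` encoding, and the invariant shown are illustrative assumptions, not code from the framework.

```python
# Minimal sketch of per-step logical checks on sets of ground facts.
def check_and_apply(state: frozenset, action: dict, invariants) -> frozenset:
    # 1) Precondition check: every precondition must hold in the current state.
    missing = action["pre"] - state
    if missing:
        raise ValueError(f"{action['name']}: unsatisfied preconditions {missing}")
    # 2) Effect application: delete effects removed, add effects asserted.
    new_state = frozenset((state - action["del"]) | action["add"])
    # 3) Invariant check on the successor state.
    for inv in invariants:
        if not inv(new_state):
            raise ValueError(f"{action['name']}: invariant violated")
    return new_state

# Example: unstack(a, b) in Blocksworld.
unstack_a_b = {
    "name": "unstack(a,b)",
    "pre": {"on(a,b)", "clear(a)", "handempty"},
    "del": {"on(a,b)", "clear(a)", "handempty"},
    "add": {"holding(a)", "clear(b)"},
}
state = frozenset({"on(a,b)", "clear(a)", "handempty", "ontable(b)"})
no_double_hold = lambda s: sum(f.startswith("holding(") for f in s) <= 1  # mutual-exclusion invariant
print(check_and_apply(state, unstack_a_b, [no_double_hold]))
```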
The external VAL validator runs after each CoT-generated plan and returns a detailed verdict. Invalidities (unsatisfied preconditions, missing effects, or invariant violations) trigger tailored feedback, driving iterative corrections up to a predefined maximum number of self-correction iterations (typically 10 or 15).
The model is trained not just to generate valid plans, but to iteratively reflect and improve via this process—a structure analogous to abductive or deductive steps in classical logic.
3. Instruction Tuning Protocol
The training data is organized into two types:
- Phase 1: Example planning domains, problems, and annotated solutions (valid and invalid plans) with explanations for each step. The model learns to explain or diagnose plan validity.
- Phase 2 (CoT augmentation): For each planning problem, the model receives prompts to produce state-action-state chains. These are verified with external VAL; any detected errors yield feedback incorporated in a new prompt in subsequent iterations.
The protocol is summarized below:
| Phase | Input | Model Output | Feedback Source |
|---|---|---|---|
| Initial Tune | (domain, problem) | Plan + natural-language validity explanations | Human annotations |
| CoT Instruction | (domain, problem) + CoT prompt | State-action-state traces + justifications | VAL validator output |
Iterative feedback cycles continue until either a valid solution is produced or the maximum attempt count is reached.
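For concreteness, a hypothetical Phase-1 training instance might be structured as follows; the field names and annotation format are illustrative assumptions, not the paper's data schema.

```python
# Hypothetical Phase-1 instruction-tuning instance (field names assumed).
phase1_example = {
    "domain": "blocksworld.pddl",
    "problem": "p01.pddl",
    "candidate_plan": ["unstack(a,b)", "putdown(a)", "pickup(c)", "stack(c,b)"],
    "label": "valid",
    "step_explanations": [
        "unstack(a,b): preconditions on(a,b), clear(a), handempty hold; "
        "effects give holding(a) and clear(b).",
        "putdown(a): holding(a) holds; a is placed on the table and the hand becomes empty.",
        "pickup(c): c is clear and on the table with handempty; yields holding(c).",
        "stack(c,b): holding(c) and clear(b) hold; achieves the goal on(c,b).",
    ],
}
```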
4. Empirical Results and Benchmarking
PDDL-Instruct achieves significant empirical improvements in structured planning tasks:
- Blocksworld: On Llama-3, plan validity improves markedly from the untuned baseline through initial instruction tuning, with the full CoT+VAL feedback configuration achieving the highest rates.
- Complex domains (Mystery Blocksworld, Logistics): near-zero baseline validity is raised substantially after full tuning.
- Binary vs. detailed feedback: fine-grained, error-specific feedback consistently yields higher planning accuracy than binary valid/invalid signals.
- Feedback iterations: increasing the self-correction budget (e.g., from 10 to 15 iterations) further raises accuracy.
Strong results are observed across both open-source (Llama-3) and API-based (GPT-4) LLMs. The iterative, chain-of-thought tuning mechanism robustly scales to diverse domains, long-horizon plans, and complex goal structures.
5. Application Domains and Implications
The framework is directly applicable to:
- Autonomous robotics and control, where high-confidence, explainable planning is critical—for instance, manipulation, navigation, and resource scheduling tasks.
- Safety-critical settings (e.g., autonomous driving, healthcare robotics) requiring plan explainability and formal verifiability.
- General symbolic reasoning and decision-making, by bridging LLM generalization with classical planning rigor.
A key implication is the feasibility of deploying LLMs as reliable, auditable high-level planners, where each step of the decision process is transparently explained and validated post-hoc. This reduces the epistemic gap that typically divides data-driven models and formal symbolic planners.
6. Limitations and Future Directions
Areas identified for further improvement include:
- Advanced PDDL coverage: Extending reasoning to encompass conditional effects, cost-sensitive planning, temporal constraints, and richer logical constructs.
- Optimality guidance: Incorporating plan quality metrics to encourage minimal or cost-optimal plans, rather than satisficing ones.
- Self-verification: Enabling models to internally check validity, reducing dependence on external validators.
- Data selection: Optimizing selection and construction of instruction tuning data to better capture problem variability and challenging error cases.
- Dynamic iteration control: Modulating the feedback refinement limit adaptively based on problem hardness and observed learning dynamics.
Potential future research directions also include integrating PDDL-Instruct with hybrid neural-symbolic pipelines and broader deployment across real-world, dynamically changing environments.
7. Significance and Relation to Broader Planning Systems
PDDL-Instruct advances the field by making logical chain-of-thought planning—previously a bottleneck for LLMs—a tractable, scalable, and interpretable process. It complements template-based, feedback-driven planning pipelines (Kagitha et al., 20 May 2025, Mahdavi et al., 17 Jul 2024, Nabizada et al., 15 Aug 2024), but with sharper focus on explicit logical breakdowns and self-verification cycles. Its methods may be fruitfully combined with automated PDDL domain generation and grounding techniques demonstrated by frameworks such as SPAR (Huang et al., 17 Sep 2025) and IALP (Wang et al., 11 Mar 2025), consolidating the intersection of LLM generalization, formal reasoning, and classic automated planning.
In summary, PDDL-Instruct provides a structured paradigm for enhancing LLM-based planning: by emphasizing logical CoT reasoning, external feedback, and iterative plan refinement, it moves the field toward more transparent, robust, and practically applicable AI planning systems (Verma et al., 14 Sep 2025).