Zero-Shot Planning: Unseen Task Synthesis
- Zero-shot planning is a paradigm that synthesizes action policies for novel tasks without in-domain demonstrations, relying on pre-acquired skills and logical reasoning.
- Key methodologies include constructing modular skill libraries, decomposing formal specifications, and employing metacognitive loops for iterative plan refinement.
- Empirical results and theoretical guarantees demonstrate its effectiveness across robotics, navigation, and formal logic tasks, while highlighting challenges in scalability and real-world deployment.
Zero-Shot Planning
Zero-shot planning refers to the synthesis of policies, action sequences, or plans to solve previously unseen tasks or satisfy novel specifications without drawing on any in-domain demonstrations, fine-tuning, or task-specific examples. The essential requirement is that all necessary task or domain knowledge is acquired beforehand, and that generalization occurs exclusively via recombination, reasoning, or inference at deployment time. Zero-shot planning thus contrasts with few-shot and supervised paradigms, which rely on in-domain samples or end-to-end training per task.
1. Core Definitions and Problem Formulations
Zero-shot planning encompasses a diverse array of settings, unified by the following defining criteria:
- Unseen-Task Generalization: At test time, the planner receives a task description, specification, or goal (e.g., in natural language or formal logic) drawn from a target task distribution disjoint from any previously encountered demonstrations or plan traces (Lin et al., 20 May 2025, Liu et al., 23 Jan 2025, Bergeron et al., 2024).
- No In-Task Demonstrations: The system is not permitted to see any examples, trajectories, or supervision directly pertaining to the target task.
- Utilization of Pre-existing Knowledge: The planner operates on background knowledge distilled from prior tasks, skill libraries, policies, symbolic models, or unsupervised data, but without adaptation or re-optimization on the held-out task.
- Planning Objective: Given a new instance $x$ (e.g., state, specification, or goal), synthesize a plan π that achieves the desired behavior according to some success predicate (e.g., task completion, temporal logic satisfaction, robustness, or optimality), typically
$$\pi = \mathrm{Plan}(x, \mathcal{K}) \quad \text{such that} \quad \mathrm{Success}(\pi, x) = 1,$$
where $\mathcal{K}$ denotes the previously acquired background or skill knowledge.
This definition subsumes a variety of domains, including classical AI planning, high-level robotic task and motion planning, reinforcement learning, temporal logic synthesis, and multi-agent collaboration (Lin et al., 20 May 2025, Liu et al., 23 Jan 2025, Bergeron et al., 2024, Yang et al., 12 Oct 2025, Hao et al., 2024, Gong et al., 19 May 2025).
2. Principled Methodologies and Algorithmic Patterns
2.1 Skill Library Construction and Modular Decomposition
A prevalent pattern in zero-shot planning is the construction of a library of modular, composable skills or policies, typically via offline mining from demonstration corpora, task-agnostic trajectories, or reinforcement learning. Each skill corresponds to a reusable sub-task, primitive, or temporally-extended behavior, annotated with a canonical description and pointer to one-shot exemplars (Lin et al., 20 May 2025, Bergeron et al., 2024). Zero-shot generalization is then enacted by selecting, sequencing, or composing these skills according to the present task specification, often guided by semantic similarity in language embedding space.
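The selection step can be illustrated with a minimal sketch. It assumes a toy bag-of-words vector in place of a real language embedding model; `embed`, `cosine`, `select_skills`, and the skill names and descriptions in `LIBRARY` are hypothetical and not taken from the cited systems.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a language model encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_skills(task: str, library: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k skills whose descriptions are most similar to the task."""
    query = embed(task)
    ranked = sorted(library, key=lambda name: cosine(query, embed(library[name])), reverse=True)
    return ranked[:k]


# Hypothetical skill library: name -> canonical description.
LIBRARY = {
    "pick": "grasp and lift an object from the table",
    "place": "put a held object onto a target surface",
    "open_cabinet": "pull the cabinet door open by its handle",
}

print(select_skills("open the cabinet door", LIBRARY, k=1))
```

In a real system the ranking would typically be followed by sequencing or composition of the retrieved skills; this sketch covers retrieval only.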
2.2 Logical Specification Decomposition
When tasks are specified in formal temporal logic—such as Signal Temporal Logic (STL) or Linear Temporal Logic (LTL)—zero-shot planners decompose the specification into a set of reachability, invariance, progress, or Boolean subgoals (Liu et al., 23 Jan 2025, Bergeron et al., 2024). Planning reduces to solving for time-aware waypoints, progress allocations, or policy compositions that satisfy all logical clauses, followed by trajectory or policy synthesis through safe diffusion models or automata-theoretic product constructions.
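As a rough illustration of decomposition into reachability and invariance subgoals, the sketch below scores a 2-D trajectory against a conjunction of one "eventually reach" and one "always avoid" subgoal using the standard quantitative (robustness) semantics of STL: maximum over time for "eventually", minimum over time for "always", minimum over conjuncts. The classes `Reach` and `Avoid`, the regions, and the trajectories are assumptions made for this example; it checks a given trajectory and does not synthesize one.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Reach:
    """Eventually (F): come within `radius` of `center` at some time step."""
    center: tuple[float, float]
    radius: float


@dataclass(frozen=True)
class Avoid:
    """Always (G): stay farther than `radius` from `center` at every time step."""
    center: tuple[float, float]
    radius: float


def robustness(subgoal, trajectory) -> float:
    """Signed margin; positive if and only if the trajectory satisfies the subgoal."""
    dists = [math.dist(p, subgoal.center) for p in trajectory]
    if isinstance(subgoal, Reach):
        return max(subgoal.radius - d for d in dists)  # best margin over time
    return min(d - subgoal.radius for d in dists)      # worst margin over time


def satisfies(spec, trajectory) -> bool:
    """A conjunction holds if every subgoal has positive robustness."""
    return min(robustness(g, trajectory) for g in spec) > 0


# Hypothetical specification: reach the goal region while avoiding an obstacle.
spec = [Reach(center=(4.0, 0.0), radius=0.5), Avoid(center=(2.0, 0.0), radius=0.5)]
straight = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]  # passes through the obstacle
detour = [(0, 0), (1, 1), (2, 1), (3, 1), (4, 0)]    # goes around it

print(satisfies(spec, straight), satisfies(spec, detour))
```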
2.3 Metacognitive Reasoning and Self-Reflection
State-of-the-art zero-shot frameworks implement closed-loop metacognitive reasoning, inspired by human learning, in which an LLM- or VLM-based planner not only generates initial plans but also reflects on failures, diagnoses missing or mis-parameterized skills, and synthesizes creative alternative solutions (Lin et al., 20 May 2025, Li et al., 2023). This reflection can be triggered by explicit validation failures (e.g., collision, inverse kinematics violations), with the error code and original prompt provided as feedback. The system iteratively refines its plan until validation succeeds or a retry limit is reached.
2.4 Formal Model Induction and Symbolic Planning
Zero-shot planning under partial observability and unknown environment structure entails on-the-fly formalization of the world model—such as an approximate PDDL domain and problem description—through LLM-driven code or schema generation, iterative growth, and refinement in response to new observations or simulation errors (Gong et al., 19 May 2025, Hao et al., 2024). Symbolic planners (e.g., Fast Downward, SAT/SMT/MILP solvers) are then employed to synthesize and execute plans.
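Once a symbolic model has been induced, the planning step itself is classical. The sketch below assumes the model is already available as STRIPS-style actions (preconditions, add effects, delete effects) and substitutes a small breadth-first search for an off-the-shelf planner such as Fast Downward; the `Action` type, the `plan` function, and the key-and-door domain are hypothetical.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false


def plan(init: frozenset, goal: frozenset, actions: list):
    """Breadth-first search over sets of facts; returns a shortest action sequence or None."""
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [a.name]))
    return None


# Hypothetical domain: fetch a key, then open a locked door.
ACTIONS = [
    Action("pick_key", frozenset({"at_key"}), frozenset({"has_key"}), frozenset()),
    Action("go_to_key", frozenset({"at_start"}), frozenset({"at_key"}), frozenset({"at_start"})),
    Action("go_to_door", frozenset({"at_key"}), frozenset({"at_door"}), frozenset({"at_key"})),
    Action("open_door", frozenset({"at_door", "has_key"}), frozenset({"door_open"}), frozenset()),
]

print(plan(frozenset({"at_start"}), frozenset({"door_open"}), ACTIONS))
```

In the iterative setting described above, a `None` result or an execution error would be fed back so that the action schemas can be revised before replanning.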
2.5 Hierarchical and Neurosymbolic Planning
Recent frameworks integrate hierarchical planning with vision-language models (VLMs) or LLMs to decompose complex manipulation or navigation tasks into sub-goals, which are then mapped to executable trajectories using geometry, keypoint, or hand pose extraction (Fu et al., 23 Feb 2026, Shin et al., 2024). Explicit scene graph construction and symbolic reasoning are also used to enable scalable adaptation in unseen domains (Bhatt et al., 23 Sep 2025).
3. Theoretical Guarantees and Empirical Performance
Theoretical soundness and empirical efficacy have been demonstrated in multiple workflows:
- Temporal Logic Satisfaction: Comp-LTL proves that, after deterministic, ambiguity-eliminating pruning, its transition system TS* is realizable and that, whenever a plan exists in the product automaton, the composed sequence of primitive policies satisfies the given (arbitrary) LTL formula (Bergeron et al., 2024).
- Planning Optimality: When its solvers succeed, LLMFP returns globally optimal solutions for multi-constraint or multi-step tasks, because all constraints, including implicit resource-flow constraints and preconditions, are explicitly formalized and enforced (Hao et al., 2024).
- Zero-Shot RL: Forward-Backward (FB) representations are learned solely from reward-free replay buffers and define a family of policies; once a reward function is given, the corresponding policy is obtained without further training, and it is empirically shown to reach 81–85% of supervised RL performance across diverse tasks (Touati et al., 2022).
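For the Zero-Shot RL item, the "instant" policy extraction can be made concrete with the usual FB formulation. The notation below is an assumption introduced here, not taken from this article: $F$ and $B$ are the learned forward and backward embeddings, $\rho$ is the state distribution of the replay buffer, and $z$ is a task vector.

$$
z_r = \mathbb{E}_{s \sim \rho}\big[\, r(s)\, B(s) \,\big], \qquad
Q^{\pi_{z_r}}(s, a) \approx F(s, a, z_r)^{\top} z_r, \qquad
\pi_{z}(s) = \arg\max_{a} F(s, a, z)^{\top} z .
$$

Under this formulation, a new reward function $r$ only requires computing the expectation that defines $z_r$ (for instance, as an average over reward-labeled samples) and then acting with $\pi_{z_r}$; no further policy optimization is performed.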
Selected Quantitative Results
| Framework | Zero-Shot Success Rate (SR) | Baseline SR | Gain over Baseline | Domain | Source |
|---|---|---|---|---|---|
| Metacognitive LLM (RoCo) | 0.76 ± 0.10 (“Move Rope”) | 0.50–0.65 | +17–26 pts | Multi-robot rope, stacking, cabinets | (Lin et al., 20 May 2025) |
| Comp-LTL | 100% (novel LTL φ) | reward-machine (tuned) | +20% | High-dim gridworld, LTL compositional | (Bergeron et al., 2024) |
| Socratic Planner | 11.1% (zero-shot, ALFRED SR) | 5.7% | +5.4 pts | Long-horizon instruction following | (Shin et al., 2024) |
| FlowPlan (unseen test) | 35.6% (SR, ALFRED) | 9.9–18.9% | +17–25 pts | Real-world and simulated robots | (Lin et al., 4 Mar 2025) |
| VLN-Zero (R2R unseen) | 42.4% (SR) | 17–25.5% | +17–25 pts | Vision-language navigation | (Bhatt et al., 23 Sep 2025) |
| NovaPlan (4-layer stacking) | 70% | 30% | +40 pts | Long-horizon physical manipulation | (Fu et al., 23 Feb 2026) |
4. Representative Algorithmic and Architectural Features
4.1 Modular Skill Extraction and Library Organization
- Text-embedding similarity (sim(s, s′), compared against a threshold τ) used for clustering skills.
- Libraries constructed to enable selection of semantically aligned skills at runtime.
- Pseudocode for clustering:
    BuildLibrary(D, τ):
      ℒ ← ∅
      for scene, demo in D:
        S ← LLM.ExtractSkills(scene, demo)
        for s in S:
          if ∃s′∈ℒ with sim(s,s′)>τ:
            continue
          else:
            ℒ.add((s, (scene,demo)))
      return ℒ
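A runnable counterpart of the pseudocode might look like the sketch below. Two substitutions are assumptions made for the example: `split_on_semicolons` is a hypothetical stand-in for the LLM skill extractor, and `jaccard` (token overlap) replaces text-embedding similarity. The deduplication logic follows the pseudocode.

```python
from typing import Callable, Iterable

Demo = tuple[str, str]  # (scene description, demonstration text)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for text-embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def build_library(
    demos: Iterable[Demo],
    extract_skills: Callable[[str, str], list[str]],
    tau: float,
    sim: Callable[[str, str], float] = jaccard,
) -> list[tuple[str, Demo]]:
    """Keep a skill only if no stored skill has similarity above tau."""
    library: list[tuple[str, Demo]] = []
    for scene, demo in demos:
        for skill in extract_skills(scene, demo):
            if any(sim(skill, kept) > tau for kept, _ in library):
                continue  # near-duplicate of an existing skill
            library.append((skill, (scene, demo)))  # keep a pointer to the exemplar
    return library


def split_on_semicolons(scene: str, demo: str) -> list[str]:
    """Hypothetical extractor: treats each ';'-separated clause as one skill."""
    return [part.strip() for part in demo.split(";") if part.strip()]


DEMOS = [
    ("kitchen", "pick up the cup; place the cup on the shelf"),
    ("office", "pick up the cup now; open the drawer"),
]
library = build_library(DEMOS, split_on_semicolons, tau=0.7)
print([skill for skill, _ in library])
```

The threshold `tau` trades library size against redundancy: a lower value merges more skills, a higher value keeps near-duplicates as separate entries.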
4.2 Closed-Loop Validation and Reflection
- Lightweight validators check for collisions and infeasibility (e.g., inverse kinematics violations).
- Upon failure, the LLM is prompted with the error, original context, and prior plan to diagnose and adjust the failed skill(s).
- Self-reflection loops until success or retry cap.
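The loop described in these bullets can be sketched as follows. The planner and validator are passed in as callables; `toy_propose` and `toy_validate` are hypothetical stand-ins for an LLM call and a collision checker, and the error string `"collision"` is an assumed error code.

```python
from typing import Callable, Optional

Plan = list[str]


def refine_until_valid(
    task: str,
    propose: Callable[[str, Optional[Plan], Optional[str]], Plan],
    validate: Callable[[Plan], Optional[str]],
    max_retries: int = 3,
) -> tuple[Optional[Plan], int]:
    """Propose a plan, validate it, and feed any error back until success or the retry cap.

    Returns (plan or None, number of proposals made).
    """
    plan, error = None, None
    for attempt in range(1, max_retries + 2):  # one initial try plus max_retries retries
        plan = propose(task, plan, error)      # prior plan and error serve as feedback
        error = validate(plan)                 # None means the plan passed validation
        if error is None:
            return plan, attempt
    return None, max_retries + 1


def toy_propose(task, prior_plan, error):
    """Stand-in for the LLM planner: reacts to a reported collision with a detour."""
    if error == "collision":
        return ["lift", "move_around_obstacle", "place"]
    return ["lift", "move_straight", "place"]


def toy_validate(plan):
    """Stand-in for a lightweight validator: returns an error code or None."""
    return "collision" if "move_straight" in plan else None


print(refine_until_valid("move block to shelf", toy_propose, toy_validate))
```

Keeping the validator separate from the planner means inexpensive geometric checks can reject a plan before any further model call is made.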
4.3 Logical Composition and Pruning
- Product of the pruned transition system with a Büchi automaton for LTL planning.
- Policy composition by sequencing primitive policies along a path found in the product automaton.
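To make the product construction concrete, the sketch below searches the product of a small labeled transition system with an automaton for the formula "eventually a and eventually b". It is a simplification under stated assumptions: a hand-written finite-trace automaton replaces the Büchi automaton, no pruning step is modeled, and the states, labels, and function names are hypothetical. In a full system each edge of the returned path would be realized by a primitive policy.

```python
from collections import deque

# Hypothetical transition system: state -> successor states, and state -> label.
TS = {"s0": ["s1", "s2"], "s1": ["s0"], "s2": ["s0", "s3"], "s3": ["s2"]}
LABEL = {"s0": set(), "s1": {"a"}, "s2": set(), "s3": {"b"}}


def dfa_step(q: frozenset, label: set) -> frozenset:
    """Automaton for 'F a & F b': its state is the set of goals observed so far."""
    return q | (label & {"a", "b"})


def accepting(q: frozenset) -> bool:
    """Accept once both propositions have been observed."""
    return q == {"a", "b"}


def product_plan(start: str):
    """BFS over (system state, automaton state) pairs; returns a shortest satisfying path."""
    q0 = dfa_step(frozenset(), LABEL[start])
    frontier = deque([((start, q0), [start])])
    seen = {(start, q0)}
    while frontier:
        (s, q), path = frontier.popleft()
        if accepting(q):
            return path
        for nxt in TS[s]:
            node = (nxt, dfa_step(q, LABEL[nxt]))
            if node not in seen:
                seen.add(node)
                frontier.append((node, path + [nxt]))
    return None


print(product_plan("s0"))
```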
5. Applications and Empirical Domains
Zero-shot planning has been investigated in several high-stakes and compositional settings:
- Robotics manipulation: Modular LLM planning for simulated and real robots—rope routing, cabinet arrangement, sandwich making (Lin et al., 20 May 2025, Shin et al., 2024, Lin et al., 4 Mar 2025, Wang et al., 2024, Fu et al., 23 Feb 2026).
- Vision-language navigation: Scene graph exploration and symbolic planning for navigation from natural language (Liang et al., 2023, Bhatt et al., 23 Sep 2025).
- Medical treatment optimization: LLM-driven IMRT radiotherapy planning, matching or surpassing clinical plan quality in a zero-shot setting (Yang et al., 12 Oct 2025).
- Procedural planning: Chain-of-thought and object-state CoT methods for multimodal instruction and recipe generation (Tabassum et al., 25 Sep 2025).
- Formal logic task synthesis: Zero-shot STL and LTL trajectory and policy planning (Liu et al., 23 Jan 2025, Bergeron et al., 2024).
- General-purpose symbolic/optimization planning: LLM-powered code generation and self-assessment for both classical and complex constrained planning (Hao et al., 2024, Gong et al., 19 May 2025).
6. Strengths, Limitations, and Future Directions
6.1 Strengths
- True zero-shot generalization: Demonstrated capability to solve novel tasks via decomposition, composition, and self-refinement based only on pre-learned modules, logic, or skills (Lin et al., 20 May 2025, Hao et al., 2024, Bergeron et al., 2024).
- Interpretability and reusability: Modular libraries, formal symbolic representations, and explicit planning loops provide transparency and reusability across tasks (Gong et al., 19 May 2025, Hao et al., 2024).
- Closed-loop robustness: Iterative self-diagnosis and fine-grained validator intervention improve performance beyond static prompt-based methods (Lin et al., 20 May 2025, Li et al., 2023).
6.2 Limitations
- Scalability: Several approaches rely on costly LLM/VLM calls at each replan or validation step, and symbolic planners may struggle with very large state-action spaces (Gong et al., 19 May 2025, Lin et al., 20 May 2025).
- Partial observability: While some frameworks extend to partially observable domains via iterative formalization and growth, robust handling of latent or ambiguous world state remains challenging (Gong et al., 19 May 2025).
- Physical grounding: Most frameworks are validated in simulation; real-world deployment requires bridging perception noise, real-time constraints, and physical consistency (Lin et al., 20 May 2025, Fu et al., 23 Feb 2026).
- Complex multi-agent and hierarchical dependencies: Current reflection mechanisms rarely capture deep multi-agent coordination or long-horizon dependency graphs (Lin et al., 20 May 2025).
6.3 Research Frontiers
- Hierarchical failure diagnosis and deeper metacognitive loops
- Better integration with trajectory optimizers, geometric planners, and low-level controllers
- Rapid online adaptation and few-shot enhancement over zero-shot baselines
- Extension to broader domains: financial portfolio optimization, large-scale logistics, or stochastic/adversarial planning
Zero-shot planning therefore provides a unifying formalism and experimental paradigm for rapid, scalable generalization in planning and control settings, leveraging modular skill composition, logical reasoning, and self-reflective refinement to approach, and in some domains match, the efficacy of fully trained or supervised approaches (Lin et al., 20 May 2025, Hao et al., 2024, Liu et al., 23 Jan 2025, Bergeron et al., 2024).