Graph-then-Plan Inference Framework

Updated 4 May 2026

The graph-then-plan inference framework is a two-phase method that constructs a task graph before explicit planning, separating abstraction from reasoning.
The approach improves performance by enabling subtask concurrency, reducing error propagation, and enforcing global constraints in complex systems.
Reported results include up to 50% higher retrieval accuracy and up to 12.9× cost reduction, making it effective for LLM scheduling and multi-hop reasoning.

A graph-then-plan inference framework is a methodological paradigm in which a problem instance is first represented as a graph—encoding entities, dependencies, constraints, or relational information—before explicit planning, search, or policy generation is performed over this graph structure. This separation of abstraction (graph construction) and subsequent reasoning or planning (plan synthesis or execution) enables explicit modeling of dependencies, parallelizable scheduling, modular system design, and, in recent neural and symbolic systems, improved robustness and efficiency. The approach is now employed in LLM agent scheduling, multi-hop retrieval-augmented generation, knowledge-graph reasoning, program synthesis, robot exploration, and tool-augmented agent planning.

1. Formalization Across Domains

Across canonical works, the graph-then-plan formulation encompasses a two-phase pipeline:

Graph Construction: Input data (usually a complex task, structured query, environment state, or artifact) is mapped into a graph $G=(V,E)$, capturing subtask decomposition, dependencies, relational structure, or spatial/semantic zones.
Planning/Reasoning: The planning module accepts $G$ and outputs an explicit plan $P$ (set of actions, traversals, operations, or schedules), optimizing one or more objective functions subject to the graph-imposed constraints.
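The two phases can be sketched in a few lines of Python. This is a minimal illustration rather than any cited system's implementation: the function names `build_graph` and `plan`, the dictionary input format, and the example subtasks are all assumptions made here, and the "plan" is simply a dependency-respecting ordering rather than an optimized schedule.

```python
from graphlib import TopologicalSorter


def build_graph(dependencies):
    """Stage 1 (hypothetical input format): map {subtask: [prerequisites]} to G = (V, E)."""
    vertices = set(dependencies)
    for prereqs in dependencies.values():
        vertices.update(prereqs)
    # An edge (u, v) means u must finish before v starts.
    edges = {(u, v) for v, prereqs in dependencies.items() for u in prereqs}
    return vertices, edges


def plan(vertices, edges):
    """Stage 2: return an ordering P of V that respects every precedence edge in E."""
    sorter = TopologicalSorter({v: set() for v in vertices})
    for u, v in edges:
        sorter.add(v, u)  # v depends on u
    return list(sorter.static_order())  # raises graphlib.CycleError if G is not a DAG


# Illustrative task, not taken from any cited paper.
V, E = build_graph({"serve": ["cook"], "cook": ["shop"], "shop": []})
print(plan(V, E))  # ['shop', 'cook', 'serve']
```

Keeping the two functions separate mirrors the point of the formulation: the planner only sees $G$, so the graph builder can be replaced (by an LLM, a parser, or a perception module) without touching the planning code.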

Key instantiations:

Plan-over-Graph (LLM Scheduling): A high-level textual task $T$ is decomposed by an LLM into an abstracted task DAG $G=(V,E)$, where $V$ are subtasks and $E$ are precedence edges. The plan is a subgraph $P\subseteq G$ connecting sources to goals under makespan minimization, possibly regularized by cost (Zhang et al., 20 Feb 2025).
Retrieval over Knowledge Graphs: For RAG or knowledge-domain reasoning, a graph $G$ encodes typed nodes/relations; the planner synthesizes a sequence of traversal actions yielding a minimal informative subgraph for downstream answer generation (GraphRunner) (Kashmira et al., 11 Jul 2025).
Program Synthesis via Graph Abstraction: Graphs representing structured abstractions (e.g., object-centric in ARC) permit constraint-based planning at the graph level, where DSL programs act as transformation sequences over $G$ (Xu et al., 2022).
Reasoning/Memory Graphs: Evidence-centric chain-of-thought graphs persist agent judgments over passages, supporting feedback-driven, self-improving answer pipelines (Penaroza, 8 Apr 2026).
Physical Planning and Spatial Layouts: Layout, occupancy, or region graphs inform policy or generation modules in robot exploration or floorplan synthesis, integrating spatial inference before planning (Che et al., 24 Sep 2025, Hu et al., 2020).
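To make the makespan objective in the scheduling instantiation concrete, the sketch below computes the makespan of a task DAG when independent subtasks run concurrently, next to the purely sequential time. It assumes unlimited parallel workers and known per-subtask durations; the function name `makespan`, both argument formats, and the example values are illustrative assumptions, not part of Plan-over-Graph.

```python
from graphlib import TopologicalSorter


def makespan(durations, prereqs):
    """Finish time of the whole DAG when every subtask starts as soon as its prerequisites end.

    durations: {task: time}, must list every task (assumed input format).
    prereqs:   {task: [prerequisite tasks]}, tasks without prerequisites may be omitted.
    """
    graph = {task: prereqs.get(task, ()) for task in durations}
    finish = {}
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[task]), default=0)
        finish[task] = start + durations[task]
    return max(finish.values(), default=0)


durations = {"a": 2, "b": 3, "c": 1}   # illustrative values
prereqs = {"c": ["a", "b"]}            # c waits for both a and b
print(makespan(durations, prereqs))    # 4: a and b overlap, then c runs
print(sum(durations.values()))         # 6: strictly sequential execution
```

The gap between the two numbers is what a parallelizable plan can recover; with a bounded number of workers the computation becomes a scheduling problem rather than a longest-path computation.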

2. Representative Algorithms and Pipelines

The graph-then-plan paradigm admits diverse algorithmic realizations, all sharing a split between structure induction and plan generation.

Table: Core Stages Across Exemplars

Domain	Stage 1: Graph Construction	Stage 2: Planning/Reasoning
LLM Task Scheduling	T → G: LLM decomposes task into subtask DAG	G → P: LLM/adapter outputs parallelizable plan P
Retrieval (GraphRunner)	q → G: Structured knowledge graph	Plan: LLM outputs multi-hop plan; verify, then execute
ARC Program Synthesis	Pixel grid → object graph(s)	Plan: Search for programs in DSL over G
Robot Exploration	Occupancy map → region-evaluation graph	Diffusion policy over inferred graph
Tool Planning	Tool+domain schemas → fused dependency graph	Subgraph retrieval; LLM plans with rationale
Reasoning Memory	Runs → reasoning/retrieval graph (persisted)	Traversal-based pipeline planning for feedback

Detailed approaches:

Plan-over-Graph: A natural-language prompt or task description is parsed by an LLM (using CoT prompts and rule extraction) into a JSON-formatted task graph; a LoRA-adapted LLM then plans over the graph input, trained under SFT and DPO with synthetic pretraining and real-task fine-tuning (Zhang et al., 20 Feb 2025).
GraphRunner: An LLM generates, in a single prompt, a holistic traversal plan over a knowledge graph; a verifier checks action validity and graph compatibility; an executor then performs the high-level (including multi-hop) traversal actions (Kashmira et al., 11 Jul 2025).
ARGA (ARC): Graph abstractions of images are constructed via component extraction; planning is constraint-driven program synthesis in a DSL over graphs; constraint acquisition, best-first search, and hashing/tabu pruning reduce the combinatorial search (Xu et al., 2022).
Plan*RAG: For multi-hop QA, a DAG of subqueries is inferred explicitly; execution proceeds depth-by-depth in parallel, with on-demand atomic retrieval and per-node answer composition (Verma et al., 2024).
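The depth-by-depth parallel execution described for subquery DAGs can be sketched as follows. This is an illustration under stated assumptions, not the Plan*RAG implementation: `depth_levels`, `execute`, and the `answer_fn` callback (a stand-in for retrieval plus per-node answer generation) are hypothetical names, and the input is assumed to be an acyclic `{node: [parent nodes]}` mapping that lists every node as a key.

```python
from concurrent.futures import ThreadPoolExecutor


def depth_levels(parents):
    """Group nodes by depth, where depth = 1 + max depth of the node's parents (roots are 0)."""
    depth = {}

    def visit(node):
        if node not in depth:
            depth[node] = 1 + max((visit(p) for p in parents[node]), default=-1)
        return depth[node]

    for node in parents:
        visit(node)
    levels = [[] for _ in range(max(depth.values(), default=-1) + 1)]
    for node, d in depth.items():
        levels[d].append(node)
    return levels


def execute(parents, answer_fn, max_workers=4):
    """Answer each node from its parents' answers, running every depth level concurrently."""
    answers = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in depth_levels(parents):
            # Nodes at the same depth share no edges, so they are independent of each other.
            futures = {
                node: pool.submit(answer_fn, node, {p: answers[p] for p in parents[node]})
                for node in level
            }
            for node, future in futures.items():
                answers[node] = future.result()
    return answers


# Stub answer function: a real system would retrieve passages and call an LLM here.
subqueries = {"q1": [], "q2": [], "q3": ["q1", "q2"]}
result = execute(subqueries, lambda node, ctx: f"{node}({','.join(sorted(ctx))})")
print(result["q3"])  # q3(q1,q2)
```

Each node receives only its parents' answers, which is one way to keep per-call context small regardless of the overall size of the DAG.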

3. Advantages Over Sequential or Iterative Methods

Empirical and theoretical analyses repeatedly report advantages of explicit graph-then-plan pipelines over monolithic or sequential reasoning:

Parallelism: Explicit graph representations admit subtask concurrency; e.g., Plan-over-Graph reports a substantial makespan reduction for parallel relative to sequential execution (Zhang et al., 20 Feb 2025).
Error Reduction: Planning over graphs enables global constraint enforcement, reducing mid-chain hallucination and the accumulation of model errors (GraphRunner reports performance improvements of up to 50% and cost reductions of up to 12.9× over iterative single-hop methods) (Kashmira et al., 11 Jul 2025).
Context Control: State and dependency information persist in the graph rather than in the prompt, decoupling reasoning from limited-length context windows and avoiding the context bloat and resulting failures of context-only LLM chains (Verma et al., 2024).
Composite Feedback and Learning: Persistent graph memory in reasoning graphs supports evidence-centric, backward traversal for fine-grained feedback, yielding provable accuracy convergence and variance collapse without model retraining (Penaroza, 8 Apr 2026).
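The error-reduction argument rests on checking a complete plan against the graph before anything is executed. The sketch below shows one simple form of that idea for multi-hop traversal plans; it is not GraphRunner's actual interface. The schema format `{(source_type, relation): target_type}`, the adjacency format `{(node, relation): [neighbors]}`, and the names `verify_plan` and `execute_plan` are assumptions made for illustration.

```python
def verify_plan(schema, start_type, relations):
    """Check every hop against the schema; return the final node type or raise ValueError."""
    current = start_type
    for step, relation in enumerate(relations):
        if (current, relation) not in schema:
            raise ValueError(f"step {step}: no relation {relation!r} from node type {current!r}")
        current = schema[(current, relation)]
    return current


def execute_plan(edges, start_nodes, relations):
    """Follow the relations hop by hop and return the set of nodes reached."""
    frontier = set(start_nodes)
    for relation in relations:
        frontier = {target for node in frontier for target in edges.get((node, relation), ())}
    return frontier


# Toy knowledge graph (illustrative data only).
schema = {("paper", "written_by"): "author", ("author", "affiliated_with"): "institution"}
edges = {
    ("p1", "written_by"): ["a1", "a2"],
    ("a1", "affiliated_with"): ["u1"],
    ("a2", "affiliated_with"): ["u2"],
}

hops = ["written_by", "affiliated_with"]
print(verify_plan(schema, "paper", hops))           # institution
print(sorted(execute_plan(edges, {"p1"}, hops)))    # ['u1', 'u2']
```

A plan such as `["affiliated_with"]` starting from a `paper` node is rejected by `verify_plan` before any traversal happens, which is the sense in which an invalid step is caught globally instead of surfacing partway through an iterative chain.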

4. Graph Representation, Objectives, and Learning

The structural properties of the graph and the objectives of the planning stage are domain-adapted but structurally analogous:

Graphs: DAGs with nodes as subtasks, regions, objects, tools, or evidence; edges encode ordering, precedence, spatial, or semantic relations; attributes annotate costs, times, or features (Zhang et al., 20 Feb 2025, Kashmira et al., 11 Jul 2025, Xu et al., 2022, Hu et al., 2020).
Planning Objective: Formally, plans $P$ are subgraphs or sequences that optimally connect initial and goal nodes under makespan, cost, or accuracy metrics, subject to prerequisites or constraints; training and search rely on various objectives, including cross-entropy loss, DPO, and program-synthesis-specific heuristics (Zhang et al., 20 Feb 2025, Xu et al., 2022).
Learning: Synthetic graph generation (random/tree-based DAGs), constraint acquisition, hashing, and task-specific pretraining ground the model’s graph-reasoning capacity and structural bias, especially when scaling beyond the comprehension range of direct large-language-model inference (Zhang et al., 20 Feb 2025, Xu et al., 2022).
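For the scheduling case, one way to write the makespan-with-cost objective explicitly is given below. This is a sketch under assumptions rather than a formula quoted from the cited papers: the per-node duration $t(v)$, per-node cost $c(v)$, trade-off weight $\lambda \ge 0$, and finish time $f(v)$ are symbols introduced here, $V_P$ and $E_P$ denote the nodes and edges of the plan $P$, and a maximum over an empty set is taken to be $0$.

$$
P^{*} = \arg\min_{P \subseteq G,\; P \text{ feasible}} \; T(P) + \lambda\, C(P),
\qquad
T(P) = \max_{v \in V_P} f(v),
\qquad
f(v) = t(v) + \max_{(u,v) \in E_P} f(u),
\qquad
C(P) = \sum_{v \in V_P} c(v)
$$

Here "feasible" is assumed to mean that $P$ contains the goal node(s) and that every node in $P$ has its required prerequisites in $P$. Setting $\lambda = 0$ recovers pure makespan minimization, in which case $T(P)$ is the length of the longest duration-weighted path in $P$.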

5. Empirical Results and Benchmarks

Experimental validations highlight robust performance and scaling:

Plan-over-Graph: Trained Llama-based models achieve optimal rates (OR) and success rates (SR) on real tasks that far surpass those of API-based LLMs in optimal scheduling, especially on large graphs, where the trained models maintain OR while zero-shot LLMs degrade (Zhang et al., 20 Feb 2025).
GraphRunner: Achieves up to 50% relative improvement in retrieval accuracy over Graph-CoT, with token cost reduced by up to 12.9× and faster response times (Kashmira et al., 11 Jul 2025).
ARGA (ARC): Solves ARC test tasks with markedly less combinatorial search than baselines, with constraint acquisition and hashing providing significant further speed-ups (Xu et al., 2022).
Plan*RAG: On multi-hop benchmarks, Plan*RAG outperforms Self-RAG in F1 and reduces retrieval calls, with ablations attributing part of the accuracy gain to graph planning (Verma et al., 2024).

6. Limitations and Future Directions

Current graph-then-plan frameworks face several open frontiers:

Static Graph Limitation: Most pipelines assume static graph representations; real-world applications may require online graph updates, dynamic replanning, or closed-loop perception-action cycles (Zhang et al., 20 Feb 2025).
Expressivity Constraints: Existing representation schemes may not capture higher-order constraints (non-adjacency, resource bounds) critical for complex or stochastic domains (Hu et al., 2020).
Integration with Subgraph Neural Modules: Tight integration of learned subgraph embeddings (e.g., GNNs) may enhance plan induction in highly structured settings (Zhang et al., 20 Feb 2025).
Scalability: Ultra-large graphs present computational and retrieval bottlenecks, potentially necessitating partitioning, sampling, or adaptive pruning (Liu et al., 28 Oct 2025).
Cross-Domain Adaptation: Multimodal and cross-domain scenarios call for more general graph induction and planning machinery, including mixed-initiative user feedback, multi-agent coordination, and explicit uncertainty modeling (Che et al., 24 Sep 2025).

7. Contextualization Within Automated Reasoning and AI Planning

The graph-then-plan paradigm has deep roots in classical symbolic planning (e.g., Graphplan recast as a dynamic constraint satisfaction problem (CSP) that exploits explanation-based learning (EBL), dependency-directed backtracking (DDB), and flexible variable ordering (Kambhampati, 2011)) and has evolved through contemporary program synthesis, retrieval-based question answering, and agentic multimodal reasoning. Its re-emergence in modern LLM-centric systems reflects the need to externalize state and dependencies for computational tractability, modular reasoning, and robust parallelism, especially as context windows saturate and task complexity grows. This framing is expected to remain foundational in future cognitive AI and agentic system design.