- The paper presents Neuro-Symbolic Skill Induction (NSI), a framework that lifts raw interaction traces into modular logic programs, enabling agents to reason about when and why to act.
- It combines neural perception with symbolic execution, incorporating conditional branches and dynamic variable binding to adapt to environmental changes.
- Empirical results show success rates of up to 98.0% on ALFWorld, 76.5% on WebShop, and 95.2% on TextCraft.
Neuro-Symbolic Skill Induction for Long-Horizon Agentic Tasks
Motivation and Problem Statement
Existing LLM-based agentic systems frequently fail during long-horizon task execution due to transient, state-blind strategies that do not robustly encode environmental contingencies. Traditional skill induction approaches distill experience into parameterized scripts, but these lack conditional logic essential for dynamic adaptation. Neuro-Symbolic Skill Induction (NSI) introduces a trace-to-logic lifting mechanism that converts raw interaction traces into modular, logic-grounded programs. NSI synthesizes explicit symbolic control flows and enables dynamic variable binding, allowing agents to reason about "when" and "why" to act, in contrast to the state-blind nature of previous paradigms.
Figure 1: NSI transforms traces into logic-grounded workflows, enabling generalization via explicit state predicates and control flow.
NSI Framework Architecture
The NSI framework comprises three complementary mechanisms: neuro-symbolic (NeSy) grounding, offline skill induction, and online skill evolution. NeSy grounding maps raw environmental observations, as parsed by neural perception, into a logical execution space governed by First-Order Logic (FOL). Offline induction abstracts successful demonstrations into reusable skills via modular synthesis, forming a library of logic-grounded programs. Online evolution uses interaction feedback through a reflective planner to correct feasibility conditions and modularly compose recovery trajectories, refining skills as deployment progresses.
Figure 2: NSI architecture: neural-symbolic grounding, offline skill induction, and online evolution for state-aware skill growth.
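The grounding step described above can be pictured with a toy example. The sketch below is an illustration under stated assumptions, not the paper's implementation: a regex-based `ground` function stands in for the neural perception module, and the predicate names (`at`, `closed`, `visible`) and the observation wording are hypothetical.

```python
import re
from typing import FrozenSet, Tuple

# A grounded fact is a predicate name plus its arguments, e.g. ("at", "fridge 1").
Fact = Tuple[str, ...]

def ground(observation: str) -> FrozenSet[Fact]:
    """Toy stand-in for neural grounding: text observation -> set of ground atoms.

    In NSI this role is played by a neural module; the regexes and predicate
    names here are hypothetical and only illustrate the input/output shape.
    """
    facts = set()
    # "You arrive at fridge 1." -> at(fridge 1)
    m = re.search(r"You arrive at ([a-z]+ \d+)", observation)
    if m:
        facts.add(("at", m.group(1)))
    # "The fridge 1 is closed." -> closed(fridge 1)
    for obj in re.findall(r"The ([a-z]+ \d+) is closed", observation):
        facts.add(("closed", obj))
    # "you see a apple 2, and a mug 1." -> visible(apple 2), visible(mug 1)
    seen = re.search(r"you see (.+?)\.", observation)
    if seen:
        for obj in re.findall(r"an? ([a-z]+ \d+)", seen.group(1)):
            facts.add(("visible", obj))
    return frozenset(facts)

def holds(state: FrozenSet[Fact], predicate: str, *args: str) -> bool:
    """Check whether a ground atom is true in the symbolic state."""
    return (predicate, *args) in state
```

A symbolic interpreter would then branch on queries such as `holds(state, "closed", "fridge 1")` rather than on raw text, which is the separation between perception and control that the framework relies on.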
Neuro-Symbolic Skill Representation
NSI formalizes skills as triples: invocation parameters, a neural grounding module, and a symbolic execution graph. Neural modules parse environmental feedback into structured symbolic states, while symbolic interpreters deterministically execute modular workflows based on grounded predicates. The system separates perception from control flow, allowing conditional branching, iterative logic, and dynamic variable binding. Key structural operators include DataOps (dynamic variable binding), ControlOps (decision boundaries), PrimitiveOps (atomic action execution), and TerminalOps (feedback and diagnostics).
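One way to picture the skill triple and its four operator families is a small graph interpreter. The following is a minimal sketch under assumptions: the class names (`Node`, `Skill`), the field names, the `run` function, and the toy environment are all hypothetical and are not taken from the paper; they only mirror the roles of DataOps, ControlOps, PrimitiveOps, and TerminalOps as described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple

State = Set[Tuple[str, ...]]   # grounded facts, e.g. ("closed", "fridge 1")
Env = Dict[str, str]           # variable bindings

@dataclass
class Node:
    kind: str                                   # "data" | "control" | "primitive" | "terminal"
    succ: Optional[str] = None                  # successor of data/primitive nodes
    bind: Optional[Callable[[State, Env], Dict[str, str]]] = None  # DataOp
    test: Optional[Callable[[State, Env], bool]] = None            # ControlOp
    if_true: Optional[str] = None
    if_false: Optional[str] = None
    action: Optional[str] = None                # PrimitiveOp template, e.g. "open {target}"
    message: str = ""                           # TerminalOp feedback

@dataclass
class Skill:
    params: List[str]                           # invocation parameters
    ground: Callable[[str], State]              # grounding module (stubbed below)
    graph: Dict[str, Node]                      # symbolic execution graph
    entry: str = "start"

def run(skill: Skill, args: List[str], observation: str,
        step: Callable[[str], str], max_steps: int = 50) -> Tuple[str, List[str]]:
    """Deterministically interpret the skill graph; return (feedback, actions taken)."""
    env: Env = dict(zip(skill.params, args))
    state = skill.ground(observation)
    trace: List[str] = []
    name = skill.entry
    for _ in range(max_steps):
        node = skill.graph[name]
        if node.kind == "terminal":             # TerminalOp: report and stop
            return node.message, trace
        if node.kind == "data":                 # DataOp: bind variables from the state
            env.update(node.bind(state, env))
            name = node.succ
        elif node.kind == "control":            # ControlOp: branch on a grounded predicate
            name = node.if_true if node.test(state, env) else node.if_false
        elif node.kind == "primitive":          # PrimitiveOp: act, then re-ground
            cmd = node.action.format(**env)
            trace.append(cmd)
            state = skill.ground(step(cmd))
            name = node.succ
    return "step budget exhausted", trace

# Toy usage: open a receptacle only if it is observed to be closed.
def toy_ground(obs: str) -> State:
    # hypothetical observation format: "closed:fridge 1;at:fridge 1"
    return {tuple(tok.split(":")) for tok in obs.split(";") if tok}

def toy_step(cmd: str) -> str:
    # hypothetical environment: an "open X" command leaves X open
    return "open:" + cmd[len("open "):] if cmd.startswith("open ") else ""

open_if_closed = Skill(
    params=["target"],
    ground=toy_ground,
    graph={
        "start": Node("control", test=lambda s, e: ("closed", e["target"]) in s,
                      if_true="do_open", if_false="done"),
        "do_open": Node("primitive", action="open {target}", succ="done"),
        "done": Node("terminal", message="success"),
    },
)
```

Calling `run(open_if_closed, ["fridge 1"], "closed:fridge 1", toy_step)` takes the branch that issues `open fridge 1`, whereas the same skill invoked on an already open fridge terminates without acting. A parameterized script without the control node would issue the action in both cases.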
Induction and Consolidation Algorithms
Skill induction occurs in two stages: intra-trajectory logic consolidation and inter-trajectory structural merging. Locally, NSI synthesizes specialized program graphs for each demonstration, iteratively introducing conditional branches to maximize empirical consistency. Globally, it merges local experts via abstract variable lifting and modular crossover, enforcing Minimum Description Length (MDL) to prevent redundancy and overfitting. The process discovers critical control flow (branching, loops) and variable abstractions necessary for robust generalization across environmental heterogeneity.
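The interplay between variable lifting and the MDL criterion can be illustrated with a deliberately simplified model. The sketch below is an assumption-heavy illustration rather than the paper's objective: programs are flat lists of action templates, description length is approximated by token counts (program tokens plus the number of variable bindings needed to reproduce each trace), and the function names `lift`, `match`, `description_length`, and `merge_if_shorter` are hypothetical. Modular crossover and branch synthesis are not modeled.

```python
from typing import Dict, List, Optional, Tuple

Program = List[str]   # one whitespace-tokenized action template per step

def lift(p: Program, q: Program) -> Optional[Program]:
    """Anti-unify two same-shape programs: differing tokens become variables."""
    if len(p) != len(q):
        return None
    var_of: Dict[Tuple[str, str], str] = {}
    lifted: Program = []
    for a, b in zip(p, q):
        ta, tb = a.split(), b.split()
        if len(ta) != len(tb):
            return None
        out = []
        for x, y in zip(ta, tb):
            if x == y:
                out.append(x)
            else:
                # the same pair of constants always maps to the same variable
                out.append(var_of.setdefault((x, y), f"?v{len(var_of)}"))
        lifted.append(" ".join(out))
    return lifted

def match(prog: Program, trace: List[str]) -> Optional[Dict[str, str]]:
    """Return the variable binding under which prog reproduces trace, if any."""
    if len(prog) != len(trace):
        return None
    binding: Dict[str, str] = {}
    for tmpl, act in zip(prog, trace):
        tt, ta = tmpl.split(), act.split()
        if len(tt) != len(ta):
            return None
        for t, a in zip(tt, ta):
            if t.startswith("?"):
                if binding.setdefault(t, a) != a:
                    return None
            elif t != a:
                return None
    return binding

def description_length(library: List[Program], traces: List[List[str]]) -> int:
    """Two-part cost: tokens in the library + bindings needed per trace."""
    model_cost = sum(len(step.split()) for prog in library for step in prog)
    data_cost = 0
    for tr in traces:
        bindings = [match(prog, tr) for prog in library]
        costs = [len(b) for b in bindings if b is not None]
        # an unexplained trace must be encoded verbatim
        data_cost += min(costs) if costs else sum(len(a.split()) for a in tr)
    return model_cost + data_cost

def merge_if_shorter(p: Program, q: Program, traces: List[List[str]]) -> List[Program]:
    """Keep the lifted program only if it shortens the total description."""
    lifted = lift(p, q)
    if lifted is None:
        return [p, q]
    if description_length([lifted], traces) < description_length([p, q], traces):
        return [lifted]
    return [p, q]
```

Under this toy cost, merging `["goto fridge", "take apple", "goto table", "put apple"]` with its `mug` counterpart lowers the total from 16 to 10, so the lifted program with a single variable is kept. Merging two unrelated four-step programs would turn every token into a variable and raise the total from 16 to 24, so they stay separate. This is the sense in which a description-length penalty discourages both redundancy and over-general abstractions.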
Reflective Planning and Online Skill Evolution
Agents using NSI execute logic-grounded skills, and a failure at a symbolic node yields diagnostic feedback. Reflective planning uses these diagnostics to propose corrective plans; successful recovery trajectories are then grafted onto the failing skill graph. This self-evolution lets agents "grow" modular recovery branches, incrementally increasing the coverage and resilience of skills while keeping the core logic stable. When a recovery attempt fails, the skill's docstring is updated to narrow its stated applicability, so that the description matches the conditions under which the skill is actually feasible.
Figure 3: Representative cases of online skill evolution across ALFWorld, WebShop, and TextCraft.
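The evolution loop described in this section (fail at a node, receive a diagnostic, graft a successful recovery or narrow the docstring) can be sketched as follows. This is a minimal illustration under assumptions: the skill is reduced to a linear list of guarded steps, the reflective planner is replaced by a recovery plan passed in by hand, and all names (`GuardedStep`, `LinearSkill`, `execute`, `evolve`, the `unmet:` diagnostic format) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class GuardedStep:
    action: str
    guard: Optional[str] = None                  # fact that must hold before acting
    recovery: Dict[str, List[str]] = field(default_factory=dict)  # diagnostic -> branch

@dataclass
class LinearSkill:
    name: str
    doc: str
    steps: List[GuardedStep]

Failure = Tuple[int, str]                        # (step index, diagnostic)

def execute(skill: LinearSkill, facts: Set[str],
            effects: Dict[str, List[str]]) -> Tuple[bool, List[str], Optional[Failure]]:
    """Run the skill; on an unmet guard, use a grafted branch or report a diagnostic."""
    facts = set(facts)
    trace: List[str] = []
    for i, step in enumerate(skill.steps):
        if step.guard and step.guard not in facts:
            diag = f"unmet:{step.guard}"
            branch = step.recovery.get(diag)
            if branch is None:
                return False, trace, (i, diag)   # diagnostic for the planner
            for a in branch:                     # previously grafted recovery branch
                trace.append(a)
                facts.update(effects.get(a, []))
            if step.guard not in facts:
                return False, trace, (i, diag)
        trace.append(step.action)
        facts.update(effects.get(step.action, []))
    return True, trace, None

def evolve(skill: LinearSkill, failure: Failure,
           recovery_actions: List[str], recovered: bool) -> None:
    """Graft a successful recovery as a branch, or restrict the docstring."""
    i, diag = failure
    if recovered:
        skill.steps[i].recovery[diag] = list(recovery_actions)
    else:
        skill.doc += f" Not applicable when {diag}."

# Toy usage: the skill initially assumes the fridge is already open.
take = LinearSkill("take_apple", "Take the apple from the fridge.",
                   [GuardedStep("take apple from fridge", guard="open(fridge)")])
effects = {"open fridge": ["open(fridge)"]}      # hypothetical action effects
ok, trace, failure = execute(take, {"closed(fridge)"}, effects)
if not ok:
    # stand-in for the reflective planner proposing and validating a fix
    evolve(take, failure, ["open fridge"], recovered=True)
ok2, trace2, _ = execute(take, {"closed(fridge)"}, effects)
```

After the graft, the second run opens the fridge before taking the apple, while the original step and its guard are untouched. Only the recovery branch is new, which matches the idea of growing coverage without rewriting the core logic.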
Empirical Evaluation
NSI is evaluated on ALFWorld (text-based embodied household tasks), WebShop (web-based e-commerce), and TextCraft (compositional Minecraft crafting), with GPT-4o as the backbone LLM. Against baselines including ReAct, Reflexion, ADaPT, StateAct, and state-of-the-art programmatic methods such as AWM and ASI, NSI demonstrates superior long-horizon generalization and execution robustness. NSI achieves up to 98.0% success on ALFWorld, 76.5% on WebShop, and 95.2% on TextCraft, consistently outperforming alternatives. Notable gains include a 100%–140% increase in atomic steps per skill invocation, and NSI avoids the performance collapse beyond 22-step horizons that is characteristic of linear script-based agents.

Figure 4: NSI compresses planning horizons and sustains robust execution in long-horizon tasks.
Structural Impact and Horizon Compression
Analysis of skill structure reveals that NSI's modular logic internalizes complex behaviors, efficiently compressing planning horizons. Each skill encapsulates ∼7.4 atomic steps, maintaining high semantic abstraction and preventing micro-management failures. Survival analyses show NSI's resilience in extended interaction sequences (>50 steps), while baseline programs collapse due to cumulative reasoning errors. Modular skill graph expansion, enabled by reflective planning, transforms runtime failures into conditional logic branches, continuously enhancing skill coverage.
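A back-of-the-envelope calculation shows why packing about 7.4 atomic steps into each skill matters for long horizons. The sketch below rests on simplifying assumptions that are not from the paper: planner decisions fail independently with a fixed probability, skill bodies execute without error once invoked, and the per-decision reliability of 0.98 is a hypothetical value chosen only for illustration.

```python
import math

def planner_decisions(task_steps: int, steps_per_skill: float) -> int:
    """Number of planner-level decisions needed to cover a task of task_steps atomic steps."""
    return math.ceil(task_steps / steps_per_skill)

def chain_success(p_decision: float, n_decisions: int) -> float:
    """Probability that n independent decisions all succeed (simplifying assumption)."""
    return p_decision ** n_decisions

steps = 50                      # a long-horizon episode
per_skill = 7.4                 # atomic steps per skill, as reported above
p = 0.98                        # hypothetical per-decision reliability

n_atomic = planner_decisions(steps, 1.0)         # one decision per atomic step
n_skill = planner_decisions(steps, per_skill)    # one decision per skill invocation

print(n_atomic, round(chain_success(p, n_atomic), 2))
print(n_skill, round(chain_success(p, n_skill), 2))
```

With these illustrative numbers, a 50-step task requires 50 decisions when planning step by step but only 7 at the skill level, and the chance of an error-free decision chain rises from roughly 0.36 to roughly 0.87. The real effect depends on how reliable skill execution is, which this toy model does not capture.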
Practical and Theoretical Implications
Practically, NSI facilitates resource-efficient agentic AI development, requiring few demonstrations and minimal manual engineering to induce robust, reusable skills. Theoretically, by lifting traces to logic and decoupling perception from symbolic execution, NSI advances the paradigm toward autonomous logic discovery. Its inductive bias aligns with compositional generalization objectives and promotes the scaling of persistent agentic abilities. Future developments may further automate symbolic abstraction, integrate deeper RL-style optimization, or extend NSI to multi-agent workflows and safety-critical domains.
Conclusion
NSI introduces a formal framework for persistent skill induction in agentic tasks, synthesizing explicit control flows and variable abstractions from interaction traces. Dynamic skill evolution via reflective planning turns failures into new conditional logic, promoting robust generalization and efficient horizon compression. Empirical results show NSI outperforming parameterized scripts and text-based workflow memories on long-horizon planning and adaptive agentic tasks (2605.01293).