- The paper introduces a dual-memory framework that separates semantic progress from logical feasibility to address global drift and local action errors.
- It combines neural progress memory for extracting procedural blueprints with symbolic feasibility memory for rule-based verification of actions.
- Empirical results on ALFWorld, WebShop, and TextCraft demonstrate significant improvements over previous baselines, highlighting the framework’s practical robustness.
Neuro-Symbolic Dual Memory for Long-Horizon LLM Agents: A Technical Analysis
Motivation and Dual-Alignment Challenge
Long-horizon LLM agents in complex environments exhibit two principal error modes: (1) global progress drift, where the agent semantically deviates from the main objective, and (2) local feasibility violation, where the agent loops on repeated attempts or issues invalid, non-executable actions. Empirically, handling both alignment challenges within a single reasoning paradigm has proven inadequate, because semantic planning and strict logical verification place orthogonal demands on the agent.
Figure 1: The neuro-symbolic dual-alignment framework confronts long-horizon failure cycles via separate progress and feasibility alignment, integrating neural semantic guidance and symbolic rule-based verification.
This paper posits that optimal long-horizon behavior requires technically separating global semantic progression (best handled by distributed neural mechanisms) from local logical executability (best addressed via explicit symbolic rules). Neural approaches excel at extracting stage-wise patterns from successful trajectories, while symbolic mechanisms provide transparent, verifiable filtering of failed action attempts.
Neuro-Symbolic Dual Memory Framework
The proposed framework is architected around two principal memories (Figure 2):
- Neural Progress Memory: Structurally encodes semantic procedural blueprints distilled from successful trajectories. By aligning future actions with stage-anchored exemplars, it ensures agents consistently advance along canonical, high-level subgoals.
- Symbolic Feasibility Memory: Extracts executable verification rules from negative (failed) transitions during exploration. These rules are compiled as high-precision Python functions that intercept and correct infeasible actions before execution, strictly anchoring local behaviors within validated boundaries.
Rather than combining both reasoning streams within a single decision process, the framework runs an explicit dual-alignment inference loop: at each step, the neural pathway proposes a progress-consistent candidate action, which the symbolic verifier then accepts, refines, or vetoes, so that every executed action is checked for both global progress and local feasibility.
Figure 2: Offline distillation phase extracts symbolic feasibility rules from failed transitions and semantic blueprints from successful trajectories; at inference, neural and symbolic memories jointly drive dual-aligned action selection and verification.
Technical Methodology
Memory Construction
The core pipeline begins with collecting online trajectories on a disjoint training set. Successful episodes are processed by a distillation agent, which parses the action sequence into a sequence of semantic anchors (blueprints) and aligns each anchor with the corresponding action sub-chunks. These blueprints and their embeddings populate the neural progress memory.
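The blueprint store described above can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the paper's implementation: the `Blueprint` and `ProgressMemory` names, the field layout, and especially `embed`, which is a toy hashed bag-of-words encoder standing in for whatever embedding model the authors use.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Blueprint:
    """One distilled successful trajectory (hypothetical layout)."""
    task: str
    anchors: list[str]        # ordered stage-level semantic anchors
    chunks: list[list[str]]   # chunks[i] holds the actions aligned to anchors[i]
    embedding: list[float] = field(default_factory=list)


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy stand-in for a sentence encoder: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Deterministic bucket per token (sum of character codes modulo dim).
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ProgressMemory:
    """Stores blueprints and retrieves the ones most similar to a new task."""

    def __init__(self) -> None:
        self.blueprints: list[Blueprint] = []

    def add(self, task: str, anchors: list[str], chunks: list[list[str]]) -> None:
        if len(anchors) != len(chunks):
            raise ValueError("each anchor needs exactly one aligned action chunk")
        self.blueprints.append(Blueprint(task, anchors, chunks, embed(task)))

    def retrieve(self, task: str, k: int = 1) -> list[Blueprint]:
        query = embed(task)
        # Vectors are unit length, so the dot product is the cosine similarity.
        ranked = sorted(
            self.blueprints,
            key=lambda b: sum(q * e for q, e in zip(query, b.embedding)),
            reverse=True,
        )
        return ranked[:k]


memory = ProgressMemory()
memory.add(
    "put a clean mug on the shelf",
    ["find mug", "clean mug", "place mug"],
    [["go to counter 1", "take mug 1 from counter 1"],
     ["go to sink 1", "clean mug 1"],
     ["go to shelf 1", "put mug 1 on shelf 1"]],
)
memory.add(
    "buy red running shoes under 50 dollars",
    ["search", "select item", "purchase"],
    [["search[red running shoes]"], ["click[item]"], ["click[buy now]"]],
)
best = memory.retrieve("put a clean plate on the shelf")[0]
```

Keeping anchors and action chunks as parallel lists means a retrieved blueprint can later be indexed by stage, which is what anchor-level retrieval needs.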
Failed interactions, on the other hand, generate explicit negative transitions. An induction agent synthesizes symbolic representations of the environment-visible state and actions for each failed attempt. It induces (and greedily prunes for coverage and precision) symbolic validation rules, retaining only those with zero false positives on positive transitions.
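The precision filter and greedy pruning can be sketched as follows. This is an assumption-laden illustration, not the authors' induction agent: the transition format, the three hand-written candidate rules, and the `select_rules` helper are all hypothetical, and in the paper candidate rules are induced automatically rather than written by hand.

```python
from typing import Callable

Transition = dict                      # hypothetical record: {"state": {...}, "action": "..."}
Rule = Callable[[Transition], bool]    # returns True when the rule REJECTS the action


def rule_too_broad(t: Transition) -> bool:
    """Over-general candidate: rejects every 'take' action."""
    return t["action"].startswith("take")


def rule_hand_full(t: Transition) -> bool:
    """Reject 'take' while the agent is already holding an object."""
    return t["action"].startswith("take") and t["state"].get("holding") is not None


def rule_closed_container(t: Transition) -> bool:
    """Reject 'take ... from X' when container X is known to be closed."""
    action = t["action"]
    if " from " not in action:
        return False
    container = action.split(" from ", 1)[1]
    return container in t["state"].get("closed", set())


def select_rules(candidates: list[Rule],
                 negatives: list[Transition],
                 positives: list[Transition]) -> list[Rule]:
    # Precision filter: drop any rule that fires on a known-good transition.
    precise = [r for r in candidates if not any(r(t) for t in positives)]
    selected: list[Rule] = []
    uncovered = list(range(len(negatives)))
    # Greedy coverage: repeatedly keep the rule explaining the most remaining failures.
    while uncovered and precise:
        best = max(precise, key=lambda r: sum(r(negatives[i]) for i in uncovered))
        covered = [i for i in uncovered if best(negatives[i])]
        if not covered:
            break
        selected.append(best)
        precise.remove(best)
        uncovered = [i for i in uncovered if i not in covered]
    return selected


negatives = [
    {"state": {"holding": "apple 1", "closed": set()}, "action": "take mug 1 from counter 1"},
    {"state": {"holding": None, "closed": {"cabinet 1"}}, "action": "take mug 1 from cabinet 1"},
]
positives = [
    {"state": {"holding": None, "closed": set()}, "action": "take mug 1 from counter 1"},
]
selected = select_rules(
    [rule_too_broad, rule_hand_full, rule_closed_container], negatives, positives
)
selected_names = [r.__name__ for r in selected]
```

In this toy run `rule_too_broad` covers both failures but also fires on the successful transition, so the zero-false-positive filter discards it before coverage is considered.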
Inference Dynamics
At test time, given a new task, the agent retrieves relevant semantic blueprints from the neural memory and generates a procedural plan for the overall episode, dynamically selecting current progress anchors. For each step:
- The actor, informed by the active blueprint and demonstration chunk retrievals, proposes a progress-consistent action candidate;
- The feasibility memory verifies executability, possibly triggering iterative refinement until a valid, non-rejected action is found;
- The agent executes the action, the progress monitor evaluates stage completion, and anchor progression occurs upon semantic task advancement.
The system structurally decouples progress guidance (flexible, semantic, context-driven) from feasibility checking (binary, symbolically grounded), yielding robust dual alignment.
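The per-step loop above can be sketched as a propose, verify, refine cycle. This is a minimal illustration under stated assumptions: `dual_aligned_step` and `run_episode` are hypothetical names, the scripted `propose` function stands in for the LLM actor, and the toy environment, rule, and progress check are invented for the example.

```python
from typing import Callable, Optional

State = dict
RuleSet = dict[str, Callable[[State, str], bool]]   # rule name -> "rejects this action?"


def dual_aligned_step(propose, rules: RuleSet, state: State, anchor: str,
                      max_refinements: int = 3) -> Optional[str]:
    """Ask the actor for an action; re-ask with feedback while any rule rejects it."""
    feedback: list[str] = []
    for _ in range(max_refinements + 1):
        action = propose(state, anchor, feedback)
        violated = [name for name, rule in rules.items() if rule(state, action)]
        if not violated:
            return action
        feedback.append(f"{action!r} rejected by {', '.join(violated)}")
    return None  # no feasible action found within the refinement budget


def run_episode(propose, rules, env_step, stage_done, state, anchors, max_steps=20):
    executed, stage = [], 0
    for _ in range(max_steps):
        if stage == len(anchors):
            break
        action = dual_aligned_step(propose, rules, state, anchors[stage])
        if action is None:
            break
        state = env_step(state, action)
        executed.append(action)
        if stage_done(state, anchors[stage]):   # progress monitor
            stage += 1                          # advance to the next anchor
    return executed, stage == len(anchors)


# --- Toy stand-ins (all hypothetical) ---------------------------------------
def propose(state, anchor, feedback):
    if anchor == "get mug":
        return "open cabinet 1" if feedback else "take mug 1 from cabinet 1"
    return "put mug 1 on shelf 1"


def env_step(state, action):
    new = dict(state)
    if action == "open cabinet 1":
        new["cabinet_open"] = True
    elif action.startswith("take"):
        new["holding"] = "mug 1"
    elif action.startswith("put"):
        new["holding"], new["placed"] = None, True
    return new


def stage_done(state, anchor):
    return state["holding"] is not None if anchor == "get mug" else state["placed"]


rules = {"container_closed": lambda s, a: a.startswith("take") and not s["cabinet_open"]}
start = {"holding": None, "cabinet_open": False, "placed": False}
executed, success = run_episode(propose, rules, env_step, stage_done,
                                start, ["get mug", "place mug"])
```

Here the first proposal (taking the mug from a closed cabinet) is vetoed before it reaches the environment, and the rejection message is fed back so the actor can propose opening the cabinet instead.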
Experimental Results
Benchmarks span three distinct, challenging environments: ALFWorld (embodied manipulation), WebShop (web navigation and selection), and TextCraft (compositional crafting with recursive dependencies).
Performance Summary:
| Method | ALFWorld SR (%) | WebShop SR (%) | WebShop Score | TextCraft SR (%) |
| --- | --- | --- | --- | --- |
| Ours | 94.78 | 51 | 0.7132 | 94 |
| Best Prior | 88.81 (AWM) | 35 (Reflexion) | 0.5998 (WALL-E 2.0) | 88 (ExpeL) |
The dual-memory agent achieves the highest success rates and the lowest invalid-action rates in all three environments, improving over the strongest domain-adapted baselines. This supports explicit dual alignment over monolithic or single-paradigm approaches.
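As a quick arithmetic check, the absolute margins over the best prior method can be computed directly from the reported figures. The dictionary layout and variable names are illustrative; the numbers are the ones in the performance summary.

```python
# (ours, best prior) pairs taken from the performance summary.
results = {
    "ALFWorld SR (%)": (94.78, 88.81),
    "WebShop SR (%)": (51.0, 35.0),
    "WebShop Score": (0.7132, 0.5998),
    "TextCraft SR (%)": (94.0, 88.0),
}

# Absolute improvement per metric, rounded to suppress floating-point noise.
margins = {name: round(ours - prior, 4) for name, (ours, prior) in results.items()}

for name, gain in margins.items():
    print(f"{name}: +{gain}")
```

The gains are roughly 6 points of success rate on ALFWorld and TextCraft, and 16 points on WebShop, where the prior best was lowest.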
Ablation studies establish the complementarity of the two memories:
- Removal of feasibility memory drastically increases invalid action rate, directly confirming the symbolic filter's critical role in suppressing repetitive or infeasible agent behavior.
- Removal of progress memory produces longer, less efficient trajectories and reduced overall task success, highlighting the inefficiency of purely local, feasibility-driven policies that lack global progression anchors.
Further breakdowns confirm that the structured stage-wise organization in progress memory, especially with anchor-level retrieval, drives most gains, as opposed to naïve retrieval augmentation. Among feasibility mechanisms, only executable rule-based verifiers adequately balance strict local correction with overall task throughput; tighter prompt-level constraints reduce valid action space and undermine long-horizon task success when not tightly coupled with symbolic execution.
Theoretical and Practical Implications
The neuro-symbolic dual memory framework provides evidence that LLM-based agents for long-horizon tasks are not reliably grounded by a single knowledge representation, whether purely semantic or purely symbolic. The dual-pathway design targets the brittle failure modes of global drift and local infeasibility by matching the reasoning substrate (neural or symbolic) to the objective (progress or feasibility).
Practically, this design can be immediately deployed in any setting where agents operate across loosely structured, high-dimensional, but tightly constrained environments—covering real-world embodied robotics, web-based instruction following, or even complex workflow automation. The framework's generality depends only on access to agent-visible trajectories for offline (self-supervised) induction.
Future Directions
A principal limitation remains the requirement for sufficient offline experience—especially failed transitions with strict rejection signals—needed to induce a high-precision symbolic verifier. Extending the framework to sparse-reward or non-transparent environments is non-trivial and may require self-supervised surrogate objectives, richer simulator probing, or integration with program synthesis for generalized verifier construction.
Another direction involves meta-learning or continual updating of both memories as the agent encounters novel states, actions, or objectives, potentially leveraging large-scale self-improvement or transfer across families of tasks. Integrating formal verification techniques with neural blueprint distillation for provably correct long-horizon behavior is also an open research avenue.
Conclusion
By formalizing and operationalizing the decoupling of progress and feasibility alignment, this work establishes a new technical foundation for robust LLM-based long-horizon agent design. The neuro-symbolic dual memory architecture empirically resolves the critical trade-off between semantic generalization and logical executability, yielding both high quantitative performance and architectural interpretability, with immediate applicability and extensibility across diverse agent domains (2604.02734).