
Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents

Published 3 Apr 2026 in cs.AI | (2604.02734v1)

Abstract: LLMs have demonstrated strong potential in long-horizon decision-making tasks, such as embodied manipulation and web interaction. However, agents frequently struggle with endless trial-and-error loops or deviate from the main objective in complex environments. We attribute these failures to two fundamental errors: global Progress Drift and local Feasibility Violation. Existing methods typically attempt to address both issues simultaneously using a single paradigm. However, these two challenges are fundamentally distinct: the former relies on fuzzy semantic planning, while the latter demands strict logical constraints and state validation. The inherent limitations of such a single-paradigm approach pose a fundamental challenge for existing models in handling long-horizon tasks. Motivated by this insight, we propose a Neuro-Symbolic Dual Memory Framework that explicitly decouples semantic progress guidance from logical feasibility verification. Specifically, during the inference phase, the framework invokes both memory mechanisms synchronously: on one hand, a neural-network-based Progress Memory extracts semantic blueprints from successful trajectories to guide global task advancement; on the other hand, a symbolic-logic-based Feasibility Memory utilizes executable Python verification functions synthesized from failed transitions to perform strict logical validation. Experiments demonstrate that this method significantly outperforms existing competitive baselines on ALFWorld, WebShop, and TextCraft, while drastically reducing the invalid action rate and average trajectory length.

Summary

  • The paper introduces a dual-memory framework that separates semantic progress from logical feasibility to address global drift and local action errors.
  • It combines neural progress memory for extracting procedural blueprints with symbolic feasibility memory for rule-based verification of actions.
  • Empirical results on ALFWorld, WebShop, and TextCraft demonstrate significant improvements over previous baselines, highlighting the framework’s practical robustness.

Neuro-Symbolic Dual Memory for Long-Horizon LLM Agents: A Technical Analysis

Motivation and Dual-Alignment Challenge

Long-horizon LLM agents in complex environments exhibit two principal error modes: (1) global progress drift, where the agent semantically deviates from the main objective, and (2) local feasibility violation, where the agent repeats failed attempts or issues invalid, non-executable actions. Handling both alignment challenges within a single reasoning paradigm has proven inadequate, because semantic planning and strict logical verification place orthogonal demands on the agent.

Figure 1: The neuro-symbolic dual-alignment framework confronts long-horizon failure cycles via separate progress and feasibility alignment, integrating neural semantic guidance and symbolic rule-based verification.

This paper posits that optimal long-horizon behavior requires technically separating global semantic progression (best handled by distributed neural mechanisms) from local logical executability (best addressed via explicit symbolic rules). Neural approaches excel at extracting stage-wise patterns from successful trajectories, while symbolic mechanisms provide transparent, verifiable filtering of failed action attempts.

Neuro-Symbolic Dual Memory Framework

The proposed framework is architected around two principal memories (Figure 2):

  • Neural Progress Memory: Structurally encodes semantic procedural blueprints distilled from successful trajectories. By aligning future actions with stage-anchored exemplars, it ensures agents consistently advance along canonical, high-level subgoals.
  • Symbolic Feasibility Memory: Extracts executable verification rules from negative (failed) transitions during exploration. These rules are compiled as high-precision Python functions that intercept and correct infeasible actions before execution, strictly anchoring local behaviors within validated boundaries.

Rather than combining both reasoning streams within a single decision process, the framework runs an explicit dual-alignment inference loop: at each step, the neural pathway proposes a progress-consistent candidate action, which the symbolic verifier then refines or vetoes, so that every executed action is checked for both global progress and local feasibility.

Figure 2: Offline distillation phase extracts symbolic feasibility rules from failed transitions and semantic blueprints from successful trajectories; at inference, neural and symbolic memories jointly drive dual-aligned action selection and verification.
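To make the idea of an executable feasibility rule concrete, here is a minimal sketch of what such verification functions might look like. The state layout (`holding`, `open`), the action strings, and both rule names are hypothetical illustrations for an ALFWorld-like setting, not the rules or representation used in the paper.

```python
from typing import Optional

# Hypothetical symbolic state: what the agent holds and which receptacles are open.
State = dict


def rule_no_take_while_holding(state: State, action: str) -> Optional[str]:
    """Reject 'take ...' when the agent already holds an object."""
    if action.startswith("take ") and state.get("holding") is not None:
        return f"cannot take: already holding {state['holding']}"
    return None


def rule_receptacle_must_be_open(state: State, action: str) -> Optional[str]:
    """Reject 'put X in Y' when Y is known to be a closed receptacle."""
    if action.startswith("put ") and " in " in action:
        target = action.split(" in ", 1)[1]
        if state.get("open", {}).get(target) is False:
            return f"cannot put: {target} is closed"
    return None


RULES = [rule_no_take_while_holding, rule_receptacle_must_be_open]


def verify(state: State, action: str) -> list:
    """Return all rejection reasons; an empty list means no rule objects."""
    return [msg for rule in RULES if (msg := rule(state, action)) is not None]


state = {"holding": "apple 1", "open": {"fridge 1": False}}
for candidate in ["take mug 1 from desk 1", "put apple 1 in fridge 1", "open fridge 1"]:
    print(candidate, "->", verify(state, candidate) or "ok")
```

Each rule returns a human-readable reason rather than a bare boolean, so a rejected proposal can be fed back to the actor as a hint for refinement.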

Technical Methodology

Memory Construction

The core pipeline begins with collecting online trajectories on a disjoint training set. Successful episodes are processed by a distillation agent, which parses the action sequence into a sequence of semantic anchors (blueprints) and aligns each anchor with the corresponding action sub-chunks. These blueprints and their embeddings populate the neural progress memory.
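A rough sketch of how a progress memory entry and its retrieval could be organised follows. The `Blueprint` class, the field names, and the example tasks are assumptions made for illustration, and the bag-of-words `embed` function is only a stand-in for the neural encoder that a real system would use.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Blueprint:
    task: str      # task description of the successful episode
    anchors: list  # ordered semantic stage descriptions
    chunks: list   # chunks[i] holds the actions that realised anchors[i]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a placeholder for a neural text encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(memory: list, query: str, k: int = 1) -> list:
    """Return the k blueprints whose task description is most similar to the query."""
    q = embed(query)
    return sorted(memory, key=lambda bp: cosine(q, embed(bp.task)), reverse=True)[:k]


MEMORY = [
    Blueprint(
        task="put a clean mug in the coffee machine",
        anchors=["find mug", "clean mug", "place mug"],
        chunks=[
            ["go to desk 1", "take mug 1 from desk 1"],
            ["go to sink 1", "clean mug 1 with sink 1"],
            ["go to coffeemachine 1", "put mug 1 in coffeemachine 1"],
        ],
    ),
    Blueprint(
        task="heat an egg and put it on the table",
        anchors=["find egg", "heat egg", "place egg"],
        chunks=[["take egg 1 from fridge 1"], ["heat egg 1 with microwave 1"], ["put egg 1 in diningtable 1"]],
    ),
]

best = retrieve(MEMORY, "put a clean cup in the coffee machine")[0]
for anchor, chunk in zip(best.anchors, best.chunks):
    print(anchor, "<-", chunk)
```

Keeping anchors and action chunks index-aligned is what allows retrieval at the level of a single stage rather than a whole trajectory.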

Failed interactions, on the other hand, generate explicit negative transitions. An induction agent synthesizes symbolic representations of the environment-visible state and actions for each failed attempt. It induces (and greedily prunes for coverage and precision) symbolic validation rules, retaining only those with zero false positives on positive transitions.
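One plausible reading of the filtering step is sketched below: candidate rules that fire on any successful (positive) transition are discarded, and the survivors are chosen greedily by how many failed (negative) transitions they newly cover. The function name `select_rules`, the toy transitions, and the greedy set-cover strategy are assumptions; the summary does not specify the exact pruning procedure.

```python
def select_rules(candidates: dict, positives: list, negatives: list) -> list:
    """Pick rule names with zero false positives, greedily maximising coverage."""
    # Precision filter: a rule must never reject a transition that actually succeeded.
    safe = {
        name: rule
        for name, rule in candidates.items()
        if not any(rule(s, a) for s, a in positives)
    }
    # Greedy coverage: repeatedly take the rule explaining the most uncovered failures.
    uncovered = set(range(len(negatives)))
    chosen = []
    while uncovered:
        best_name, best_hits = None, set()
        for name, rule in safe.items():
            if name in chosen:
                continue
            hits = {i for i in uncovered if rule(*negatives[i])}
            if len(hits) > len(best_hits):
                best_name, best_hits = name, hits
        if best_name is None:  # no remaining rule covers anything new
            break
        chosen.append(best_name)
        uncovered -= best_hits
    return chosen


# Hypothetical candidate rules: each returns True when it would REJECT the action.
candidates = {
    "never_take": lambda s, a: a.startswith("take"),  # too broad
    "no_take_while_holding": lambda s, a: a.startswith("take") and s["holding"] is not None,
    "no_put_empty_handed": lambda s, a: a.startswith("put") and s["holding"] is None,
}
positives = [({"holding": None}, "take mug 1"), ({"holding": "mug 1"}, "put mug 1")]
negatives = [
    ({"holding": "mug 1"}, "take apple 1"),
    ({"holding": None}, "put mug 1"),
    ({"holding": "apple 1"}, "take mug 1"),
]

print(select_rules(candidates, positives, negatives))
```

In this toy data, `never_take` is dropped because it would reject a transition that succeeded, which is the kind of over-general rule the zero-false-positive criterion is meant to exclude.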

Inference Dynamics

At test time, given a new task, the agent retrieves relevant semantic blueprints from the neural memory and generates a procedural plan for the overall episode, dynamically selecting current progress anchors. For each step:

  1. The actor, informed by the active blueprint and demonstration chunk retrievals, proposes a progress-consistent action candidate;
  2. The feasibility memory verifies executability, possibly triggering iterative refinement until a valid, non-rejected action is found;
  3. The agent executes the action, the progress monitor evaluates stage completion, and anchor progression occurs upon semantic task advancement.

The system structurally decouples progress guidance (flexible, semantic, context-driven) from feasibility checking (binary, symbolically grounded), yielding robust dual alignment.
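The three steps above can be sketched as a propose, verify, execute loop. Everything here is a simplified assumption: `run_episode`, the scripted `propose` actor (which stands in for an LLM call), the toy `verify` rules, and the one-variable environment are hypothetical and only illustrate the control flow.

```python
def run_episode(anchors, propose, verify, execute, stage_done, max_steps=20, max_refine=3):
    """Dual-alignment loop: the actor proposes, the verifier filters, the monitor advances."""
    stage, trace = 0, []
    for _ in range(max_steps):
        if stage >= len(anchors):
            break
        feedback, action = [], None
        for _ in range(max_refine):
            candidate = propose(anchors[stage], feedback)  # progress-guided proposal
            reasons = verify(candidate)                    # symbolic feasibility check
            if not reasons:
                action = candidate
                break
            feedback.extend(reasons)                       # rejection reasons guide refinement
        if action is None:
            break                                          # this sketch simply gives up
        observation = execute(action)
        trace.append(action)
        if stage_done(anchors[stage], observation):        # progress monitor
            stage += 1
    return stage == len(anchors), trace


# Toy environment with a single piece of state.
state = {"holding": None}


def propose(anchor, feedback):
    if anchor == "get mug":
        return "take mug 1"
    # The first proposal is deliberately infeasible; the retry uses the feedback.
    return "put mug 1" if feedback else "put apple 1"


def verify(action):
    verb, obj = action.split(" ", 1)
    if verb == "put" and state["holding"] != obj:
        return [f"not holding {obj}"]
    if verb == "take" and state["holding"] is not None:
        return ["hands full"]
    return []


def execute(action):
    verb, obj = action.split(" ", 1)
    state["holding"] = obj if verb == "take" else None
    return f"{verb} {obj} ok"


def stage_done(anchor, observation):
    return observation.endswith("ok")


success, trace = run_episode(["get mug", "place mug"], propose, verify, execute, stage_done)
print(success, trace)
```

Note that the infeasible `put apple 1` never reaches the environment: it is rejected before execution, so it costs a refinement round rather than an environment step.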

Experimental Results

Benchmarks span three distinct and challenging environments: ALFWorld (embodied manipulation), WebShop (web navigation and selection), and TextCraft (compositional crafting with recursive dependencies).

Performance Summary:

Method     | ALFWorld SR (%) | WebShop SR (%) | WebShop Score       | TextCraft SR (%)
Ours       | 94.78           | 51             | 0.7132              | 94
Best Prior | 88.81 (AWM)     | 35 (Reflexion) | 0.5998 (WALL-E 2.0) | 88 (ExpeL)

The dual-memory agent achieves the highest success rates and the lowest invalid action rates on all three environments, improving over the strongest prior baseline for each metric. These results support explicit dual alignment over monolithic, single-paradigm approaches.
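For readers who want the margins spelled out, the short script below computes absolute and relative gains from the numbers in the performance summary. The dictionary keys are just labels copied from the table; note that the "best prior" is a different method for each metric, so these are per-metric margins rather than a comparison against one baseline.

```python
# Values copied from the performance summary table.
ours = {
    "ALFWorld SR (%)": 94.78,
    "WebShop SR (%)": 51.0,
    "WebShop Score": 0.7132,
    "TextCraft SR (%)": 94.0,
}
best_prior = {
    "ALFWorld SR (%)": 88.81,   # AWM
    "WebShop SR (%)": 35.0,     # Reflexion
    "WebShop Score": 0.5998,    # WALL-E 2.0
    "TextCraft SR (%)": 88.0,   # ExpeL
}

# Absolute gain per metric, rounded to suppress floating-point noise.
gains = {metric: round(ours[metric] - best_prior[metric], 4) for metric in ours}

for metric, gain in gains.items():
    relative = 100 * gain / best_prior[metric]
    print(f"{metric:18s} +{gain:.4g} ({relative:.1f}% relative)")
```

The absolute gains are 5.97 points on ALFWorld, 16 points on WebShop success rate, 0.1134 on WebShop score, and 6 points on TextCraft.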

Ablation studies establish the complementarity of the two memories:

  • Removal of feasibility memory drastically increases invalid action rate, directly confirming the symbolic filter's critical role in suppressing repetitive or infeasible agent behavior.
  • Removal of progress memory produces longer, less efficient trajectories and lower overall task success, which shows that purely local, feasibility-driven policies are inefficient without global progression anchors.

Further breakdowns confirm that the structured stage-wise organization in progress memory, especially with anchor-level retrieval, drives most gains, as opposed to naïve retrieval augmentation. Among feasibility mechanisms, only executable rule-based verifiers adequately balance strict local correction with overall task throughput; tighter prompt-level constraints reduce valid action space and undermine long-horizon task success when not tightly coupled with symbolic execution.

Theoretical and Practical Implications

The results for the neuro-symbolic dual memory framework suggest that LLM-based agents for long-horizon tasks are hard to ground reliably with a single knowledge representation, whether purely semantic or purely symbolic. The dual-pathway design targets the brittle failure modes of global drift and local infeasibility by matching the reasoning substrate (neural or symbolic) to the objective (progress or feasibility).

Practically, this design is a candidate for settings where agents operate in loosely structured, high-dimensional, but tightly constrained environments, such as real-world embodied robotics, web-based instruction following, or complex workflow automation. The framework's generality depends mainly on access to agent-visible trajectories for offline (self-supervised) induction.

Future Directions

A principal limitation is the requirement for sufficient offline experience, especially failed transitions with strict rejection signals, to induce a high-precision symbolic verifier. Extending the framework to sparse-reward or non-transparent environments is non-trivial and may require self-supervised surrogate objectives, richer simulator probing, or integration with program synthesis for generalized verifier construction.

Another direction involves meta-learning or continual updating of both memories as the agent encounters novel states, actions, or objectives, potentially leveraging large-scale self-improvement or transfer across families of tasks. Integrating formal verification techniques with neural blueprint distillation for provably correct long-horizon behavior is also an open research avenue.

Conclusion

By formalizing and operationalizing the decoupling of progress and feasibility alignment, this work offers a technical foundation for robust LLM-based long-horizon agent design. On the reported benchmarks, the neuro-symbolic dual memory architecture eases the trade-off between semantic generalization and logical executability, yielding both strong quantitative performance and architectural interpretability, with potential applicability and extensibility across diverse agent domains (2604.02734).
