- The paper introduces AGEL-Comp, a framework that integrates neural and symbolic methods and reports a 100% quest success rate together with first-try success gains of up to 60%.
- It employs a hybrid architecture with a dynamic Causal Program Graph, ILP for rule induction, and Neural Theorem Proving for rigorous plan verification.
- Experimental results in a 2D RPG environment demonstrate superior sample efficiency, interpretability, and robust generalization compared to baseline agents.
AGEL-Comp: A Neuro-Symbolic Framework for Compositional Generalization in Interactive Agents
Motivation and Background
Current LLM-based agents are limited by systemic failures in compositional generalization, preventing robust, adaptive behavior in interactive environments. The lack of grounded, structured, and interpretable world models leads to brittle performance and constrains generalization across novel scenarios. AGEL-Comp directly addresses these deficits by tightly coupling neural and symbolic paradigms, leveraging a hybrid architecture that synthesizes dynamic program induction, formal verification, and neural representation learning.
LLMs have demonstrated broad success in natural language–centric tasks, but they suffer acutely from what the authors term the "compositionality crisis": a persistent failure to extrapolate correct behavior to previously unseen yet structurally simple combinations of known primitives. Prior work on causal graphical models, ILP, and differentiable theorem proving substantiates the need for explicit, structured knowledge representations and rule-based reasoning mechanisms for robust agent cognition.
AGEL-Comp Architecture
AGEL-Comp consists of three principal interconnected modules: a dynamic Causal Program Graph (CPG) serving as the explicit, executable world model; an Inductive Logic Programming (ILP) engine for online, experientially grounded rule induction; and a hybrid reasoning core wherein an LLM proposes candidate plans which are formally verified by a Neural Theorem Prover (NTP).
Figure 1: The AGEL-Comp neuro-symbolic architecture, integrating perception, LLM core, world model, ILP, NTP verifier, and action/feedback modules.
The agent's perception module encodes environmental state as structured ground literals. The LLM core, conditioned on current percepts and goals, proposes plans or sub-goals which are then passed through an NTP verifier against the current state of the CPG. Only logically consistent plans are executed in the environment. Each episode is logged in an episodic memory and used for experiential grounding.
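The propose-then-verify gate described above can be illustrated with a minimal sketch. It is not the paper's implementation: a plain symbolic precondition check stands in for the Neural Theorem Prover, a hard-coded candidate list stands in for the LLM core, and every name (`ActionRule`, `verify_plan`, `select_plan`, the key/door literals) is hypothetical.

```python
from dataclasses import dataclass

# A ground literal is modelled as a tuple, e.g. ("has", "key").
# All rule names and literals here are illustrative assumptions.

@dataclass(frozen=True)
class ActionRule:
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset = frozenset()

def verify_plan(plan, state, rules):
    """Simulate `plan` step by step against the rule set.

    Returns (True, final_state) if every step's preconditions hold,
    otherwise (False, name_of_first_failing_step).
    This symbolic check is a stand-in for the NTP verifier.
    """
    current = set(state)
    for step in plan:
        rule = rules.get(step)
        if rule is None or not rule.preconditions <= current:
            return False, step
        current = (current - rule.del_effects) | rule.add_effects
    return True, frozenset(current)

def select_plan(candidates, state, rules):
    """Return the first candidate plan that passes verification, else None."""
    for plan in candidates:
        ok, _ = verify_plan(plan, state, rules)
        if ok:
            return plan
    return None

RULES = {
    "pick_key": ActionRule("pick_key",
                           frozenset({("at", "key")}),
                           frozenset({("has", "key")})),
    "open_door": ActionRule("open_door",
                            frozenset({("has", "key")}),
                            frozenset({("open", "door")})),
}
STATE = frozenset({("at", "key")})
# Stand-in for plans proposed by the LLM core.
CANDIDATES = [["open_door"], ["pick_key", "open_door"]]

chosen = select_plan(CANDIDATES, STATE, RULES)
print(chosen)
```

In this toy setting the first candidate is rejected because `open_door` requires a key the agent does not yet hold, so only the second plan would reach the environment.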
The grounding function, central to knowledge update and repair, operates via a two-stage process: (1) minimal contrastive causal attribution for fine-grained credit assignment, and (2) meta-interpretive ILP induction for rule generalization and abstraction. Verified rules are integrated into the CPG, and the neural symbol embedding matrix is continually fine-tuned alongside the NTP as new rules are accrued.
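To make the two-stage idea concrete, here is a heavily simplified sketch. It assumes that contrastive attribution can be approximated by a set difference between pre-action states, and it replaces meta-interpretive ILP with a plain intersection over successful episodes, so it shows the shape of the computation rather than the paper's method. The function names and the example literals are hypothetical.

```python
def contrastive_attribution(failed_state, success_states):
    """Stage 1 (simplified): literals shared by every successful
    pre-action state but absent from the failed one.
    Requires at least one successful state."""
    common = frozenset.intersection(*map(frozenset, success_states))
    return common - frozenset(failed_state)

def induce_precondition(action, failed_state, success_states):
    """Stage 2 (simplified): turn the attributed literals into a
    Horn-clause-style rule string. Real ILP would search a hypothesis
    space and generalize over variables; this does neither."""
    missing = contrastive_attribution(failed_state, success_states)
    if not missing:
        return None
    body = ", ".join(f"{pred}({', '.join(args)})"
                     for pred, *args in sorted(missing))
    return f"can({action}) :- {body}."

# Made-up episodes for the action "open_door".
SUCCESSES = [
    {("has", "key"), ("at", "door")},
    {("has", "key"), ("at", "door"), ("has", "torch")},
]
FAILURE = {("at", "door"), ("has", "torch")}

rule = induce_precondition("open_door", FAILURE, SUCCESSES)
print(rule)
```

The torch appears in only one successful episode and in the failed one, so it is excluded, leaving possession of the key as the candidate explanation for the failure.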
Learning Cycle and Operational Loop
AGEL-Comp implements a rigorous deduction–abduction learning cycle. Action plans are generated (LLM), verified (NTP), and executed, with unexpected outcomes triggering credit assignment, causal hypothesis extraction, and symbolic rule induction (ILP). The resultant Horn clauses update the world model, providing a mechanism for generalization from specific interactions. Neural representation and logical inference align via shared, trainable embeddings and continual fine-tuning of the NTP for closed-loop adjustment to the agent's growing structured knowledge.
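As a rough illustration of how a newly induced Horn clause extends what the world model can derive, the sketch below runs naive forward chaining over ground (variable-free) clauses. The clause format `(head, body)` and the fact names are assumptions made for this example; the paper's CPG and NTP are not modelled.

```python
def forward_chain(facts, clauses):
    """Closure of `facts` under ground Horn clauses.

    Each clause is (head, body): the head is derived once every
    atom in the body has been derived.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in derived and set(body) <= derived:
                derived.add(head)
                changed = True
    return derived

FACTS = {"has_key", "at_door"}

# World model before learning: one known rule.
clauses = [("can_open_door", ["has_key", "at_door"])]
before = forward_chain(FACTS, clauses)

# A clause induced from experience is appended to the model.
clauses.append(("can_reach_treasure", ["can_open_door"]))
after = forward_chain(FACTS, clauses)

print(sorted(after - before))
```

The same facts now support one additional conclusion, which is the sense in which added clauses let the agent generalize from specific interactions.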
Experimental Protocol and Results
The framework is evaluated in the "Retro Quest" environment, a procedurally rich, interactive 2D RPG platform specifically constructed to probe compositional generalization. The experimental design includes challenging events that are ambiguous or stochastic, along with a multi-quest curriculum and rigorous ablation studies. Four multimodal LLMs are used as backbone cores (GPT-4o, Gemini Pro 2.5, DeepSeek v1, LLaVA 1.6).
Core Evaluation Metrics
Performance is measured via Quest Success Rate, First-Try Success Rate (probing sample-efficient, zero-shot generalization), total iterations, adaptation trials, and the number of rules learned.
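The two headline metrics can be computed from per-quest logs roughly as follows. The summary does not give exact definitions, so this sketch assumes that Quest Success Rate is the fraction of quests eventually solved and that First-Try Success Rate is the fraction solved on the first attempt. The `QuestLog` record and the sample values are invented for illustration and are not results from the paper.

```python
from dataclasses import dataclass

@dataclass
class QuestLog:
    quest_id: str
    attempts: int   # attempts used before the quest ended
    solved: bool    # whether the quest was eventually solved

def quest_success_rate(logs):
    """Fraction of quests solved (assumed definition). Needs non-empty logs."""
    return sum(log.solved for log in logs) / len(logs)

def first_try_success_rate(logs):
    """Fraction of quests solved on the first attempt (assumed definition)."""
    return sum(log.solved and log.attempts == 1 for log in logs) / len(logs)

# Invented logs, purely to exercise the functions.
LOGS = [
    QuestLog("q1", attempts=1, solved=True),
    QuestLog("q2", attempts=3, solved=True),
    QuestLog("q3", attempts=5, solved=False),
    QuestLog("q4", attempts=2, solved=True),
]

print(quest_success_rate(LOGS), first_try_success_rate(LOGS))
```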
Main Findings
- Quest Success: AGEL-Comp achieves perfect (100%) aggregate success across all backbone LLMs, while baseline LLM-only agents degrade significantly (down to 63.3% for LLaVA 1.6) and fail catastrophically on the hardest quests.
Figure 2: Aggregated Quest Success and First-Try Success Rate by agent configuration demonstrating superior AGEL-Comp performance.
Figure 3: Per-LLM breakdown of Quest Success and First-Try Success, highlighting backbone-agnostic robustness for AGEL-Comp.
- First-Try Success Rate: AGEL-Comp achieves a substantial improvement of up to 60%, far exceeding the baseline, whose rate falls as low as 0–6.7%.
- Efficiency: AGEL-Comp demonstrates markedly improved sample efficiency (mean 23–41 interactions to successful adaptation) compared to the 140–250+ samples required by LLM-only agents.
Figure 4: Sample and iteration efficiency (per quest and per agent configuration) with AGEL-Comp.
- Ablation Studies: Removing either the NTP verifier or the ILP learner results in a collapse of generalization. Without NTP, the agent is forced into risky trial-and-error; without ILP, it cannot repair its knowledge base and accumulates systematic errors, especially on out-of-distribution goals.
- Compositional Robustness: On the hardest difficulty tiers, AGEL-Comp remains stable, whereas baseline performance plummets.
Figure 5: Catastrophic degradation for baselines and ablations on hard quests; AGEL-Comp remains robust.
- Interpretability and World Model Growth: The architecture facilitates direct visualization of the evolution of symbolic causal knowledge over time, enabling intelligibility and formal debugging.
Figure 6: Online development of the causal program graph structure through grounded interactive experience.
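One way to picture the online growth of the graph is a directed structure in which each learned rule adds edges from preconditions to an action and from the action to its effects, with the size recorded after every update. This is only a sketch under that assumption; the paper's actual CPG representation is not specified in this summary, and `CausalGraph` and the rule contents below are hypothetical.

```python
from collections import defaultdict

class CausalGraph:
    """Toy directed graph: precondition -> action -> effect."""

    def __init__(self):
        self.edges = defaultdict(set)
        self.history = []  # (node_count, edge_count) after each rule

    def add_rule(self, action, preconditions, effects):
        for pre in preconditions:
            self.edges[pre].add(action)
        for eff in effects:
            self.edges[action].add(eff)
        self.history.append((len(self.nodes()), self.num_edges()))

    def nodes(self):
        found = set(self.edges)
        for targets in self.edges.values():
            found |= targets
        return found

    def num_edges(self):
        return sum(len(targets) for targets in self.edges.values())

graph = CausalGraph()
graph.add_rule("open_door", ["has_key"], ["door_open"])
graph.add_rule("enter_room", ["door_open"], ["in_room"])

print(graph.history)
```

Plotting `history` over time would give a simple growth curve, and the edge sets themselves can be inspected directly, which is the kind of transparency the interpretability claim refers to.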
Architectural Implications and Theoretical Significance
This work provides strong empirical and architectural support for the thesis that integrated neuro-symbolic systems are necessary (not merely beneficial) for robust, compositional, and generalizable cognitive agents. The explicit separation of plan proposal (LLM) from plan verification (NTP) ensures logical discipline, while online rule induction ensures the continual expansion and repair of symbolic models based on experience, facilitating adaptive generalization in non-stationary domains.
The deductive–abductive cycle in AGEL-Comp directly targets the compositional failings of LLMs, binding together creative proposal and rigorous grounding. The framework also illustrates a scalable methodology for integrating symbolic program induction with neural reasoning in a continual, interactive context.
Practical Considerations and Future Directions
AGEL-Comp advances the deployment potential of embodied neuro-symbolic agents capable of generalizing beyond encountered distributions. Practical deployment in real-world settings will necessitate scalable management of verification and induction costs, robustness to perceptual noise for symbol grounding, automated rule pruning, and dynamic model repair under environment shift. The architecture opens new avenues for research in causal symbolic knowledge, sim-to-real transfer, and interpretable agent design.
Conclusion
AGEL-Comp constitutes a principled neuro-symbolic agent architecture that fuses the flexibility of LLM-based generative planning with the formal rigor of logical verification and induction. The architecture's synergistic learning cycle delivers strong compositional generalization, perfect task success, and robust adaptation under distribution shift, outperforming baseline LLM-based agents and ablated variants. This work establishes a technical foundation for future interactive agents that require explicit, grounded, and interpretable world models to reliably generalize in complex domains.