AGEL-Comp: A Neuro-Symbolic Framework for Compositional Generalization in Interactive Agents

Published 29 Apr 2026 in cs.AI, cs.LG, cs.LO, cs.MA, and cs.SC | (2604.26522v1)

Abstract: LLM-based agents exhibit systemic failures in compositional generalization, limiting their robustness in interactive environments. This work introduces AGEL-Comp, a neuro-symbolic AI agent architecture designed to address this challenge by grounding actions of the agent. AGEL-Comp integrates three core innovations: (1) a dynamic Causal Program Graph (CPG) as a world model, representing procedural and causal knowledge as a directed hypergraph; (2) an Inductive Logic Programming (ILP) engine that synthesizes new Horn clauses from experiential feedback, grounding symbolic knowledge through interaction; and (3) a hybrid reasoning core where an LLM proposes a set of candidate sub-goals that are verified for logical consistency by a Neural Theorem Prover (NTP). Together, these components operationalize a deduction–abduction learning cycle: enabling the agent to deduce plans and abductively expand its symbolic world model, while a neural adaptation phase keeps its reasoning engine aligned with new knowledge. We propose an evaluation protocol within the Retro Quest simulation environment to probe for compositional generalization scenarios to evaluate our AGEL agent. Our findings clearly indicate the better performance of our AGEL model over pure LLM-based models. Our framework presents a principled path toward agents that build an explicit, interpretable, and compositionally structured understanding of their world.

Authors (2)

Summary

  • The paper introduces AGEL-Comp, a framework that integrates neural and symbolic methods to achieve perfect quest success and a 60% boost in first-try success rate.
  • It employs a hybrid architecture with a dynamic Causal Program Graph, ILP for rule induction, and Neural Theorem Proving for rigorous plan verification.
  • Experimental results in a 2D RPG environment demonstrate superior sample efficiency, interpretability, and robust generalization compared to baseline agents.

Motivation and Background

Current LLM-based agents are limited by systemic failures in compositional generalization, preventing robust, adaptive behavior in interactive environments. The lack of grounded, structured, and interpretable world models leads to brittle performance and constrains generalization across novel scenarios. AGEL-Comp directly addresses these deficits by tightly coupling neural and symbolic paradigms, leveraging a hybrid architecture that synthesizes dynamic program induction, formal verification, and neural representation learning.

LLMs have demonstrated broad success in natural language–centric tasks, but suffer acutely from what the authors term the "compositionality crisis." This is reflected in a persistent failure to extrapolate correct behavior in previously unseen yet structurally simple combinations of known primitives. Prior work on causal graphical models, ILP, and differentiable theorem proving substantiates the need for explicit, structured knowledge representations and rule-based reasoning mechanisms for robust agent cognition.

AGEL-Comp Architecture

AGEL-Comp consists of three principal interconnected modules: a dynamic Causal Program Graph (CPG) serving as the explicit, executable world model; an Inductive Logic Programming (ILP) engine for online, experientially grounded rule induction; and a hybrid reasoning core wherein an LLM proposes candidate plans which are formally verified by a Neural Theorem Prover (NTP).

Figure 1: The AGEL-Comp neuro-symbolic architecture, integrating perception, LLM core, world model, ILP, NTP verifier, and action/feedback modules.
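
To make the idea of a world model stored as a directed hypergraph more concrete, here is a minimal sketch in which each hyperedge links a set of precondition literals to a set of effect literals under an action label. The class names `HyperEdge` and `CausalProgramGraph`, the method names, and the example literals are assumptions made for illustration; the source does not describe the CPG's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HyperEdge:
    """A directed hyperedge: many precondition literals -> many effect literals."""
    action: str
    preconditions: frozenset
    effects: frozenset


@dataclass
class CausalProgramGraph:
    """Hypothetical container for hyperedges (not the paper's implementation)."""
    edges: list = field(default_factory=list)

    def add_edge(self, action, preconditions, effects):
        edge = HyperEdge(action, frozenset(preconditions), frozenset(effects))
        if edge not in self.edges:  # avoid storing duplicate rules
            self.edges.append(edge)
        return edge

    def applicable(self, state):
        """Return edges whose preconditions all hold in the given state."""
        return [e for e in self.edges if e.preconditions <= state]

    def apply(self, edge, state):
        """Predict the next state by adding the edge's effects (deletions are not modeled)."""
        return frozenset(state) | edge.effects


cpg = CausalProgramGraph()
cpg.add_edge("open_door", {"has(key)", "at(door)"}, {"open(door)"})
cpg.add_edge("take_key", {"at(key)"}, {"has(key)"})

state = frozenset({"at(key)"})
print([e.action for e in cpg.applicable(state)])  # only take_key is applicable here
```

A hyperedge, unlike an ordinary graph edge, can require several literals jointly, which is what lets a single rule express "key and door together cause the door to open".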

The agent's perception module encodes environmental state as structured ground literals. The LLM core, conditioned on current percepts and goals, proposes plans or sub-goals which are then passed through an NTP verifier against the current state of the CPG. Only logically consistent plans are executed in the environment. Each episode is logged in an episodic memory and used for experiential grounding.
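
The propose-then-verify step described above can be sketched as follows. This is a simplified illustration under stated assumptions: a hard-coded candidate list stands in for the LLM's proposals, and a purely symbolic step-by-step simulation stands in for the Neural Theorem Prover, which in the paper is a learned component. The function names `verify_plan` and `select_plan` and the rule format are hypothetical.

```python
def verify_plan(plan, state, rules):
    """Symbolic stand-in for the NTP check: simulate the plan step by step.

    rules maps an action name to (preconditions, effects), both sets of literals.
    Returns (True, final_state) on success or (False, failing_step) on failure.
    """
    current = set(state)
    for step in plan:
        if step not in rules:
            return False, step          # unknown action: cannot be verified
        pre, eff = rules[step]
        if not pre <= current:
            return False, step          # a precondition is not yet satisfied
        current |= eff
    return True, current


def select_plan(candidates, state, rules, goal):
    """Return the first candidate that verifies and reaches the goal literal."""
    for plan in candidates:
        ok, result = verify_plan(plan, state, rules)
        if ok and goal in result:
            return plan
    return None


RULES = {
    "take_key": ({"at(key)"}, {"has(key)"}),
    "open_door": ({"has(key)"}, {"open(door)"}),
}
# Hard-coded candidates stand in for LLM proposals (assumption).
CANDIDATES = [["open_door"], ["take_key", "open_door"]]

chosen = select_plan(CANDIDATES, {"at(key)"}, RULES, "open(door)")
print(chosen)  # the first candidate is rejected; the second one verifies
```

Separating proposal from verification means an inconsistent plan is filtered out before it ever touches the environment, which is the property the ablation without the verifier loses.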

The grounding function, central to knowledge update and repair, operates via a two-stage process: (1) minimal contrastive causal attribution for fine-grained credit assignment, and (2) meta-interpretive ILP induction for rule generalization and abstraction. Verified rules are integrated into the CPG, and the neural symbol embedding matrix is continually fine-tuned alongside the NTP as new rules are accrued.
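
As a rough illustration of the two-stage idea, the sketch below compares a failed episode with the most similar successful one, treats the literals present only in the success as candidate missing causes, and formats them as a Horn clause. This is a deliberately simplified stand-in: it is a set difference, not the paper's attribution procedure, and it does not perform meta-interpretive ILP. All function names and literals are hypothetical.

```python
def contrastive_attribution(failed_state, successful_states):
    """Pick the successful state closest to the failed one and return the
    literals it has that the failed state lacks (candidate missing causes)."""
    failed = set(failed_state)
    nearest = min(successful_states, key=lambda s: len(set(s) ^ failed))
    return set(nearest) - failed


def to_horn_clause(head, body_literals):
    """Format a candidate rule as a Prolog-style Horn clause string."""
    return f"{head} :- {', '.join(sorted(body_literals))}."


failed = {"at(door)", "night"}
successes = [
    {"at(door)", "night", "has(key)"},
    {"at(door)", "day", "has(key)", "has(torch)"},
]
missing = contrastive_attribution(failed, successes)
clause = to_horn_clause("can_open(door)", missing | {"at(door)"})
print(clause)
```

Choosing the nearest success keeps the contrast minimal: irrelevant differences such as `day` versus `night` or `has(torch)` do not enter the candidate rule.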

Learning Cycle and Operational Loop

AGEL-Comp implements a rigorous deduction–abduction learning cycle. Action plans are generated (LLM), verified (NTP), and executed, with unexpected outcomes triggering credit assignment, causal hypothesis extraction, and symbolic rule induction (ILP). The resultant Horn clauses update the world model, providing a mechanism for generalization from specific interactions. Neural representation and logical inference align via shared, trainable embeddings and continual fine-tuning of the NTP for closed-loop adjustment to the agent's growing structured knowledge.
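
The cycle can be illustrated with a small propositional example: forward chaining over Horn clauses plays the role of deduction, and appending a clause after a surprising outcome plays the role of the abductive update. This is a sketch under assumptions; the rule encoding, atom names, and `forward_chain` helper are invented for illustration, and the paper's rules are induced by ILP rather than written by hand.

```python
def forward_chain(facts, rules):
    """Derive all consequences of propositional Horn clauses.

    rules is a list of (head, body) pairs, where body is a set of atoms.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived


rules = [("can_open_door", {"has_key", "at_door"})]
facts = {"at_door", "has_lockpick"}

# Deduction: the goal is not derivable from the current rule base.
before = "can_open_door" in forward_chain(facts, rules)

# Abduction-style update: suppose experience showed that a lockpick also
# works, so a new clause is added to the rule base (hand-written here).
rules.append(("can_open_door", {"has_lockpick", "at_door"}))
after = "can_open_door" in forward_chain(facts, rules)
print(before, after)
```

The same facts yield a different conclusion once the rule base grows, which is the sense in which the world model generalizes from a specific interaction.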

Experimental Protocol and Results

The framework is evaluated in the "Retro Quest" environment, a procedurally rich, interactive 2D RPG platform specifically constructed to probe compositional generalization. The experimental design includes challenging ambiguous and stochastic events, with a multi-quest curriculum and rigorous ablation studies. Four multimodal LLMs are used as backbone cores (GPT-4o, Gemini Pro 2.5, DeepSeek v1, LLaVA 1.6).

Core Evaluation Metrics

Performance is measured via Quest Success Rate, First-Try Success Rate (probing sample-efficient, zero-shot generalization), total iterations, adaptation trials, and the number of rules learned.
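
The two headline rates can be computed from per-quest attempt logs as sketched below. The definitions used here are assumptions, since the summary does not spell them out: a quest counts as solved if any attempt succeeds, and as a first-try success only if the first attempt succeeds. The function name and log format are hypothetical.

```python
def quest_metrics(attempts_per_quest):
    """Compute success rates from a mapping quest id -> ordered list of
    booleans, one per attempt (True means the attempt succeeded)."""
    n = len(attempts_per_quest)
    if n == 0:
        raise ValueError("at least one quest is required")
    solved = sum(any(a) for a in attempts_per_quest.values())
    first_try = sum(bool(a) and a[0] for a in attempts_per_quest.values())
    return {
        "quest_success_rate": solved / n,
        "first_try_success_rate": first_try / n,
    }


log = {
    "q1": [True],
    "q2": [False, True],
    "q3": [False, False],
    "q4": [True],
}
metrics = quest_metrics(log)
print(metrics)
```

In this toy log, three of four quests are eventually solved but only two on the first attempt, which shows how the two rates can diverge.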

Main Findings

  • Quest Success: AGEL-Comp achieves perfect (100%) aggregate success across all backbone LLMs, while baseline LLM-only agents degrade significantly (down to 63.3% for LLaVA 1.6) and fail catastrophically on the hardest quests.

    Figure 2: Aggregated Quest Success and First-Try Success Rate by agent configuration demonstrating superior AGEL-Comp performance.

    Figure 3: Per-LLM breakdown of Quest Success and First-Try Success, highlighting backbone-agnostic robustness for AGEL-Comp.

  • First-Try Success Rate: AGEL-Comp achieves a substantial improvement of up to 60%, far exceeding the baselines, whose first-try success falls as low as 0–6.7%.
  • Efficiency: AGEL-Comp demonstrates markedly improved sample efficiency (mean 23–41 interactions to successful adaptation) compared to the 140–250+ samples required by LLM-only agents.

    Figure 4: Sample and iteration efficiency (per quest and per agent configuration) with AGEL-Comp.

  • Ablation Studies: Removing either the NTP verifier or the ILP learner results in a collapse of generalization. Without NTP, the agent is forced into risky trial-and-error; without ILP, it cannot repair its knowledge base and accumulates systematic errors, especially on out-of-distribution goals.
  • Compositional Robustness: On the hardest difficulty tiers, AGEL-Comp remains stable, whereas the baselines' performance plummets.

    Figure 5: Catastrophic degradation for baselines and ablations on hard quests; AGEL-Comp remains robust.

  • Interpretability and World Model Growth: The architecture facilitates direct visualization of the evolution of symbolic causal knowledge over time, enabling intelligibility and formal debugging.

    Figure 6: Online development of the causal program graph structure through grounded interactive experience.

Architectural Implications and Theoretical Significance

This work provides strong empirical and architectural support for the thesis that integrated neuro-symbolic systems are necessary (not merely beneficial) for robust, compositional, and generalizable cognitive agents. The explicit separation of plan proposal (LLM) from plan verification (NTP) ensures logical discipline, while online rule induction ensures the continual expansion and repair of symbolic models based on experience, facilitating adaptive generalization in non-stationary domains.

The deductive–abductive cycle in AGEL-Comp directly targets the compositional failings of LLMs, binding together creative proposal and rigorous grounding. The framework also illustrates a scalable methodology for integrating symbolic program induction with neural reasoning in a continual, interactive context.

Practical Considerations and Future Directions

AGEL-Comp advances the deployment potential of embodied neuro-symbolic agents capable of generalizing beyond encountered distributions. Practical deployment in real-world settings will necessitate scalable management of verification and induction costs, robustness to perceptual noise for symbol grounding, automated rule pruning, and dynamic model repair under environment shift. The architecture opens new avenues for research in causal symbolic knowledge, sim-to-real transfer, and interpretable agent design.

Conclusion

AGEL-Comp constitutes a principled neuro-symbolic agent architecture that fuses the flexibility of LLM-based generative planning with the formal rigor of logical verification and induction. The architecture's synergistic learning cycle delivers strong compositional generalization, perfect task success, and robust adaptation under distribution shift, outperforming baseline LLM-based agents and ablated variants. This work establishes a technical foundation for future interactive agents that require explicit, grounded, and interpretable world models to reliably generalize in complex domains.
