Grounding vs. Compositionality: On the Non-Complementarity of Reasoning in Neuro-Symbolic Systems

Published 29 Apr 2026 in cs.AI, cs.CV, cs.LG, and cs.LO | (2604.26521v1)

Abstract: Compositional generalization remains a foundational weakness of modern neural networks, limiting their robustness and applicability in domains requiring out-of-distribution reasoning. A central, yet unverified, assumption in neuro-symbolic AI is that compositional reasoning will emerge as a byproduct of successful symbol grounding. This work presents the first systematic empirical analysis to challenge this assumption by disentangling the contributions of grounding and reasoning. To operationalize this investigation, we introduce the Iterative Logic Tensor Network ($i$LTN), a fully differentiable architecture designed for multi-step deduction. Using a formal taxonomy of generalization -- probing for novel entities, unseen relations, and complex rule compositions -- we demonstrate that a model trained solely on a grounding objective fails to generalize. In contrast, our full $i$LTN, trained jointly on perceptual grounding and multi-step reasoning, achieves high zero-shot accuracy across all tasks. Our findings provide conclusive evidence that symbol grounding, while necessary, is insufficient for generalization, establishing that reasoning is not an emergent property but a distinct capability that requires an explicit learning objective.


Authors (2)

Summary

The paper demonstrates that joint training of perceptual grounding and iterative reasoning via iLTN significantly outperforms isolated approaches.
It evaluates compositional generalization along three axes (entity, relational, and rule composition) using rigorously controlled synthetic visual logic puzzles.
Results show that robust zero-shot inference requires explicit multi-step reasoning, refuting the assumption of emergent compositionality from grounding alone.

Analysis of “Grounding vs. Compositionality: On the Non-Complementarity of Reasoning in Neuro-Symbolic Systems” (2604.26521)

Introduction and Motivation

This paper addresses a central hypothesis in neuro-symbolic AI: that perceptual symbol grounding is sufficient for compositional generalization. The prevailing intuition in the field has held that, once neural architectures can reliably map complex high-dimensional data (such as pixels) to symbolic representations, the subsequent ability to compose and generalize over unseen symbol combinations will follow as an emergent property of this mapping. This work challenges that assumption by disentangling the contributions of symbol grounding (SG) and compositional reasoning (CR) in a rigorously controlled empirical setting. The authors propose and evaluate the Iterative Logic Tensor Network (iLTN), a differentiable architecture crafted for joint perceptual grounding and iterative, step-wise logical deduction.

Methodology: Disentangling Grounding and Reasoning

The experimental paradigm operationalizes three axes of compositional generalization:

Entity Composition: Generalization to novel symbols under previously seen constraints.
Relational Composition: Adapting reasoning to novel fundamental logical relations not encountered during training.
Rule Composition: Multi-step generalization to novel strategies requiring composition of learned reasoning steps.
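One way to picture the three axes is as three different hold-out rules applied to a pool of puzzles. The following is a minimal sketch under stated assumptions, not the authors' data pipeline: the Puzzle record, its fields, and split_for_axis are hypothetical names introduced here purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Puzzle:
    """Hypothetical puzzle record; the fields are illustrative assumptions."""
    entities: frozenset   # symbols appearing in the puzzle, e.g. digits
    relations: frozenset  # constraint types, e.g. {"all_different", "sum"}
    chain_length: int     # deduction steps needed to solve the puzzle


def split_for_axis(puzzles, axis, *, held_out_entities=frozenset(),
                   held_out_relations=frozenset(), max_train_chain=0):
    """Return (train, test) where every test puzzle is novel along `axis`."""
    def is_ood(p):
        if axis == "entity":
            # Novel symbols under constraints already seen in training.
            return bool(p.entities & held_out_entities)
        if axis == "relation":
            # Constraint types never encountered during training.
            return bool(p.relations & held_out_relations)
        if axis == "rule":
            # Deduction chains longer than any chain seen in training.
            return p.chain_length > max_train_chain
        raise ValueError(f"unknown axis: {axis}")

    train = [p for p in puzzles if not is_ood(p)]
    test = [p for p in puzzles if is_ood(p)]
    return train, test
```

Any puzzle that touches a held-out entity, a held-out relation, or an over-long chain is excluded from training, which is what keeps the test set strictly out of distribution on the chosen axis.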

The study uses synthetic visual logic puzzles designed to ensure strict out-of-distribution evaluation along these axes. The core architectural contribution, iLTN, extends standard Logic Tensor Networks with an iterative refinement process over a perceptual-to-symbolic embedding. It performs multi-step deductive inference, with neural modules trained jointly on grounding and reasoning objectives. Reasoning is implemented as fuzzy-logic-based differentiable saturation of first-order logic constraints. Stepwise refinement uses Gumbel-Softmax to preserve gradient flow, with an annealed temperature schedule to encourage discrete hypothesis formation.
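The Gumbel-Softmax relaxation and temperature annealing mentioned above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the exponential schedule, the default temperatures (2.0 down to 0.1), and the function names are assumptions, since the summary does not state which schedule or values the authors use.

```python
import math
import random


def gumbel_softmax(logits, temperature, rng):
    """Draw one relaxed (soft) one-hot sample from categorical logits."""
    noisy = []
    for logit in logits:
        # Clamp U away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def annealed_temperature(step, num_steps, t_start=2.0, t_end=0.1):
    """Exponential decay from t_start to t_end (an assumed schedule)."""
    if num_steps <= 1:
        return t_end
    frac = step / (num_steps - 1)
    return t_start * (t_end / t_start) ** frac
```

At high temperature the sample stays close to uniform, so gradients reach every candidate symbol. As the temperature drops, samples concentrate on a single symbol, which matches the stated goal of encouraging discrete hypotheses late in refinement.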

Results: Empirical Evidence on the Non-Complementarity of Reasoning and Grounding

Entity Composition

On compositional generalization to unseen entities, baseline LTN and iLTN both exhibit near-zero classification accuracy for novel symbols. However, iLTN maintains distinct embedding clusters for unseen digits, enabling logical predicates to enforce constraints directly on the embedding space, supporting successful puzzle resolution even when final class labels are incorrect.

Figure 1: On Entity Composition, both models fail to classify unseen digits, but the iLTN is still able to apply logical constraints to them, outperforming the baseline.
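To make concrete how a constraint can be checked on embeddings without ever naming a class, the sketch below scores an "all different" constraint from pairwise embedding distances using a product t-norm. It is a hand-written stand-in under stated assumptions: the distance-based `different` predicate, the `scale` parameter, and the choice of t-norm are illustrative and are not taken from the paper, where predicates are presumably learned.

```python
import math


def different(a, b, scale=1.0):
    """Fuzzy truth in [0, 1] that embeddings a and b denote different symbols."""
    dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
    # Identical embeddings give 0.0; distant embeddings approach 1.0.
    return 1.0 - math.exp(-dist_sq / scale)


def all_different(embeddings, scale=1.0):
    """Product t-norm conjunction of `different` over every pair of cells."""
    truth = 1.0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            truth *= different(embeddings[i], embeddings[j], scale)
    return truth
```

Because the score depends only on whether embeddings are separated, an unseen digit that forms its own cluster can still satisfy or violate the constraint, even when the classifier assigns it the wrong label.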

Relational Composition

For adaptation to new relational rules (e.g., arithmetic constraints in KenKen-like puzzles), iLTN demonstrates robust generalization, attaining more than double the accuracy of the baseline.

Figure 2: On Relational Composition, iLTN demonstrates better generalization by adapting to new arithmetic rules of KenKen.
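As an illustration of what scoring an arithmetic relation over soft digit beliefs can look like, the sketch below computes the probability that a KenKen-style cage meets its target, assuming each cell holds an independent distribution over the digits 1..n. The `OPS` table, `cage_truth`, and the brute-force enumeration are assumptions made for this example, not the paper's mechanism.

```python
import math
from itertools import product

# Hypothetical relation table; a new relation is just a new entry.
OPS = {"sum": sum, "product": math.prod}


def cage_truth(cell_probs, relation, target):
    """Probability that `relation` over the cage's digits equals `target`.

    cell_probs[i][d - 1] is cell i's belief that it holds digit d.
    Cells are treated as independent, which is a simplifying assumption.
    """
    op = OPS[relation]
    n = len(cell_probs[0])
    total = 0.0
    for digits in product(range(1, n + 1), repeat=len(cell_probs)):
        if op(digits) == target:
            # Joint probability of this digit assignment under independence.
            p = 1.0
            for cell, d in zip(cell_probs, digits):
                p *= cell[d - 1]
            total += p
    return total
```

Enumeration is exponential in cage size, so this only suits the small cages of a toy puzzle. The point is that the relation is expressed over beliefs about symbols rather than over pixels.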

Rule Composition

iLTN’s iterative reasoning loop is essential for generalizing to the longer deductive chains that hard puzzles require. Baseline performance degrades sharply on novel, complex inference chains, whereas iLTN retains strong reasoning capabilities across difficulty levels.

Figure 3: On Rule Composition, unlike the baseline, iLTN was able to combine and apply more rules to solve hard strategies.
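Why iteration matters for longer chains can be seen in a toy elimination loop over soft beliefs. The sketch below is a hand-coded stand-in for a learned reasoning step, under stated assumptions: `refine_step` applies a single "all different" rule to one group of cells, and the update rule itself is illustrative rather than the iLTN update.

```python
def refine_step(beliefs):
    """One synchronous all-different elimination step over a group of cells.

    beliefs[c][d] is cell c's belief that it holds digit index d.
    """
    updated = []
    for c, cell in enumerate(beliefs):
        scores = []
        for d, p in enumerate(cell):
            # Down-weight digit d by how strongly other cells already claim it.
            for o, other in enumerate(beliefs):
                if o != c:
                    p *= 1.0 - other[d]
            scores.append(p)
        total = sum(scores)
        # Keep the old belief if the constraint rules out every digit.
        updated.append([s / total for s in scores] if total > 0 else list(cell))
    return updated


def iterate(beliefs, num_steps):
    """Apply refine_step repeatedly, feeding each result into the next step."""
    for _ in range(num_steps):
        beliefs = refine_step(beliefs)
    return beliefs
```

Take three cells where the first is known to hold digit 1, the second is split evenly between digits 1 and 2, and the third is uniform over 1 to 3. One step resolves the second cell to digit 2 but leaves the third cell at a belief of 2/3 for digit 3. Only the second step, which can use the conclusion from the first, resolves the third cell completely. A single-pass model has no way to chain the two deductions.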

Quantitative Synthesis

Aggregate statistics reinforce the superiority of the joint grounding + reasoning architecture across all composition tasks. iLTN’s zero-shot accuracy is over fourfold higher than the baseline, and its performance is consistent across increased task complexity.

Figure 4: The summary bar chart confirms the iLTN's significant and consistent performance advantage across all three axes of compositional generalization.

Contribution of Joint Grounding and Reasoning

Comparative ablation studies reveal that even a Reasoning-Only iLTN (provided with pre-grounded symbolic input) underperforms the full iLTN trained jointly on both visual mapping and reasoning. This finding demonstrates that the process of learning to reason robustly is facilitated—not hindered—by exposure to the uncertainties in perceptual grounding.

Figure 5: Performance comparison of the Reasoning-Only and full iLTN.

Theoretical and Practical Implications

The results establish that reasoning is not a byproduct of symbol grounding; robust, zero-shot compositional inference in complex domains requires explicit support for both grounding and multi-step reasoning objectives. This evidence refutes the hypothesis of emergent compositionality through grounding alone. It instead supports neuro-symbolic designs that treat grounding and reasoning as two distinct challenges, each with its own architectural and algorithmic support.

Practically, these findings imply that future neuro-symbolic (NeSy) architectures should be co-designed for perception, representation, and deductive inference, with iterative mechanisms that enable generalization to longer chains and novel compositions. The observed advantage of joint training underscores the importance of training regimes in which reasoning modules are exposed to the ambiguous, structured uncertainty present in perceptual embeddings.

Future Directions

This work opens several directions for further research:

Scaling and Domain Generality: Assessing the scalability of iterative deduction mechanisms (such as iLTN) in real-world visual domains, multi-modal settings, and large, open-world knowledge graphs.
Neuro-Symbolic Integration: Examining alternative paradigms for more deeply integrating continuous and symbolic representations, possibly with meta-learning or hybrid modular approaches.
Robustness to Perceptual Uncertainty: Explicitly leveraging the regularization effect observed when reasoning modules are co-trained with perceptual networks, including in continual and transfer learning contexts.

Conclusion

This paper provides conclusive empirical evidence that compositional generalization is not an automatic consequence of successful symbol grounding in neuro-symbolic systems. The iterative, jointly trained iLTN architecture vastly outperforms both grounding-only and reasoning-only systems on stringent out-of-distribution compositional tasks. The results compel the field to abandon the assumption of the complementarity of perception and reasoning, instead embracing architectural and training paradigms in which explicit, iterative deduction is a first-class learning objective.
