
Shattered Compositionality in AI

Updated 6 February 2026
  • Shattered compositionality is the breakdown of systematic, rule-governed composition in systems, revealing limits in AI and formal models.
  • Empirical studies show that standard neural models and transformers struggle to generalize with non-compositional representations under minor distribution shifts.
  • Remediation strategies include architectural innovations and symbolic scaffolding to enforce structure-sensitive operations and restore compositionality.

Shattered compositionality refers to the breakdown or failure of systematic, rule-governed composition in systems—artificial or mathematical—that purportedly should build complex phenomena from simpler parts in a predictable manner. This phenomenon arises across diverse domains, including deep learning, cognitive modeling, mathematical logic, categorical semantics, and algebraic topology, signifying a critical challenge for both practical AI systems and theoretical frameworks aiming to formalize compositionality.

1. Formalizing Compositionality and Its Breakdown

Compositionality, in its classical sense, asserts that the meaning or function of a complex entity (sentence, program, morphism) is entirely determined by its parts and the rules or operations by which they are combined. For natural language, this corresponds to productivity (a finite set of primitives yielding infinitely many new expressions) and systematicity (the meaning of a composition is a function of the meanings of its parts and of their syntactic mode of combination) (Auersperger et al., 2022, Woydt et al., 2 Jun 2025).

Shattered compositionality emerges when this principle is violated—typically when:

  • Surface generalization occurs without rule-based, structure-sensitive composition; i.e., models perform well in distribution but fail badly under minor distribution shifts.
  • Non-isomorphic functors, non-cartesian monoidal structures, or similar mathematical entities obstruct faithful composition in category theory and logic (Puca et al., 2023, Chemla et al., 2017).
  • Semantic integration over parts is context-sensitive to such a degree that no joint, context-free operation suffices; this is made precise via factorization failure in probabilistic models (Bruza et al., 2013).
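The contrast between rule-governed and holistic composition can be made concrete with a toy example. Everything below (attribute names, symbol scheme) is illustrative and not drawn from any cited paper; the point is only that a code built part-by-part extends to unseen attribute combinations, while a memorized lookup table does not:

```python
from itertools import product

# Toy illustration: a compositional code maps each attribute to its own
# symbol and concatenates; a holistic code memorizes whole meanings.
shapes = ["circle", "square", "triangle"]
colors = ["red", "green", "blue"]

shape_sym = {s: s[0].upper() for s in shapes}   # circle -> "C", ...
color_sym = {c: c[0] for c in colors}           # red -> "r", ...

def compositional_code(shape, color):
    """Message determined part-by-part: structure-sensitive composition."""
    return shape_sym[shape] + color_sym[color]

# Holistic code: an arbitrary lookup table built only from training pairs.
train = [(s, c) for s, c in product(shapes, colors)
         if (s, c) != ("triangle", "blue")]
holistic_code = {pair: f"m{i}" for i, pair in enumerate(train)}

# In distribution, both succeed; on the held-out combination only the
# compositional code yields a well-defined message.
assert compositional_code("triangle", "blue") == "Tb"
assert ("triangle", "blue") not in holistic_code   # holistic code "shatters" OOD
```

This is exactly the failure mode of the first bullet: in-distribution performance is identical for both codes, and only the out-of-distribution combination separates them.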

2. Empirical Manifestations in Neural and Cognitive Systems

Neural Communication and Emergent Languages

Auersperger and Pecina (Auersperger et al., 2022) demonstrated that neural agents in two-agent communication games can generalize in-distribution with arbitrary, non-compositional codes. However, when required to generalize out-of-distribution (OOD)—for example, to unseen combinations of primitive meanings—only languages with internal compositional structure bridge the generalization gap. Metrics such as positional disentanglement (posdis), bag-of-symbol disentanglement (bosdis), and topographic similarity (topsim) show a robust correlation with OOD success. Non-compositional codes "shatter" under OOD tests, showing catastrophic generalization gaps.
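The topographic similarity (topsim) metric can be sketched in a few lines: it is the rank correlation between pairwise distances in meaning space and in message space. The sketch below is a minimal, stdlib-only version that assumes fixed-length attribute tuples and equal-length messages, using Hamming distance and an untied Spearman correlation; practical implementations typically use scipy and edit distance for variable-length messages:

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def spearman(xs, ys):
    """Spearman rank correlation (no tie correction; adequate for a sketch)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

def topographic_similarity(meanings, messages):
    """Correlate pairwise distances in meaning space and message space."""
    pairs = list(combinations(range(len(meanings)), 2))
    md = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    sd = [hamming(messages[i], messages[j]) for i, j in pairs]
    return spearman(md, sd)

# A perfectly compositional toy language: nearby meanings get nearby messages.
meanings = [("circle", "red"), ("circle", "blue"),
            ("square", "red"), ("square", "blue")]
messages = ["Cr", "Cb", "Sr", "Sb"]
topsim = topographic_similarity(meanings, messages)  # 1.0 for this language
```

A holistic language (arbitrary message assignments) would score near zero on the same probe, which is the signature that correlates with OOD failure.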

Meta-learning and Systematicity

Woydt et al. (Woydt et al., 2 Jun 2025) explored the meta-learning regime. They found even sophisticated transformer-based meta-learners fall short of human-like systematic compositionality when tasked with novel rule compositions absent in training. Accurate generalization is replaced by erratic, non-systematic outputs—mixing up rules or failing to parse ambiguous structures—when exposed to grammars outside the training manifold. This shattering is not due to capacity or scale but the absence of genuinely compositional representations and structure-sensitive operations.

Transformer Learning Dynamics

Modern transformers, including both base and large LLMs, exhibit non-human learning dynamics when mastering compositional skills, such as multi-digit arithmetic (Zhao et al., 30 Jan 2026). Instead of acquiring subskills (e.g., digit addition) in a human-sequential order (units→tens→hundreds), transformers typically learn higher-order subskills first, then acquire lower-order subskills in a reversed or parallel manner. This order-agnostic correlational matching leads to mixing errors, especially under distributional shift, such as longer digit sequences or permuted orders—a hallmark of shattered compositionality.
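A minimal probe for this acquisition order is per-digit-position accuracy evaluated at successive training checkpoints. The sketch below is illustrative (the checkpoint data is hypothetical, constructed to show the reversed order described above), but the probe itself is straightforward:

```python
def per_position_accuracy(predictions, targets):
    """Accuracy at each digit position (index 0 = units) across examples.
    Tracking *when* each position is mastered over training checkpoints
    reveals the acquisition order of the addition subskills."""
    width = max(len(t) for t in targets)
    correct = [0] * width
    counts = [0] * width
    for pred, tgt in zip(predictions, targets):
        p, t = pred[::-1], tgt[::-1]   # align by least-significant digit
        for i, d in enumerate(t):
            counts[i] += 1
            correct[i] += int(i < len(p) and p[i] == d)
    return [c / n for c, n in zip(correct, counts)]

# Hypothetical mid-training snapshot: hundreds and tens digits already
# correct, units still unreliable -- the reversed (non-human) order.
preds   = ["847", "129", "305"]
targets = ["843", "121", "305"]
acc = per_position_accuracy(preds, targets)   # [units, tens, hundreds]
```

For this snapshot `acc` is `[1/3, 1.0, 1.0]`: the human order (units first) would show the opposite profile early in training.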

3. Diagnostic and Theoretical Tools for Quantifying Shattering

Several computational and mathematical diagnostics have been proposed.

| Domain | Diagnostic Tool / Metric | What It Detects |
|---|---|---|
| Neural models | Positional/topographic disentanglement, probes | Attribution of message parts to meaning parts (LMs, emergent codes) (Auersperger et al., 2022, Liu et al., 2022) |
| Logic | Failure of truth-functionality after Scott–Suszko reduction | Inability to recover uniform truth tables for connectives (Chemla et al., 2017) |
| Probability | Marginal selectivity & CHSH/Bell inequalities | Contextuality / nonexistence of a global joint distribution (Bruza et al., 2013) |
| Category theory | Zeroth/first homotopy posets, obstruction theory | Where (op)laxators are non-invertible, i.e., where a functor fails to be strong (Puca et al., 2023) |
| Algebraic topology (sheaves) | Nontrivial Čech cohomology | Failure of gluing local data; lavish/separated presheaf distinction (Bumpus et al., 2024) |

These diagnostics make it possible to:

  • Pinpoint the locus and degree of compositionality failure
  • Disentangle benign variation from true compositional fracture
  • Quantitatively relate empirical generalization gaps to theoretical obstructions
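As one concrete instance of the probability-row diagnostic above, the CHSH statistic can be computed directly from four measured correlators; any single context-free joint distribution over the four observables bounds it by 2 in absolute value, so a larger value certifies that no such distribution exists. The correlator values below are illustrative:

```python
import math

def chsh(E):
    """CHSH statistic S from four correlators E[(a, b)] in [-1, 1].
    |S| <= 2 for any single joint distribution over all four observables
    (a context-free semantics); |S| > 2 signals contextuality."""
    return (E[("a1", "b1")] + E[("a1", "b2")]
            + E[("a2", "b1")] - E[("a2", "b2")])

# Correlators consistent with one global joint distribution: bound holds.
classical = {("a1", "b1"): 1.0, ("a1", "b2"): 1.0,
             ("a2", "b1"): 1.0, ("a2", "b2"): 1.0}
assert abs(chsh(classical)) <= 2

# Tsirelson-bound correlators (each +-1/sqrt(2)): S = 2*sqrt(2) > 2,
# so no global joint distribution exists -- semantics is contextual.
q = 1 / math.sqrt(2)
contextual = {("a1", "b1"): q, ("a1", "b2"): q,
              ("a2", "b1"): q, ("a2", "b2"): -q}
assert chsh(contextual) > 2
```

In the concept-combination setting of (Bruza et al., 2013), the "observables" are context-dependent interpretations of the parts of a compound, and a CHSH violation witnesses exactly the factorization failure of Section 1.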

4. Case Studies and Domain-Specific Phenomena

Logic and Proof Theory

Suszko's problem and the associated Scott–Suszko reduction reveal that semantics for even monotonic or Tarskian logics can always be reduced to a small number of "logical" values, but the process generally shatters truth-functionality: compositionality is lost, with connectives forced to depend on context (world label) in their reduced tables (Chemla et al., 2017). Only for regular (Gentzen-structured) connectives and compact logics can truth-functional semantics be reinstated; outside this regime, the very notion of compositionality dissolves.
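The truth-functionality property at stake reduces to a functional-dependency check: a connective is truth-functional iff identical component values never yield different compound values across contexts. A minimal sketch (encoding and examples are ours, not the paper's formalism):

```python
def is_truth_functional(rows):
    """rows: iterable of (value_p, value_q, value_of_compound) observed
    across worlds/contexts.  The connective is truth-functional iff the
    compound's value is a function of the component values alone."""
    table = {}
    for vp, vq, vc in rows:
        if table.setdefault((vp, vq), vc) != vc:
            return False   # same inputs, different output: context-dependence
    return True

# Classical conjunction: a uniform truth table exists.
assert is_truth_functional([(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 0)])

# After a Suszko-style reduction, the same input pair can demand different
# outputs in different worlds -- truth-functionality shatters.
assert not is_truth_functional([(1, 1, 1), (1, 1, 0)])
```

The failing case is precisely a connective whose reduced table must carry a world label: no single context-free table reproduces its behavior.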

Sheaf and Cohomology Approaches

In topological and categorical settings, the gluing of local data to global sections (sheaf condition) is the mathematical archetype of compositionality. Failures of unique gluing—measured as nontrivial Čech cohomology—quantify how compositionality "shatters" (Bumpus et al., 2024). Lavish presheaves satisfy the existence (but not uniqueness) part of the sheaf condition, and their zeroth cohomology captures the distance to full compositionality. Notably, even when H^0 is nonzero (i.e., compositionality fails), this "obstruction" can itself often be assembled compositionally, allowing for efficient algorithmic schemes.
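The existence-and-uniqueness content of the sheaf condition can be illustrated with a toy gluing procedure, using dictionaries (point → value) as stand-ins for local sections on the pieces of a cover. This is a simplified model of the idea, not the paper's formalism:

```python
def glue(local_sections):
    """local_sections: list of dicts mapping points -> values (sections
    on the pieces of a cover).  Returns the glued global section if the
    sections agree on every overlap, else None (gluing is obstructed)."""
    glued = {}
    for section in local_sections:
        for point, value in section.items():
            if glued.setdefault(point, value) != value:
                return None   # disagreement on an overlap
    return glued

# Compatible local data glues to a global section ...
assert glue([{1: "a", 2: "b"}, {2: "b", 3: "c"}]) == {1: "a", 2: "b", 3: "c"}

# ... while an overlap mismatch leaves only local, non-composable data.
assert glue([{1: "a", 2: "b"}, {2: "x", 3: "c"}]) is None
```

The obstruction in the second case is local (a single overlap), which reflects the point above: the failure itself has structure and can often be tracked piece by piece.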

Vision-Language and Probabilistic Semantics

In CLIP-style vision–language models, shattered compositionality is explained by the existence of pseudo-optimal encoders that perfectly align image–text pairs according to the contrastive objective yet are insensitive to token-level recombinations (SWAP, REPLACE, ADD), reflecting composition nonidentifiability (Chen et al., 30 Oct 2025). Similar results hold in probabilistic concept combination: failure of marginal selectivity or violation of Bell/CHSH inequalities demonstrates that the joint distribution required for compositional semantics simply does not exist, and semantic combinatorics becomes contextually fractured (Bruza et al., 2013).
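Hard negatives of the SWAP type can be generated by exchanging two tokens: the bag of words is unchanged, but the composition is not, so an encoder that scores the swapped caption as highly as the original is composition-insensitive. A minimal sketch (the function name and use of whitespace tokenization are our simplifications):

```python
import random

def swap_hard_negative(caption, rng=random):
    """Token-level SWAP perturbation: exchange two tokens so the multiset
    of tokens is preserved while the composition changes."""
    tokens = caption.split()
    i, j = rng.sample(range(len(tokens)), 2)
    tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)

original = "a dog chasing a cat"
negative = swap_hard_negative(original, random.Random(0))

# Same tokens, same length -- only the arrangement differs.
assert sorted(negative.split()) == sorted(original.split())
assert len(negative.split()) == len(original.split())
```

A bag-of-words encoder assigns `original` and `negative` identical representations by construction, which is why such negatives expose composition nonidentifiability that the plain contrastive objective cannot.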

5. Remediation Strategies and Theoretical Implications

Restoring compositionality requires explicit inductive biases or algorithmic constraints:

  • Architectural innovations: Adding disentangled, discrete primitive embeddings, multi-step compositional modules, or explicit structure-sensitive routing has been shown to mitigate shattering in transformers and LLMs (Huang et al., 2023).
  • Improved evaluation protocols: OOD splits, adversarial examples (swaps, hard negatives), and systematic compositionality probes are necessary to expose the phenomenon and guide remediation (Auersperger et al., 2022, Liu et al., 2022).
  • Symbolic scaffolding: Hybrid systems that encode symbolic or programmatic inductive biases alongside neural networks offer improved robustness under compositional stressors (Woydt et al., 2 Jun 2025).
  • Cohomological repair: In algebraic-topological settings, cohomology identifies exactly the degree of failure and, in algorithmic tasks, enables the design of dynamic programming schemes that are compositional even when the original gluing property fails (Bumpus et al., 2024).
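The OOD-split protocol from the second bullet amounts to holding out combinations of primitives while keeping every primitive represented in training. A minimal construction (attribute names are illustrative):

```python
from itertools import product

def compositional_split(attrs_a, attrs_b, held_out_pairs):
    """Build an OOD evaluation split: every primitive appears in training,
    but the held-out *combinations* appear only at test time.  Only models
    that compose primitives systematically can succeed on the test set."""
    all_pairs = set(product(attrs_a, attrs_b))
    test = set(held_out_pairs)
    train = all_pairs - test
    # Sanity check: each primitive must still be seen during training.
    assert {a for a, _ in train} == set(attrs_a)
    assert {b for _, b in train} == set(attrs_b)
    return sorted(train), sorted(test)

train, test = compositional_split(
    ["circle", "square", "triangle"],
    ["red", "green", "blue"],
    held_out_pairs=[("triangle", "blue"), ("circle", "green")],
)
assert ("triangle", "blue") in test and ("triangle", "blue") not in train
```

The sanity check matters: if a primitive never appears in training, the split tests vocabulary coverage rather than composition, conflating two different failure modes.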

However, these remediation strategies are often domain-specific and rely on carefully engineering the architecture, loss, or data splits, rather than emerging naturally from generic learning.

6. Theoretical Limits and Open Problems

Despite successes in tailored settings, shattered compositionality remains pervasive:

  • Scaling alone is insufficient: Increased model scale improves compositional generalization within distribution but does not fundamentally solve the shattering under distributional or structural shift (Dhar et al., 2024).
  • Instruction tuning can degrade latent compositionality: Alignment objectives may override or reorganize the emergent compositional geometry acquired during LLM pretraining, "shattering" previously functional strategies despite raw performance gains (Dhar et al., 2024).
  • Mathematical boundaries: In logic and category theory, full restoration of compositionality often requires restrictive syntactic or semantic assumptions (regular connectives, compactness), and remains impossible in some logical settings (Chemla et al., 2017, Puca et al., 2023).
  • Universal diagnostics: There is no one-size-fits-all test for shattered compositionality. The phenomenology is highly sensitive to the domain and the underlying structure (e.g., local vs. global compositionality in language (Dankers et al., 2021)).

7. Broader Implications and Future Directions

Shattered compositionality exposes the fragility of generalization, interpretability, and robustness in artificial systems and the subtlety of compositional principles in mathematical logic and semantics. For future research, several trajectories emerge:

  • Development of system- and architecture-level inductive biases that enforce or encourage systematic, structure-sensitive composition.
  • Extension of categorical and cohomological tools to dynamic, data-driven contexts, bridging algebraic obstructions and empirical behavior.
  • Comprehensive evaluation protocols that jointly test local, global, and OOD compositionality across modalities and tasks.
  • Integration of symbolic and continuous representations to reconcile flexibility with guaranteeable rule-based structure.

The measured classification and remediation of failures—whether via homotopy posets, cohomology, mutual information probes, or architectural modules—will continue to be central in the pursuit of models and theories that achieve robust, human-like compositional generalization. Shattered compositionality constitutes not the demise of compositional principles, but a lens to examine, quantify, and ultimately rectify failures at the interface of learning, logic, and formal semantics.
