When does compositional structure yield compositional generalization? A kernel theory (2405.16391v2)

Published 26 May 2024 in cs.LG and q-bio.NC

Abstract: Compositional generalization (the ability to respond correctly to novel combinations of familiar components) is thought to be a cornerstone of intelligent behavior. Compositionally structured (e.g. disentangled) representations are essential for this; however, the conditions under which they yield compositional generalization remain unclear. To address this gap, we present a general theory of compositional generalization in kernel models with fixed representations, a tractable framework for characterizing the impact of dataset statistics on generalization. We find that kernel models are constrained to adding up values assigned to each combination of components seen during training ("conjunction-wise additivity"). This imposes fundamental restrictions on the set of tasks these models can learn, in particular preventing them from transitively generalizing equivalence relations. Even for compositional tasks that kernel models can in principle learn, we identify novel failure modes in compositional generalization that arise from biases in the training data and affect important compositional building blocks such as symbolic addition and context dependence (memorization leak and shortcut bias). Finally, we empirically validate our theory, showing that it captures the behavior of deep neural networks (convolutional networks, residual networks, and Vision Transformers) trained on a set of compositional tasks with similarly structured data. Ultimately, this work provides a theoretical perspective on how statistical structure in the training data can affect compositional generalization, with implications for how to identify and remedy failure modes in deep learning models.
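
As a rough illustration of the setting the abstract describes (not the authors' code), the sketch below fits a kernel ridge regression on a fixed, disentangled representation (one-hot code per factor) and evaluates it on held-out combinations of familiar components. The two-factor task, factor sizes, held-out pairs, and regularization strength are illustrative assumptions.

```python
import numpy as np
from itertools import product

# Hypothetical toy setup: two factors (e.g. shape and color), each with 3 values.
# A "disentangled" fixed representation concatenates one-hot codes for the factors.
def represent(shape, color, n=3):
    x = np.zeros(2 * n)
    x[shape] = 1.0
    x[n + color] = 1.0
    return x

# Target: a compositional task, here symbolic addition of the two factor indices.
def target(shape, color):
    return shape + color

# Training set: leave out some combinations to probe compositional generalization.
all_combos = list(product(range(3), range(3)))
held_out = {(0, 2), (2, 0)}
train = [c for c in all_combos if c not in held_out]

X = np.stack([represent(s, c) for s, c in train])
y = np.array([target(s, c) for s, c in train])

# Kernel ridge regression with a linear kernel on the fixed representation,
# a simple instance of a "kernel model with fixed representations".
lam = 1e-6
K = X @ X.T
alpha = np.linalg.solve(K + lam * np.eye(len(train)), y)

def predict(shape, color):
    k = X @ represent(shape, color)  # kernel between test point and training points
    return float(k @ alpha)

# Evaluate on the held-out (novel) combinations of familiar components.
for s, c in sorted(held_out):
    print(f"combo ({s},{c}): target={target(s, c)}, prediction={predict(s, c):.2f}")
```

Because the prediction is a weighted sum of kernel values over training conjunctions, it decomposes additively over the component combinations seen during training, which is the "conjunction-wise additivity" constraint the abstract refers to.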

Authors (2)
  1. Samuel Lippl (4 papers)
  2. Kim Stachenfeld (3 papers)
Citations (2)
