Compositional Generalization from First Principles (2307.05596v1)

Published 10 Jul 2023 in cs.LG and stat.ML

Abstract: Leveraging the compositional nature of our world to expedite learning and facilitate generalization is a hallmark of human perception. In machine learning, on the other hand, achieving compositional generalization has proven to be an elusive goal, even for models with explicit compositional priors. To get a better handle on compositional generalization, we here approach it from the bottom up: Inspired by identifiable representation learning, we investigate compositionality as a property of the data-generating process rather than the data itself. This reformulation enables us to derive mild conditions on only the support of the training distribution and the model architecture, which are sufficient for compositional generalization. We further demonstrate how our theoretical framework applies to real-world scenarios and validate our findings empirically. Our results set the stage for a principled theoretical study of compositional generalization.
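The setting described in the abstract can be made concrete with a toy example. The sketch below is not from the paper: the component functions phi1 and phi2, the concatenation-based composition, and the support constraint z1 + z2 <= 1 are all illustrative assumptions. It shows a data-generating process built from per-slot components, where the training support excludes some combinations of otherwise familiar latent values; compositional generalization asks whether a model trained only on x_train also recovers the structure underlying x_test.

```python
# Minimal sketch (illustrative, not the paper's construction) of a
# compositional data-generating process with restricted training support.
import numpy as np

rng = np.random.default_rng(0)

def phi1(z1):
    # Component renderer for slot 1 (assumed form).
    return np.stack([np.sin(z1), np.cos(z1)], axis=-1)

def phi2(z2):
    # Component renderer for slot 2 (assumed form).
    return np.stack([z2, z2 ** 2], axis=-1)

def compose(x1, x2):
    # Composition function C; here simply concatenation.
    return np.concatenate([x1, x2], axis=-1)

def generate(z1, z2):
    # Full generator f(z) = C(phi1(z1), phi2(z2)).
    return compose(phi1(z1), phi2(z2))

# Training support: only combinations with z1 + z2 <= 1 are observed,
# so one corner of the latent square never appears jointly in training.
z1_tr = rng.uniform(0, 1, size=10_000)
z2_tr = rng.uniform(0, 1, size=10_000)
seen = z1_tr + z2_tr <= 1.0
x_train = generate(z1_tr[seen], z2_tr[seen])

# Test support: novel combinations (z1 + z2 > 1) of familiar slot values.
z1_te = rng.uniform(0, 1, size=1_000)
z2_te = rng.uniform(0, 1, size=1_000)
novel = z1_te + z2_te > 1.0
x_test = generate(z1_te[novel], z2_te[novel])

print(x_train.shape, x_test.shape)
```

In this toy setup every individual slot value is seen during training, but some joint combinations are not; conditions of the kind the paper studies concern when such support gaps can still be bridged by a suitably structured model.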

Authors (4)
  1. Thaddäus Wiedemer (6 papers)
  2. Prasanna Mayilvahanan (4 papers)
  3. Matthias Bethge (103 papers)
  4. Wieland Brendel (55 papers)
Citations (23)