From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks (2405.15164v1)
Abstract: Compositionality has long been considered a key explanatory property underlying human intelligence: arbitrary concepts can be composed into novel complex combinations, permitting the acquisition of an open-ended, potentially infinite expressive capacity from finite learning experiences. Influential arguments have held that neural networks fail to explain this aspect of behavior, leading many to dismiss them as viable models of human cognition. Over the last decade, however, modern deep neural networks (DNNs), which share the same fundamental design principles as their predecessors, have come to dominate artificial intelligence, exhibiting the most advanced cognitive behaviors ever demonstrated in machines. In particular, large language models (LLMs), DNNs trained to predict the next word on a large corpus of text, have proven capable of sophisticated behaviors such as writing syntactically complex sentences without grammatical errors, producing cogent chains of reasoning, and even writing original computer programs -- all behaviors thought to require compositional processing. In this chapter, we survey recent empirical work from machine learning for a broad audience in philosophy, cognitive science, and neuroscience, situating recent breakthroughs within the broader context of philosophical arguments about compositionality. In particular, our review emphasizes two approaches to endowing neural networks with compositional generalization capabilities: (1) architectural inductive biases, and (2) metalearning, or learning to learn. We also present findings suggesting that LLM pretraining can be understood as a kind of metalearning, and can thereby equip DNNs with compositional generalization abilities in a similar way. We conclude by discussing the implications that these findings may have for the study of compositionality in human cognition and by suggesting avenues for future research.
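To make the abstract's claim that LLM pretraining resembles metalearning more concrete, the following toy Python sketch (not from the chapter; the miniature grammar, the data format, and names such as make_episode and episode_to_prompt are illustrative assumptions) shows how a few-shot meta-learning episode and an in-context learning prompt share the same structure: a handful of study examples followed by a query whose answer requires composing what the study examples demonstrate.

```python
# A minimal, illustrative sketch of the parallel between an episodic
# meta-learning task and an in-context learning prompt. Everything here
# (the toy grammar, function names, prompt format) is hypothetical and
# intended only to clarify the structural analogy.

import random

# Toy compositional "grammar": primitive words map to symbols, and the
# modifier "twice" composes with any primitive by repeating its symbol.
PRIMITIVES = {"dax": "R", "wif": "G", "lug": "B"}

def interpret(command: str) -> str:
    """Ground-truth compositional interpretation of a command."""
    words = command.split()
    symbol = PRIMITIVES[words[0]]
    return symbol * 2 if len(words) > 1 and words[1] == "twice" else symbol

def make_episode(seed: int):
    """One meta-learning episode: a few study (support) examples plus a
    held-out query that requires composing what the study examples show."""
    rng = random.Random(seed)
    words = list(PRIMITIVES)
    rng.shuffle(words)
    support = [(w, interpret(w)) for w in words]  # primitives in isolation
    support.append((f"{words[0]} twice", interpret(f"{words[0]} twice")))
    query = f"{words[1]} twice"                   # a novel combination
    return support, query, interpret(query)

def episode_to_prompt(support, query) -> str:
    """The same episode rendered as text: a corpus full of such
    example-then-query patterns is one way next-word prediction can come
    to behave like metalearning (learning to learn from the context)."""
    lines = [f"{cmd} -> {out}" for cmd, out in support]
    lines.append(f"{query} ->")
    return "\n".join(lines)

if __name__ == "__main__":
    support, query, answer = make_episode(seed=0)
    print(episode_to_prompt(support, query))
    print(f"(expected completion: {answer})")
```

In a meta-learning setup the support/query split defines the inner loop of training; rendered as a prompt, the same split becomes the context an LLM conditions on at inference time, which is the sense in which pretraining on diverse text can implicitly instill the ability to generalize compositionally from a few in-context examples.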
Authors: Jacob Russin, Sam Whitman McGrath, Danielle J. Williams, Lotem Elber-Dorozko