Limits for Learning with Language Models (2306.12213v1)
Abstract: With the advent of large language models (LLMs), the trend in NLP has been to train them on vast amounts of data to solve diverse language understanding and generation tasks. The list of LLM successes is long and varied. Nevertheless, several papers provide empirical evidence that LLMs fail to capture important aspects of linguistic meaning. Focusing on universal quantification, we provide a theoretical foundation for these empirical findings by proving that LLMs cannot learn certain fundamental semantic properties, including semantic entailment and consistency as they are defined in formal semantics. More generally, we show that LLMs are unable to learn concepts beyond the first level of the Borel hierarchy, which imposes severe limits on the ability of LMs, both large and small, to capture many aspects of linguistic meaning. This means that LLMs will continue to operate without formal guarantees on tasks that require entailments and deep linguistic understanding.
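To make the Borel-hierarchy claim concrete, the following is a minimal sketch, not the paper's exact construction: it assumes, for illustration, that sentence meanings are modeled as subsets of the space of infinite strings over a vocabulary Σ with the product topology, and that the domain of quantification is enumerated as a_0, a_1, ….

```latex
% Minimal illustrative sketch (assumptions: meanings are subsets of
% \Sigma^\omega with the product topology; countably enumerated domain
% a_0, a_1, ...; \llbracket from the stmaryrd package).
% Level 1 of the Borel hierarchy: the open and the closed sets.
\[
  \Sigma^0_1 = \{\, U \subseteq \Sigma^\omega : U \text{ is open} \,\},
  \qquad
  \Pi^0_1 = \{\, C \subseteq \Sigma^\omega : C \text{ is closed} \,\}.
\]
% Level 2 includes the G_delta sets: countable intersections of open sets.
\[
  \Pi^0_2 \;=\; \Bigl\{\, \bigcap_{n \in \mathbb{N}} U_n \;:\; U_n \in \Sigma^0_1 \,\Bigr\}.
\]
% On this reading, a universally quantified sentence denotes the countable
% intersection of the denotations of its instances:
\[
  \llbracket \forall x\, \varphi(x) \rrbracket
  \;=\;
  \bigcap_{n \in \mathbb{N}} \llbracket \varphi(a_n) \rrbracket .
\]
```

Under these assumptions, the denotation of a universally quantified sentence is a countable intersection of level-1 sets, which in general sits at the second Borel level (the G_δ sets) rather than the first, which is where the abstract locates the learnability barrier for LLMs.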
- Nicholas Asher
- Swarnadeep Bhar
- Akshay Chaturvedi
- Julie Hunter
- Soumya Paul