Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve (2309.13638v1)

Published 24 Sep 2023 in cs.CL and cs.AI

Abstract: The widespread adoption of LLMs makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems we need to consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. This approach - which we call the teleological approach - leads us to identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. We predict that LLMs will achieve higher accuracy when these probabilities are high than when they are low - even in deterministic settings where probability should not matter. To test our predictions, we evaluate two LLMs (GPT-3.5 and GPT-4) on eleven tasks, and we find robust evidence that LLMs are influenced by probability in the ways that we have hypothesized. In many cases, the experiments reveal surprising failure modes. For instance, GPT-4's accuracy at decoding a simple cipher is 51% when the output is a high-probability word sequence but only 13% when it is low-probability. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system - one that has been shaped by its own particular set of pressures.

Understanding LLMs

Training Objectives and LLM Behavior

The widespread deployment of LLMs such as GPT-3.5 and GPT-4 makes it important to understand their strengths and limitations. The paper argues that to grasp these models' capabilities, one must consider the problem they were trained to solve: predicting the next word in a sequence, with Internet text as the training substrate. Recognizing this autoregressive training objective, together with the statistics of the environment it was learned from, yields testable predictions about when LLMs will succeed or fail.
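
To make the training objective concrete, the sketch below scores word sequences autoregressively, with a toy bigram model standing in for an LLM; the corpus and example sentences are illustrative, not from the paper.

```python
# Minimal sketch of the autoregressive (next-word-prediction) objective.
# A toy bigram model stands in for a real LLM; the chain-rule
# decomposition of sequence probability is the same.
import math
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigram transitions to estimate P(next_word | current_word).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word_prob(prev: str, nxt: str) -> float:
    counts = bigrams[prev]
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0

def sequence_log_prob(words) -> float:
    # Chain rule: log P(w_1..w_n) = sum_i log P(w_i | w_{i-1}).
    logp = 0.0
    for prev, nxt in zip(words, words[1:]):
        p = next_word_prob(prev, nxt)
        if p == 0.0:
            return float("-inf")
        logp += math.log(p)
    return logp

# High-probability sequences score better than low-probability ones:
# the core pressure the teleological analysis starts from.
print(sequence_log_prob("the cat sat on the mat".split()))  # finite log-prob
print(sequence_log_prob("the mat sat on the cat".split()))  # -inf (unseen bigram)
```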

Factors Influencing LLM Performance

The paper presents a "teleological" approach that foregrounds the goal LLMs were trained for and the environment that shaped them. This perspective predicts that LLM accuracy is influenced by three factors:

  • Task probability: LLMs perform best on tasks that appear frequently in their training data and degrade on rare variants of the same tasks.
  • Output probability: even on deterministic tasks, models are more accurate when the correct output is a high-probability word sequence (a scoring sketch follows this list).
  • Input probability: the likelihood of the provided input also affects performance, though less strongly than output probability.
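
As one way to operationalize the output-probability factor, the sketch below scores candidate outputs with GPT-2 via the HuggingFace transformers library; GPT-2 is a stand-in here, since the paper's experiments used GPT-3.5 and GPT-4.

```python
# Sketch: estimating the "output probability" of a candidate answer with a
# small LM. Assumes the HuggingFace transformers library; GPT-2 stands in
# for the models studied in the paper.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def total_log_prob(text: str) -> float:
    """Total log-probability of `text` under the LM, summed over tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids the model returns mean cross-entropy over the
        # seq_len - 1 predicted tokens; undo the mean to get a sum.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

# A high-probability target output vs. a low-probability (shuffled) one.
print(total_log_prob("To be or not to be, that is the question."))
print(total_log_prob("Question the is that be, to not or be to."))
```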

Empirical Validation

Evaluations across eleven distinct tasks confirm the three hypothesized influences:

  1. LLM accuracy tracks task frequency: common tasks are performed more accurately than their rare counterparts.
  2. Even on tasks whose correct answer does not depend on probability, the probability of the target output strongly predicts performance.
  3. Input probability also shapes behavior, but its effect is weaker than that of output probability.

What stands out is an asymmetry: models are more affected by the likelihood of what they generate (outputs) than by the likelihood of what they receive (inputs).
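
The paper's flagship illustration of the output-probability effect is rot-13 decoding, where GPT-4's accuracy was 51% for high-probability output sentences but 13% for low-probability ones. Below is a minimal sketch of the task setup; the stimuli are illustrative, not the paper's exact materials.

```python
# Sketch of the rot-13 decoding task: deterministic, yet accuracy in the
# paper varied sharply with the probability of the target output.
import codecs

def rot13(text: str) -> str:
    # rot-13 shifts each letter 13 places; encoding and decoding coincide.
    return codecs.encode(text, "rot_13")

# Illustrative targets: a natural sentence vs. a shuffled, low-probability
# version of the same words.
high_prob_target = "Stay here and keep an eye on the baggage."
low_prob_target = "Baggage an the stay keep eye here and on."

for target in (high_prob_target, low_prob_target):
    prompt = f"Decode the following rot-13 text: {rot13(target)}"
    # An LLM would be queried with `prompt` here; following the paper's
    # evaluation, a response counts as correct only on exact match.
    print(prompt)
```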

Beyond Probability: Other Characteristic Phenomena

  • Lack of Embodiment: LLMs can fail at tasks that humans solve easily through physical interaction, e.g., applying a cipher defined by keyboard layout.
  • Sensitivity to Wording: exact phrasing, even across near-synonymous prompts, can elicit divergent responses, revealing a heavy reliance on surface language patterns.

Implications for LLM Application

The work advises caution when applying LLMs to rare task variants (where probability biases bite hardest) and to settings that require generating low-probability text. Advanced prompting strategies and model scaling can improve performance, but the underlying probability sensitivity persists, underscoring the need for evaluations informed by the models' training objective.
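
As an illustration, a zero-shot chain-of-thought variant of the cipher prompt sketched earlier might look like the following; the exact wording is an assumption, not taken from the paper.

```python
# Illustrative chain-of-thought prompt variant; prompt wording is an
# assumption. The paper reports such strategies can help but do not
# eliminate the probability sensitivity rooted in the training objective.
direct = "Decode the following rot-13 text: Fgnl urer naq xrrc na rlr ba gur onttntr."
stepwise = direct + "\nLet's think step by step, translating one letter at a time."
print(stepwise)
```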

Closing Thoughts

As LLMs continue to advance in capability, understanding their ingrained biases and operational quirks becomes ever more critical. The paper argues for evaluating LLMs in light of what they were trained to do, rather than as if they were humans, in order to map their capabilities and limits accurately.

Authors (5)
  1. R. Thomas McCoy (33 papers)
  2. Shunyu Yao (72 papers)
  3. Dan Friedman (16 papers)
  4. Matthew Hardy (1 paper)
  5. Thomas L. Griffiths (150 papers)