- The paper demonstrates that emergent abilities appear in LLMs only above a specific scale threshold, with performance jumping from near-random to well above chance.
- It examines few-shot, chain-of-thought, and other augmented prompting strategies to show abrupt improvements on tasks such as multi-step arithmetic and the MMLU benchmark.
- The findings suggest that further scaling may unlock unforeseen competencies, motivating continued exploration of model scale, architecture, and data quality.
Emergent Abilities of LLMs
Introduction
The paper "Emergent Abilities of LLMs" explores the phenomenon of emergent abilities in LLMs, which are abilities that appear at specific scales and cannot be linearly extrapolated from the performance of smaller models. This challenges the traditional paradigms that suggest a predictable improvement in model performance with scale and opens intriguing questions about the potential capabilities of future models.
Emergent Abilities Definition
Emergent abilities are defined as capabilities that are absent in smaller models but present in larger ones. They exhibit phase-transition-like behavior: performance remains near random until a critical model scale is reached, after which it improves sharply. This is distinctly different from gradual scaling effects, and it means these abilities cannot be anticipated from scaling laws fit to smaller models.
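To make the phase-transition framing concrete, the sketch below finds the smallest scale at which accuracy clearly exceeds chance in a list of (parameter count, accuracy) pairs. All numbers are hypothetical, invented purely for illustration; the paper reports real benchmark curves.

```python
# Minimal sketch: locating an emergence threshold in scale-vs-accuracy data.
# All numbers below are hypothetical, invented purely for illustration.

RANDOM_BASELINE = 0.25   # e.g. chance accuracy on a 4-way multiple-choice task
MARGIN = 0.10            # how far above chance counts as "emerged" (guards against noise)

# (model parameters, task accuracy) pairs -- hypothetical measurements
scale_accuracy = [
    (1e8, 0.24),
    (1e9, 0.26),
    (1e10, 0.25),
    (1e11, 0.58),   # abrupt jump: the hallmark of an emergent ability
    (5e11, 0.71),
]

def emergence_threshold(points, baseline, margin):
    """Return the smallest scale whose accuracy clearly exceeds chance."""
    for params, acc in sorted(points):
        if acc > baseline + margin:
            return params
    return None  # the ability has not emerged at any measured scale

print(emergence_threshold(scale_accuracy, RANDOM_BASELINE, MARGIN))
# 100000000000.0, i.e. 1e11
```

The margin term reflects the paper's point that near-threshold accuracy hovers around chance: small fluctuations above the baseline should not be mistaken for emergence.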
Few-Shot Prompted Tasks
Emergent abilities are most prominently observed under few-shot prompting. For LLMs such as GPT-3, Chinchilla, and PaLM, tasks like multi-step arithmetic and complex language understanding improve in few-shot settings only at specific model scales: smaller models perform indistinguishably from random chance, and performance rises abruptly at larger scales. The paper highlights the MMLU benchmark, arithmetic problem solving, and word-in-context (WiC) disambiguation as tasks whose performance is suddenly unlocked, even though none of them was an explicit training objective.
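As a minimal sketch of the few-shot setup, the snippet below assembles solved exemplars ahead of an unsolved query. The exemplars and the Q/A format are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch of few-shot prompt construction for an arithmetic task.
# Exemplars and formatting are illustrative; the paper's prompts differ.

exemplars = [
    ("What is 12 + 7?", "19"),
    ("What is 45 + 38?", "83"),
    ("What is 63 + 29?", "92"),
]

def build_few_shot_prompt(exemplars, query):
    """Concatenate solved examples before the unsolved query."""
    lines = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(exemplars, "What is 57 + 26?")
print(prompt)
# The completed prompt is sent to the model, which is expected to continue
# the pattern; below the emergence threshold, models typically fail anyway.
```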
Augmented Prompting Strategies
Beyond few-shot prompting, emergent abilities also appear with augmented prompting and training strategies: chain-of-thought prompting for multi-step reasoning, instruction tuning for following diverse new instructions, and scratchpad training for multi-step computation. Each of these yields improvements only beyond certain model scales, revealing that gains in LLMs are not merely a function of scale but also depend on how the models are prompted and trained.
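The contrast below sketches chain-of-thought prompting against a standard few-shot exemplar. The worked example is a commonly cited illustration, not necessarily the paper's exact prompt.

```python
# Minimal sketch contrasting a standard exemplar with a chain-of-thought
# exemplar. The wording is illustrative.

standard_exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: 11"
)

cot_exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11."
)

# A chain-of-thought prompt prepends exemplars whose answers spell out the
# intermediate reasoning steps, nudging the model to do the same.
query = "Q: A baker made 24 rolls and sold 9. How many are left?\nA:"
prompt = cot_exemplar + "\n\n" + query
print(prompt)
```

The only difference is that the exemplar's answer spells out intermediate steps; per the paper, this helps only above a certain model scale.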
Discussion
The unpredictability of emergent abilities raises the question of what further scaling might unlock. While some tasks show sudden performance jumps, implying latent capacities in pre-trained models, it is unclear which abilities future models will display or at what scales they will appear. This unpredictability suggests that LLMs may acquire unforeseen competencies, underscoring the value of continuing to explore the boundaries of scale.
Potential Explanations of Emergence
Understanding emergence requires asking why an ability becomes active only beyond a certain scale. Plausible factors include the number of computational steps a task requires, the capacity needed to memorize relevant knowledge, and architectural limits; for LLMs, emergence may relate to depth, parameter count, or training-data quality. Evaluation metrics can also manufacture apparent discontinuities: all-or-nothing measures give no credit for partial solutions, so steady underlying progress can register as a sudden jump. This highlights the need for more diverse evaluation measures to better anticipate emergent abilities.
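A minimal sketch of the metric effect, assuming a hypothetical multi-digit arithmetic task: exact match scores a mostly-correct answer as a total failure, while a per-character metric records the gradual progress.

```python
# Minimal sketch of how metric choice can hide gradual progress.
# The prediction and target below are hypothetical multi-digit answers.

def exact_match(pred: str, target: str) -> float:
    """All-or-nothing scoring: partial solutions earn zero credit."""
    return 1.0 if pred == target else 0.0

def per_char_accuracy(pred: str, target: str) -> float:
    """Smoother scoring: fraction of answer characters that are correct."""
    if not target:
        return 0.0
    matches = sum(p == t for p, t in zip(pred, target))
    return matches / max(len(pred), len(target))

# A model that gets most digits right looks "random" under exact match
# but is clearly improving under the smoother metric.
pred, target = "13462", "13472"
print(exact_match(pred, target))        # 0.0
print(per_char_accuracy(pred, target))  # 0.8
```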
Conclusion
The paper concludes that emergent abilities signal a shift in how AI researchers and developers should approach LLM design and scaling. Given the unpredictability of future emergent skills, researchers are encouraged to investigate the capabilities unlocked by further scaling alongside improvements in architecture, data quality, and task framing. Understanding these capabilities can pave the way for more robust and broadly capable LLMs.