- The paper argues that in-context learning in LLMs emerges from the compositional structure of pretraining data and implicit induction of its latent grammar.
- It derives learnability bounds that link the description length of a task under that grammar to error rates in next-token prediction.
- Experiments on synthetic datasets support the theory and show that chain-of-thought prompting reduces errors on complex, multi-step reasoning tasks.
A Theory of Emergent In-Context Learning as Implicit Structure Induction
Introduction
The paper "A Theory of Emergent In-Context Learning as Implicit Structure Induction" presents a theoretical framework to explain how LLMs exhibit in-context learning (ICL) capabilities. It posits that these capabilities arise from the inherent compositional structure of natural language data, which allows models to learn tasks in-context through the recombination of observed patterns. The analysis provides an information-theoretic bound demonstrating how these abilities manifest purely from next-token prediction tasks when pretrained on sufficiently compositional data. The authors introduce a controlled setup to empirically validate their theoretical predictions, focusing on the representation and emergence of ICL and proposing explanations for phenomena such as chain-of-thought prompting.
Compositionality in LLMs
The paper theorizes that the ability of LLMs to learn tasks in context stems from their exposure to data with rich compositional structure. It describes a mechanism by which LLMs infer the latent structure of a prompt in terms of probabilistic grammars and derivation trees. The authors argue that when language data is treated as being generated from a probabilistic context-free grammar (PCFG), extended with shared variables and iterative subtrees, models can develop parsimonious explanations for task prompts (Figure 1).
Figure 1: A depiction of natural language generation as a compositional process with derivation trees and yield operations.
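To make this generative picture concrete, here is a minimal sketch of such a process, assuming an invented toy grammar (the rules, probabilities, and symbols below are illustrative and not the paper's actual grammar): sampling expands nonterminals into a derivation tree, and a yield operation reads the terminal string off its leaves.

```python
import random

# Illustrative toy PCFG: each nonterminal maps to (probability, right-hand side) rules.
# These rules are invented for demonstration, not taken from the paper.
RULES = {
    "S":  [(0.5, ["NP", "VP"]), (0.5, ["NP", "VP", "PP"])],
    "NP": [(0.6, ["the", "N"]), (0.4, ["a", "N"])],
    "VP": [(1.0, ["V", "NP"])],
    "PP": [(1.0, ["near", "NP"])],
    "N":  [(0.5, ["cat"]), (0.5, ["dog"])],
    "V":  [(0.5, ["sees"]), (0.5, ["chases"])],
}

def sample_tree(symbol):
    """Expand a symbol into a derivation tree (terminals become leaves)."""
    if symbol not in RULES:                       # terminal symbol
        return symbol
    probs, rhss = zip(*RULES[symbol])
    rhs = random.choices(rhss, weights=probs)[0]  # pick one rule by its probability
    return (symbol, [sample_tree(child) for child in rhs])

def yield_of(tree):
    """The yield operation: concatenate the leaves from left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [tok for child in children for tok in yield_of(child)]

tree = sample_tree("S")
print(" ".join(yield_of(tree)))  # e.g. "the cat chases a dog"
```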
Theoretical Framework
The core contribution of this work is a theoretical analysis establishing conditions under which generic next-token prediction gives rise to ICL:
- World Model Assumptions: The pretraining data and few-shot tasks are modeled as being generated from finite universes of objects, using a grammar formalism that reflects linguistic principles of compositionality and attribute sharing.
- Learnability Bounds: Through Theorem 1, the authors derive an error bound that connects the description length of a task's defining function within the structured grammar to its in-context learnability. Errors in learning are thus bounded by how compactly the task can be described under the grammar of the pretraining distribution, which favors parsimonious derivations as observed input structures grow more complex (see the schematic sketch below).
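As a purely schematic illustration of this dependence, and not the actual statement of Theorem 1 (whose exact form, constants, and conditions are given in the paper), the bound can be pictured as an in-context error controlled by the model's next-token prediction loss plus a term that grows with the description length of the task's defining function:

```latex
% Schematic only: illustrates the qualitative shape of the bound described
% above, not the precise statement of Theorem 1.
\[
  \mathrm{err}_{\mathrm{ICL}}(f)
    \;\le\;
  \underbrace{\varepsilon_{\mathrm{LM}}}_{\text{excess next-token loss}}
    \;+\;
  C \cdot \underbrace{\lvert d(f) \rvert}_{\substack{\text{description length of the task's}\\ \text{defining function under the grammar}}}
\]
```

Read this way, tasks admitting shorter derivations under the grammar come with tighter guarantees, which is why parsimonious explanations are favored.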
Empirical Validation
The paper introduces a controlled experimental framework based on synthetic data, generated from compositional grammars of the kind assumed by the theory, to validate its predictions about when ICL emerges. A sketch of how such few-shot evaluation prompts can be constructed is given below.
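This is a minimal sketch of such a controlled few-shot setup, assuming an invented task family (the primitives, prompt format, and alphabet below are hypothetical and not the paper's exact experimental design): every example in a prompt is consistent with one latent composition of primitives, which the model must infer in context to complete the final query.

```python
import random

# Invented synthetic task family: the latent function is a composition of
# simple string primitives. For illustration only; not the paper's setup.
PRIMITIVES = {
    "reverse":   lambda s: s[::-1],
    "uppercase": str.upper,
    "double":    lambda s: s + s,
}

def make_prompt(latent_fn_names, n_examples=4, seed=0):
    """Build a few-shot prompt whose examples all follow one latent
    composition of primitives; the model must infer it in context."""
    rng = random.Random(seed)

    def apply_latent(s):
        for name in latent_fn_names:
            s = PRIMITIVES[name](s)
        return s

    lines = []
    for _ in range(n_examples):
        x = "".join(rng.choices("abcde", k=4))
        lines.append(f"Input: {x} -> Output: {apply_latent(x)}")
    query = "".join(rng.choices("abcde", k=4))
    lines.append(f"Input: {query} -> Output:")
    return "\n".join(lines), apply_latent(query)

prompt, target = make_prompt(["reverse", "uppercase"])
print(prompt)             # few-shot prompt to feed a pretrained LM
print("target:", target)  # what a successful in-context learner should produce
```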
Chain-of-Thought and Prompting
Another significant insight from the paper is its account of chain-of-thought prompting as a mechanism that enhances ICL. The authors show theoretically, via Theorem 2, that revealing the intermediate steps of a reasoning chain reduces error and accelerates in-context learning. This is substantiated empirically: models perform better when prompted to output intermediate reasoning steps before the final answer (Figure 3).
Figure 3: Demonstrating how chain-of-thought transforms complex tasks into simpler stepwise reasoning processes.
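Continuing the invented toy task from the previous sketch (again an illustration, not the paper's setup), a chain-of-thought version of the same prompt reveals each intermediate result, so the model only has to infer one simple step at a time rather than the full composition at once.

```python
# Illustrative only: each chain-of-thought example exposes the intermediate
# result after every primitive, decomposing the full mapping into simple steps.
PRIMITIVES = {
    "reverse":   lambda s: s[::-1],
    "uppercase": str.upper,
}

def make_cot_example(x, latent_fn_names):
    steps, s = [], x
    for name in latent_fn_names:
        s = PRIMITIVES[name](s)           # apply one primitive
        steps.append(f"{name}: {s}")      # ...and expose its intermediate result
    return f"Input: {x} | " + " | ".join(steps) + f" | Output: {s}"

print(make_cot_example("abcd", ["reverse", "uppercase"]))
# -> Input: abcd | reverse: dcba | uppercase: DCBA | Output: DCBA
```

Each revealed step has a much shorter description than the composed function, which is the intuition behind the error reduction the paper attributes to chain-of-thought.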
Implications and Future Directions
- Scaling Impacts: The results indicate that increasing model size and the amount of compositional training data correlates with stronger emergent ICL capabilities, suggesting pathways for scaling models and data in pursuit of enhanced language understanding.
- Recombination: The ability of LLMs to recombine skills that were never explicitly seen together during pretraining points to opportunities for exploiting latent structure in data more effectively.
- Real-World LLMs: Though the analysis is theoretical and the experiments synthetic, the findings help explain behaviors observed in state-of-the-art LLMs such as GPT-3, particularly emergent abilities and the variability of task performance with prompt construction.
Conclusion
The paper presents a compelling theoretical framework that ties in-context learning to implicit structure induction in LLMs, supported by empirical validation. It advances our understanding of how compositional data structure underpins the emergence of sophisticated reasoning in LLMs and proposes ways to foster these abilities further. Its account of chain-of-thought prompting as a tool for reducing error on complex tasks offers practical guidance for designing better interaction strategies with LLMs, paving the way for more generalizable and robust systems.