
Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale (2212.09095v2)

Published 18 Dec 2022 in cs.CL and cs.AI

Abstract: LLMs have been shown to perform better with an increase in scale on a wide variety of tasks via the in-context learning paradigm. In this paper, we investigate the hypothesis that the ability of an LLM to perform a task via in-context learning is not uniformly spread across all of its underlying components. Using a 66 billion parameter LLM (OPT-66B) across a diverse set of 14 downstream tasks, we find this is indeed the case: $\sim$70% of attention heads and $\sim$20% of feed-forward networks can be removed with minimal decline in task performance. We find substantial overlap in the set of attention heads (un)important for in-context learning across tasks and numbers of in-context examples. We also address our hypothesis through a task-agnostic lens, finding that a small set of attention heads in OPT-66B score highly on their ability to perform primitive induction operations associated with in-context learning, namely prefix matching and copying. These induction heads overlap with task-specific important heads, reinforcing arguments by Olsson et al. (arXiv:2209.11895) regarding the generality of induction heads to more sophisticated behaviors associated with in-context learning. Overall, our study provides several insights indicating that LLMs may be under-trained for in-context learning, and it opens up questions on how to pre-train LLMs to more effectively perform in-context learning.
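The task-agnostic analysis scores each attention head on the two primitive induction operations described by Olsson et al.: prefix matching and copying. As a rough illustration (not the authors' released code), the sketch below shows one common way to compute a prefix-matching score from a single head's attention pattern on a random token block repeated twice; the attention weights would be obtained, for example, by running OPT with `output_attentions=True` in the Hugging Face `transformers` library. The function name and tensor layout are assumptions for illustration.

```python
import torch

def prefix_matching_score(attn: torch.Tensor, seq_len: int) -> float:
    """Prefix-matching score for one attention head.

    attn: (2 * seq_len, 2 * seq_len) attention weights for a sequence that is
          a random token block of length `seq_len` repeated twice.
    Returns the average attention that each position in the second copy puts
    on the position immediately after the same token's first occurrence,
    which is where an induction head is expected to attend.
    """
    total_len = 2 * seq_len
    assert attn.shape == (total_len, total_len)

    scores = []
    for i in range(seq_len, total_len):
        # Token at position i repeats the token at i - seq_len; the
        # induction target is the position right after that earlier copy.
        target = i - seq_len + 1
        scores.append(attn[i, target].item())
    return sum(scores) / len(scores)
```

Heads with high scores on repeated random sequences are the candidate induction heads; in the paper these are then compared against the heads found to be important for the 14 downstream tasks.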

Authors (6)
  1. Hritik Bansal
  2. Karthik Gopalakrishnan
  3. Saket Dingliwal
  4. Sravan Bodapati
  5. Katrin Kirchhoff
  6. Dan Roth
Citations (38)