
Larger language models do in-context learning differently (2303.03846v2)

Published 7 Mar 2023 in cs.CL

Abstract: We study how in-context learning (ICL) in LLMs is affected by semantic priors versus input-label mappings. We investigate two setups, ICL with flipped labels and ICL with semantically-unrelated labels, across various model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small LLMs ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing LLMs to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough LLMs can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but more of the former.

Exploring the Impact of Semantic Priors and Input-Label Mappings on In-Context Learning Across Model Scales

Overview of In-Context Learning Variations

The paper analyzes how LLMs of different scales balance semantic priors against input-label mappings during in-context learning (ICL). Experiments cover two setups, flipped-label ICL and semantically-unrelated label ICL (SUL-ICL), across several model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). The central question is whether models rely on semantic priors internalized during pretraining or whether they can learn input-label mappings directly from the presented exemplars; a sketch of the first setup follows.
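To make the flipped-label setup concrete, here is a minimal sketch of how such a prompt might be assembled for a binary sentiment task. The exemplars, label names, and template below are illustrative assumptions, not the paper's exact datasets or formatting.

```python
# Minimal sketch of a flipped-label ICL prompt for binary sentiment
# classification. Exemplars and the prompt template are illustrative.

exemplars = [
    ("This movie was a delight from start to finish.", "positive"),
    ("A tedious, joyless slog.", "negative"),
    ("The lead performance is astonishing.", "positive"),
    ("I want those two hours of my life back.", "negative"),
]

FLIP = {"positive": "negative", "negative": "positive"}

def build_prompt(exemplars, query, flip_labels=False):
    """Format exemplars as an ICL prompt, optionally flipping every label."""
    lines = []
    for text, label in exemplars:
        shown = FLIP[label] if flip_labels else label
        lines.append(f"Input: {text}\nLabel: {shown}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

print(build_prompt(exemplars, "An instant classic.", flip_labels=True))
```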

Findings on Semantic Priors Override

Experiments with flipped-label ICL reveal an emergent capability among large models to override ingrained semantic priors. When the labels in the in-context exemplars are flipped, large models adjust their outputs to follow the new mappings, and this ability strengthens with model scale. In contrast, smaller models largely ignore the flipped labels and continue to predict according to their pretraining-derived semantic priors.
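One way to quantify this override, continuing the sketch above and assuming some `model_predict` function that returns the model's predicted label string (a stand-in, not a real API), is to score predictions against the flipped labels: accuracy well above chance under this scoring means the model follows the in-context mapping rather than its prior.

```python
def flipped_label_accuracy(model_predict, eval_set, exemplars):
    """Score a model against *flipped* ground truth.

    `model_predict` is a hypothetical stand-in for whatever call
    returns the model's predicted label. Accuracy well above 0.5
    means the model follows the flipped in-context mapping; well
    below 0.5 means it sticks to its semantic priors.
    """
    correct = 0
    for text, true_label in eval_set:
        prompt = build_prompt(exemplars, text, flip_labels=True)
        prediction = model_predict(prompt).strip().lower()
        if prediction == FLIP[true_label]:
            correct += 1
    return correct / len(eval_set)
```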

Semantically-Unrelated Label ICL Insights

In the SUL-ICL setting, where labels bear no semantic relation to the inputs, large models cope with the absence of semantic priors far better than smaller ones. This indicates that large models can forge new input-label mappings without relying on pretraining-derived semantics. The paper also shows that for several tasks, exceeding random-guessing accuracy in the SUL-ICL setting requires substantial model scale, again pointing to an emergent property tied to model size.
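A SUL-ICL prompt can be sketched in the same style, substituting arbitrary tokens such as foo/bar for the natural-language labels so the only usable signal is the input-label mapping itself. The token choice and template are again assumptions for illustration.

```python
# Sketch of a SUL-ICL prompt: semantic labels are replaced with
# arbitrary tokens, so the model must learn the mapping from the
# exemplars alone. Tokens and template are illustrative.

SUL_MAP = {"positive": "foo", "negative": "bar"}

def build_sul_prompt(exemplars, query):
    lines = [
        f"Input: {text}\nLabel: {SUL_MAP[label]}"
        for text, label in exemplars
    ]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)
```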

Instruction Tuning and ICL

Instruction-tuned models perform better in SUL-ICL setups, indicating an increased capacity to learn input-label mappings from exemplars. However, they are worse at overriding semantic priors in the flipped-label ICL tests. Together, these results show that instruction tuning strengthens both the use of semantic priors and the ability to learn new mappings, but it reinforces the former more.

Implications on High-Dimensional Linear Classification

The paper extends the analysis to high-dimensional linear classification tasks, where successful task execution without semantic priors becomes feasible only at sufficient model scale. This observation broadens the scope of in-context learning beyond NLP tasks to abstract, symbolic reasoning problems.
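As a sketch of what such a task can look like when rendered as an ICL prompt, one can sample integer points, label each by the sign of a hidden random linear function, and present the pairs with semantically unrelated label tokens. The dimensionality, value ranges, and formatting below are assumptions, not the paper's exact protocol.

```python
import random

def make_linear_classification_prompt(n_dims=16, n_exemplars=32, seed=0):
    """Build an ICL prompt for a synthetic linear classification task.

    Each input is an integer vector; its label is the sign of a dot
    product with a hidden weight vector, mapped to unrelated tokens.
    All concrete choices here (dimension, ranges, tokens) are
    illustrative.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(n_dims)]
    lines = []
    for _ in range(n_exemplars):
        x = [rng.randint(-10, 10) for _ in range(n_dims)]
        score = sum(wi * xi for wi, xi in zip(w, x))
        label = "foo" if score > 0 else "bar"
        lines.append(f"Input: {', '.join(map(str, x))}\nLabel: {label}")
    query = [rng.randint(-10, 10) for _ in range(n_dims)]
    lines.append(f"Input: {', '.join(map(str, query))}\nLabel:")
    return "\n\n".join(lines)
```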

Concluding Remarks

The paper delineates the interplay between reliance on semantic priors and the learning of novel input-label mappings within in-context learning across model scales. The emergence of prior-overriding and SUL-ICL abilities as models scale marks a step toward more versatile and adaptable LLMs, and motivates continued research into how these capabilities evolve with scale, moving toward systems that can go beyond traditional reliance on pretraining-imparted knowledge.

Authors (11)
  1. Jerry Wei
  2. Jason Wei
  3. Yi Tay
  4. Dustin Tran
  5. Albert Webson
  6. Yifeng Lu
  7. Xinyun Chen
  8. Hanxiao Liu
  9. Da Huang
  10. Denny Zhou
  11. Tengyu Ma
Citations (309)