Exploring the Impact of Semantic Priors and Input-Label Mappings on In-Context Learning Across Model Scales
Overview of In-Context Learning Variations
The paper analyzes whether larger LLMs perform in-context learning (ICL) by relying on semantic priors or by learning input-label mappings from the prompt. Experiments cover two ICL setups, flipped-label ICL and semantically unrelated label ICL (SUL-ICL), across several model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). The central question is whether models depend on semantic priors internalized during pretraining or can learn to map inputs to labels directly from the exemplars presented in context.
Findings on Semantic Priors Override
In flipped-label ICL, the labels of the in-context exemplars are reversed. The experiments reveal an emergent ability of large models to override the semantic priors ingrained during pretraining: when they encounter flipped labels in the exemplars, they adapt their outputs to follow the new mappings, and this ability strengthens with model scale. Smaller models, in contrast, largely ignore the flipped labels and continue to follow their pretraining-informed semantic priors.
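To make the setup concrete, here is a minimal sketch of how a flipped-label ICL prompt can be constructed for sentiment classification. The "Input:/Label:" template and the example sentences are illustrative assumptions, not the paper's exact prompt format.

```python
# Flip the natural-language labels of each exemplar before formatting the
# prompt; the query at the end is left unlabeled for the model to complete.
FLIP = {"positive": "negative", "negative": "positive"}

def build_flipped_prompt(exemplars, query):
    """Format exemplars with flipped labels, then append the unlabeled query."""
    lines = [f"Input: {text}\nLabel: {FLIP[label]}" for text, label in exemplars]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

exemplars = [
    ("A wonderful, heartfelt film.", "positive"),
    ("Dull and far too long.", "negative"),
]
prompt = build_flipped_prompt(exemplars, "An instant classic.")
```

A model that follows the flipped mappings in context should answer "negative" for the query, while a model that clings to its semantic priors will answer "positive".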
Semantically-Unrelated Label ICL Insights
In SUL-ICL, the labels bear no semantic relationship to the inputs. Large models handle the absence of semantic priors far better than smaller ones, indicating that they can form new input-label mappings in context without relying on pretraining-induced semantic understanding. For some tasks, exceeding random-guess accuracy in the SUL-ICL setting requires substantial model scale, again pointing to an emergent property tied to model size.
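The SUL-ICL construction can be sketched the same way: the natural-language labels are replaced with semantically unrelated tokens, so the model must infer the mapping from the exemplars alone. The tokens "foo"/"bar" and the prompt template below are illustrative choices, not necessarily those used in the paper.

```python
# Replace each semantic label with an unrelated token so that the only way
# to classify correctly is to learn the input-label mapping from context.
SUL_MAP = {"positive": "foo", "negative": "bar"}

def build_sul_prompt(exemplars, query):
    """Format exemplars with semantically unrelated labels, then append the query."""
    lines = [f"Input: {text}\nLabel: {SUL_MAP[label]}" for text, label in exemplars]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

exemplars = [
    ("A wonderful, heartfelt film.", "positive"),
    ("Dull and far too long.", "negative"),
]
prompt = build_sul_prompt(exemplars, "An instant classic.")
```

Because "foo" and "bar" carry no sentiment prior, above-chance accuracy here is evidence of genuine in-context learning of the mapping rather than recall of pretraining knowledge.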
Instruction Tuning and ICL
Instruction-tuned models perform better in SUL-ICL setups, suggesting a stronger ability to learn input-label mappings from exemplars. However, the same models are worse at disregarding or overriding semantic priors in the flipped-label ICL tests. Instruction tuning therefore strengthens both the hold of semantic priors and the capacity to learn new mappings, with a net bias toward relying on the priors.
Implications on High-Dimensional Linear Classification
The paper extends its examination to high-dimensional linear classification tasks and finds that models can perform them without semantic priors once they reach sufficient scale. This observation shows that in-context learning is not limited to NLP tasks but extends to abstract, symbolic reasoning problems.
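A linear classification task of this kind can be posed as in-context exemplars. The sketch below is a minimal construction, assuming random integer points labeled by the sign of a dot product with a hidden weight vector and presented with SUL-style labels; the dimensionality, value ranges, and prompt format are illustrative assumptions.

```python
import random

def make_linear_task(dim=16, n_exemplars=8, seed=0):
    """Generate a hidden linear boundary and exemplars labeled by its sign."""
    rng = random.Random(seed)
    w = [rng.choice([-1, 1]) for _ in range(dim)]  # hidden decision boundary
    exemplars = []
    for _ in range(n_exemplars):
        x = [rng.randint(-5, 5) for _ in range(dim)]
        score = sum(wi * xi for wi, xi in zip(w, x))
        # Semantically unrelated labels, so no pretraining prior can help.
        exemplars.append((x, "foo" if score > 0 else "bar"))
    return w, exemplars

def format_prompt(exemplars):
    """Render labeled points as Input/Label exemplar pairs."""
    lines = [f"Input: {', '.join(map(str, x))}\nLabel: {y}" for x, y in exemplars]
    return "\n\n".join(lines)

w, exemplars = make_linear_task()
prompt = format_prompt(exemplars)
```

The model's accuracy on held-out points from the same boundary then measures whether it can learn an abstract linear rule purely in context.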
Concluding Remarks
The study maps the interplay between using semantic priors and learning novel input-label mappings in in-context learning across model scales. The emergence of these abilities as models scale up marks a step toward more versatile and adaptable LLMs. It also underscores the need for continued research into how LLM capabilities evolve with scale, toward AI systems with deeper semantic and conceptual understanding that can move beyond reliance on pretraining-imparted knowledge.