- The paper shows that reducing language model size affects capabilities unevenly, with fact recall deteriorating significantly sooner than in-context learning.
- Fact recall degrades rapidly with over 30% model size reduction, while in-context learning capabilities persist even with 60-70% size reduction.
- These findings suggest optimizing model efficiency by leveraging robust in-context learning and augmenting fact recall in smaller models with external sources.
Analyzing the Impact of Model Size Scaling on LLM Capabilities
The paper "The Cost of Down-Scaling LLMs: Fact Recall Deteriorates before In-Context Learning" provides a detailed paper on how scaling the number of parameters in LLMs affects their capability to recall facts and perform in-context learning (ICL). It examines two scaling techniques: weight pruning and dense scaling, shedding light on distinct impacts on two core capabilities of LLMs. The findings indicate a significant disparity in how these abilities evolve under parameter scaling, which has practical implications for the efficient deployment and interpretability of LLMs.
Key Insights
The authors present two natural scaling methods:
- Weight Pruning: This entails removing parameters from an existing trained model while attempting to maintain performance (a minimal pruning sketch follows this list).
- Dense Scaling: This involves training a separate model from scratch at a smaller or larger size.
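To make the pruning approach concrete, here is a minimal sketch of unstructured magnitude pruning on a toy PyTorch block. The paper itself uses SparseGPT; L1 magnitude pruning is shown here only as a simpler illustrative stand-in for "removing parameters from an existing model", and the layer sizes and 50% sparsity level are assumptions. Dense scaling, by contrast, would simply train a separate model of a different size from scratch.

```python
# Minimal sketch of unstructured weight pruning on a toy model (not SparseGPT,
# which the paper actually uses; magnitude pruning is an illustrative stand-in).
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(            # hypothetical stand-in for a trained LLM block
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

sparsity = 0.5                    # remove 50% of the weights in each linear layer
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=sparsity)
        prune.remove(module, "weight")   # make the zeroed weights permanent

# Report the fraction of linear-layer weights that are now exactly zero.
linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
zeros = sum((m.weight == 0).sum().item() for m in linears)
total = sum(m.weight.numel() for m in linears)
print(f"overall sparsity: {zeros / total:.2%}")
```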
The paper evaluates these scaling approaches using tasks designed to separate the effects on fact recall (retrieving information stored in the model's parameters during pre-training) from those on in-context learning (processing information supplied in the prompt at inference time). The primary observations include:
- Fact Recall: A reduction in model size by more than 30% significantly diminishes the ability to recall pre-trained facts.
- In-Context Learning: Capabilities related to ICL remain largely intact even when model size is reduced by 60-70%.
These results are consistent for both pruning and dense scaling, suggesting a fundamentally different impact of model size reduction on these two abilities.
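To illustrate the distinction these tasks are designed to capture, the sketch below contrasts a closed-book fact-recall prompt (the answer must come from knowledge stored in the weights) with a few-shot ICL prompt (the answer can be derived from demonstrations supplied in-context). The questions, labels, and helper functions are hypothetical examples, not the paper's actual benchmarks.

```python
# Hypothetical sketch separating the two capabilities at the prompt level.

def fact_recall_prompt(question: str) -> str:
    # Closed-book: no supporting evidence is supplied in the prompt,
    # so answering requires facts memorized during pre-training.
    return f"Question: {question}\nAnswer:"

def icl_prompt(demonstrations: list, query: str) -> str:
    # Few-shot: labelled examples are given in-context; the model only has to
    # apply the demonstrated mapping, not retrieve stored facts.
    shots = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in demonstrations)
    return f"{shots}\nInput: {query}\nLabel:"

print(fact_recall_prompt("In which year was the Eiffel Tower completed?"))
print()
print(icl_prompt([("great movie", "positive"), ("terrible plot", "negative")],
                 "loved every minute"))
```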
Implications and Future Directions
The findings indicate a clear separation in how strongly different capabilities depend on parameter count. While fact recall is sensitive to parameter reduction, ICL is robust to it. This observation has several significant implications:
- Inference Efficiency: Identifying tasks that depend predominantly on ICL allows smaller, more efficient models to be deployed without substantial performance loss.
- Memory Augmentation: The findings advocate integrating external information sources to complement fact recall in smaller models, improving both efficiency and accuracy (see the retrieval sketch after this list).
- Model Interpretability: The observation that a relatively small portion of parameters supports ICL suggests potential for improving model interpretability through targeted pruning.
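As a rough illustration of the memory-augmentation idea, the sketch below places a retrieved passage directly in the prompt, so answering relies on in-context processing (robust to down-scaling) rather than parametric fact recall. The `retrieve` helper and the toy knowledge base are assumptions for illustration, standing in for a real retriever such as BM25 or dense search.

```python
# Hypothetical sketch of offsetting weakened fact recall with an external source.

def retrieve(question: str, knowledge_base: dict) -> str:
    # Toy keyword lookup standing in for a real retrieval system.
    for key, passage in knowledge_base.items():
        if key.lower() in question.lower():
            return passage
    return ""

def open_book_prompt(question: str, knowledge_base: dict) -> str:
    # Supplying the fact in-context shifts the burden from fact recall
    # (parameter-hungry) to in-context processing (robust to pruning).
    context = retrieve(question, knowledge_base)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

kb = {"Eiffel Tower": "The Eiffel Tower was completed in 1889."}
print(open_book_prompt("When was the Eiffel Tower completed?", kb))
```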
The research adds nuance to the current understanding of how scaling affects the subsystems within an LLM, suggesting that deployments could be segmented more intelligently according to task requirements. Further investigation into the theoretical foundations of these observations could substantially aid model design and optimization.
Technical Evaluation
The paper employs the SparseGPT pruning algorithm and evaluates multiple models from the OPT, LLaMA, and Pythia families across six downstream tasks tailored to assess fact recall and ICL capabilities. The curated task set isolates the effects of both pruning and dense scaling, and the results generalize across distinct architectures and sizes.
The controlled experiments illustrate the contrasting sensitivity of these capabilities to parameter scaling: fact recall is the first to suffer whenever a model must be down-scaled without external context augmentation.
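The following sketch outlines the general shape of such a controlled sweep: prune to a series of target sparsity levels and measure fact recall and ICL accuracy separately at each level, so the two degradation curves can be compared. The `prune_to_sparsity` and `evaluate_*` helpers are hypothetical placeholders, not the paper's code.

```python
# Hypothetical sketch of a sparsity sweep that tracks both capabilities separately.
from typing import Callable

def sweep(model,
          prune_to_sparsity: Callable,
          evaluate_fact_recall: Callable,
          evaluate_icl: Callable,
          sparsities=(0.0, 0.3, 0.5, 0.7)):
    results = []
    for s in sparsities:
        pruned = prune_to_sparsity(model, s)
        results.append({
            "sparsity": s,
            "fact_recall_acc": evaluate_fact_recall(pruned),
            "icl_acc": evaluate_icl(pruned),
        })
    return results

if __name__ == "__main__":
    # Toy stand-ins just to exercise the sweep; a real run would load and prune an LLM.
    for row in sweep(model=None,
                     prune_to_sparsity=lambda m, s: m,
                     evaluate_fact_recall=lambda m: 1.0,
                     evaluate_icl=lambda m: 1.0):
        print(row)
```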
Conclusion
This research contributes to a nuanced understanding of model scaling and can guide future architectural decisions aimed at separating and preserving core model capabilities. It underscores the utility of external context in sustaining task performance with streamlined models and opens avenues for further exploration of model compression strategies. By delineating how different scaling methods affect fact recall versus ICL, the work directly informs the deployment of efficient, capable LLMs in practical applications.