An Insightful Overview of "Small LLMs (SLMs) Can Still Pack a Punch: A Survey"
The paper surveys Small LLMs (SLMs) in the parameter range of 1 to 8 billion, challenging the prevailing notion that larger models are inherently superior. Drawing on approximately 160 references, it examines different types of SLMs, categorizes them by size, application, and training methodology, and details the techniques that allow them to match or, in some cases, surpass their much larger LLM counterparts.
Key Themes and Findings
- Landscape of Small LLMs: The paper begins by identifying a range of SLMs, both task-agnostic and task-specific, and explores the dimensions along which they perform comparably to, or in some cases better than, larger LLMs. It also notes that there is no universal definition separating SLMs from LLMs, observing that SLMs tend to cluster around the 1B, 7B, and 13B parameter sizes.
- Structural and Architectural Innovations: A broad range of architectural techniques underpins SLM efficiency. The Llama family (notably Llama 2 and Llama 3) and the Mistral models, for instance, rely on refinements such as Grouped Query Attention (GQA) and Rotary Positional Embeddings (RoPE), which reduce memory and compute costs while preserving competitive quality (see the GQA sketch after this list).
- Data Efficiency and Quality: The paper identifies data quality, rather than sheer quantity, as pivotal. The Phi series, trained on highly curated, textbook-quality datasets, delivers strong performance against much larger counterparts, a result that complicates the standard reading of scaling laws.
- Task-Specific and Domain-Specific Models: The survey also examines task-specific and domain-specific SLMs across fields such as mathematical reasoning, code generation, and legal-text processing. Models like WizardMath and Code Llama specialize in their domains and outperform larger general-purpose models such as Llama 2 70B on the corresponding tasks.
- Cost Efficiency and Deployment: The survey paints a promising picture for deploying SLMs in real-world applications where computational resources are constrained. Pointing to compact model families such as SmolLM and to quantization techniques, it underscores the advantages SLMs hold in mobile and edge computing contexts (a hedged quantization example follows this list).
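To make the architectural point concrete, below is a minimal sketch of Grouped Query Attention in PyTorch. The module, its dimensions, and its hyperparameters are illustrative assumptions rather than the actual Llama or Mistral implementation; the essential idea is simply that the number of key/value heads is smaller than the number of query heads, which shrinks both the KV projections and the KV cache by the grouping factor.

```python
# Illustrative Grouped Query Attention (GQA) sketch; dimensions are assumptions.
import torch
import torch.nn.functional as F
from torch import nn


class GroupedQueryAttention(nn.Module):
    def __init__(self, d_model: int = 512, n_q_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0
        self.n_q_heads, self.n_kv_heads = n_q_heads, n_kv_heads
        self.head_dim = d_model // n_q_heads
        self.group = n_q_heads // n_kv_heads  # query heads sharing one KV head
        self.wq = nn.Linear(d_model, n_q_heads * self.head_dim, bias=False)
        # K/V projections produce only n_kv_heads heads, so a KV cache stores
        # the small tensors; this is where the parameter and memory savings come from.
        self.wk = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_q_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_q_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Broadcast each KV head to its group of query heads at attention time.
        k = k.repeat_interleave(self.group, dim=1)
        v = v.repeat_interleave(self.group, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))


if __name__ == "__main__":
    x = torch.randn(2, 16, 512)
    print(GroupedQueryAttention()(x).shape)  # torch.Size([2, 16, 512])
```

On the deployment side, one common route to the kind of quantization the survey mentions is 4-bit weight-only loading. The sketch below uses the Hugging Face transformers and bitsandbytes integration (it requires a GPU and the bitsandbytes and accelerate packages); the checkpoint name is a placeholder assumption, and any 1B-8B model can be substituted.

```python
# Hedged example of loading an SLM with 4-bit NF4 weights for constrained hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "HuggingFaceTB/SmolLM2-1.7B-Instruct"  # placeholder SLM checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 to limit accuracy loss
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

prompt = "Explain grouped query attention in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```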
Implications and Future Directions
- Reevaluating Scaling Laws: The strong showing of SLMs argues for revisiting existing scaling laws. The paper suggests treating dataset quality as a pivotal factor alongside model size and dataset size, pointing toward a richer model of performance (one illustrative formulation is sketched after this list).
- Prospects for Edge Computing: Given their low resource requirements and efficiency, SLMs are well suited to edge computing environments. They offer a low-cost, scalable alternative, particularly for applications requiring real-time processing and decision-making.
- Research Community’s Role: There is a call to action for more nuanced benchmarking frameworks that assess models across a wider array of tasks, including safety and ethical considerations. Additionally, exploration into hybrid architectures and methodologies to further optimize SLM performance remains an open frontier.
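To illustrate what a quality-aware scaling relation could look like, one possibility is to modify the familiar Chinchilla-style loss decomposition so that the data term depends on an effective, quality-weighted token count rather than the raw count. The form below is purely an illustration of the idea, not an equation taken from the survey:

$$
L(N, D, q) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{(q \cdot D)^{\beta}}
$$

where N is the parameter count, D the raw token count, q a data-quality factor between 0 and 1, and E, A, B, alpha, beta are fitted constants as in the original formulation. Under this reading, a smaller model trained on highly curated data (q close to 1) can reach the same loss as a much larger model trained on noisier data, which is consistent with the Phi-style results the survey highlights.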
Conclusion
The survey documents the broad potential of Small LLMs, affirming their capacity to rival or even outperform larger foundation models across a range of applications. By foregrounding dataset quality and architectural optimization, it offers a perspective that could reshape how efficient models are developed and deployed. These insights not only inform the design of more resource-efficient models but also encourage a shift in how we think about the relationship between model size and performance.