Stable LM 2 1.6B Technical Report (2402.17834v1)

Published 27 Feb 2024 in cs.CL and stat.ML

Abstract: We introduce StableLM 2 1.6B, the first in a new generation of our LLM series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B. The weights for both models are available via Hugging Face for anyone to download and use. The report contains thorough evaluations of these models, including zero- and few-shot benchmarks, multilingual benchmarks, and the MT benchmark focusing on multi-turn dialogues. At the time of publishing this report, StableLM 2 1.6B was the state-of-the-art open model under 2B parameters by a significant margin. Given its appealing small size, we also provide throughput measurements on a number of edge devices. In addition, we open source several quantized checkpoints and provide their performance metrics compared to the original model.

Introducing Stable LM 2 1.6B: A Compact LLM with Multilingual Capabilities and Open Licensing

Overview

Stable LM 2 1.6B marks a significant advance in the development of compact, efficient, and openly accessible LLMs. As a successor in the Stable LM series, it sets a new benchmark for performance among open models under 2B parameters. Its design and training are openly documented, with full transparency about the datasets used, the training procedure, and performance across multiple languages and tasks, which supports reproducibility and further research within the AI community.
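
Because the weights are published on Hugging Face, a minimal sketch of loading and sampling from the base checkpoint with the transformers library is shown below. The repository id stabilityai/stablelm-2-1_6b reflects the public release at the time of writing and should be verified against the hub listing; this is a usage sketch, not code from the report.

```python
# Hedged sketch: load the released base checkpoint and sample a short completion.
# Assumes the public repository id "stabilityai/stablelm-2-1_6b"; older transformers
# releases may additionally require trust_remote_code=True.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-1_6b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 1.6B parameters fit comfortably in bf16 on a single GPU
    device_map="auto",
)

prompt = "Small language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```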

Training and Data

Pre-Training

The model was pre-trained from scratch on a diverse mix of data sources to broaden its linguistic coverage and versatility. Training follows a standard autoregressive (next-token prediction) objective and relies on FlashAttention-2 together with sequence-level parallelism optimizations for efficiency. The datasets span academic sources, books, web content, and specific domains such as law and math, totaling approximately 2 trillion tokens. Notably, the training mix includes multilingual data, giving the model proficiency beyond English. Detailed documentation of the mix, including sampling weights and epoch counts, supports transparency and reproducibility.
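
For readers less familiar with the terminology, the "standard autoregressive training approach" is ordinary next-token prediction: the model predicts token t+1 from the tokens up to t, and training minimizes the cross-entropy of those predictions. The following generic sketch of the causal language-modeling loss is illustrative only and is not the authors' training code.

```python
# Generic causal language-modeling (next-token prediction) loss used in autoregressive
# pre-training. Function and variable names are illustrative, not from the Stable LM 2 codebase.
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq_len, vocab_size); input_ids: (batch, seq_len)."""
    shift_logits = logits[:, :-1, :].contiguous()  # predictions for positions 0..T-2
    shift_labels = input_ids[:, 1:].contiguous()   # targets are the next tokens 1..T-1
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
```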

Fine-Tuning

The fine-tuning process combined supervised fine-tuning, direct preference optimization (DPO), and self-knowledge learning to refine the model's conversational abilities and align it with human preferences. This stage draws on varied conversational datasets and excludes multilingual data, concentrating the alignment effort on dialogue quality rather than additional language coverage.
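
To make the preference stage concrete, below is a minimal sketch of the standard DPO objective in PyTorch; it shows the general published formulation rather than the authors' exact implementation, data, or hyperparameters (the beta value is an arbitrary placeholder).

```python
# Standard DPO loss on sequence-level log-probabilities of preferred ("chosen") and
# dispreferred ("rejected") responses, computed under the policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(y_chosen | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(y_rejected | x), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,                    # placeholder temperature on the implicit reward
) -> torch.Tensor:
    chosen_margin = policy_chosen_logps - ref_chosen_logps        # implicit reward of chosen response
    rejected_margin = policy_rejected_logps - ref_rejected_logps  # implicit reward of rejected response
    # Maximize the probability that the chosen response is ranked above the rejected one.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```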

Performance Benchmarks

The model shows strong results across zero-shot, few-shot, and multilingual evaluations. It competes with models roughly twice its size and, at the time of publication, set the state of the art among open models under 2B parameters. Its multilingual capability is reflected in strong scores on the non-English languages seen during pre-training, and its conversational ability is confirmed by competitive results on the MT-Bench multi-turn benchmark.
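
For context, zero- and few-shot scores of this kind are commonly reproduced with EleutherAI's lm-evaluation-harness. The sketch below shows one way to run comparable evaluations; task names and the simple_evaluate signature differ across harness versions, so it should be treated as illustrative rather than a recipe for the paper's exact numbers.

```python
# Hedged sketch: zero-shot evaluation of the base model with lm-evaluation-harness.
# Task names and keyword arguments may vary by harness version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=stabilityai/stablelm-2-1_6b,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])  # per-task accuracy and related metrics
```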

Inference and Quantization

A critical focus of Stable LM 2 1.6B is its efficiency and adaptability for on-device execution. The model has been optimized and quantized for performance on edge devices, with quantization files made available for different inference frameworks. This step is crucial for expanding the applicability of advanced generative capabilities to mobile and consumer-grade hardware without substantial computational overhead.
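
As an illustration of on-device inference, the hedged sketch below loads a quantized GGUF checkpoint with llama-cpp-python; the file name is a hypothetical placeholder for whichever quantized artifact (e.g. a Q4_K_M file) accompanies the release, and the thread count should be tuned to the target device.

```python
# Hedged sketch: CPU/edge inference from a quantized GGUF checkpoint via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="stablelm-2-1_6b.Q4_K_M.gguf",  # hypothetical local path to a quantized checkpoint
    n_ctx=4096,   # context window
    n_threads=4,  # tune for the target device
)

out = llm("List three advantages of sub-2B language models:", max_tokens=128)
print(out["choices"][0]["text"])
```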

Future Directions

The paper outlines several avenues for further research, including improvements in data quality, hallucination mitigation, extending context lengths, and exploring conditional computation techniques like Mixture of Experts. These areas promise to enhance the model's performance, further reduce computational requirements, or expand its applicability.
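
To make the conditional-computation idea concrete, the sketch below implements a generic top-k routed Mixture-of-Experts layer: a router selects k expert MLPs per token, so only a fraction of the layer's parameters is active on any forward pass. It is purely illustrative of the concept named as future work and is not part of Stable LM 2.

```python
# Generic top-k routed Mixture-of-Experts feed-forward layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); each token is routed to its top-k experts.
        weights, idx = torch.topk(self.router(x), self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e  # tokens assigned to expert e in this routing slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```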

Environmental and Societal Considerations

The report transparently discusses the environmental impact of training Stable LM 2, estimating the carbon footprint based on power consumption and GPU hours. Furthermore, the decision to release the model under an open non-commercial license reflects a commitment to accessibility and responsible use, although it also acknowledges the challenges in assessing the broader societal impacts of such open releases.

Conclusion

Stable LM 2 1.6B represents a balance between performance, efficiency, and accessibility, embodying advancements in LLM training and evaluation. By providing a transparent account of its development process and performance benchmarks, the model contributes valuable insights to the AI community. It encourages further innovation in the development of compact, multilingual, and efficient LLMs that are both powerful and accessible for a wide range of applications.

Authors (19)
  1. Marco Bellagente (13 papers)
  2. Jonathan Tow (7 papers)
  3. Dakota Mahan (6 papers)
  4. Duy Phung (9 papers)
  5. Maksym Zhuravinskyi (6 papers)
  6. Reshinth Adithyan (4 papers)
  7. James Baicoianu (2 papers)
  8. Ben Brooks (1 paper)
  9. Nathan Cooper (35 papers)
  10. Ashish Datta (2 papers)
  11. Meng Lee (1 paper)
  12. Emad Mostaque (1 paper)
  13. Michael Pieler (10 papers)
  14. Nikhil Pinnaparju (1 paper)
  15. Paulo Rocha (8 papers)
  16. Harry Saini (3 papers)
  17. Hannah Teufel (7 papers)
  18. Carlos Riquelme (26 papers)
  19. Niccolo Zanichelli (1 paper)