TinyLlama: An Open-Source Small Language Model (2401.02385v2)
Published 4 Jan 2024 in cs.CL and cs.AI
Abstract: We present TinyLlama, a compact 1.1B LLM pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention and Lit-GPT), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source LLMs with comparable sizes. Our model checkpoints and code are publicly available on GitHub at https://github.com/jzhang38/TinyLlama.
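Since TinyLlama reuses the Llama 2 architecture and tokenizer, it can be loaded with standard Llama-compatible tooling. Below is a minimal usage sketch with the Hugging Face transformers library; the model ID "TinyLlama/TinyLlama-1.1B-Chat-v1.0" and the generation settings are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: loading and prompting TinyLlama with Hugging Face transformers.
# ASSUMPTION: the checkpoint ID below is illustrative; substitute the released
# TinyLlama checkpoint you actually want to use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
device = "cuda" if torch.cuda.is_available() else "cpu"

# TinyLlama keeps the Llama 2 tokenizer, so AutoTokenizer resolves to the
# standard Llama (SentencePiece BPE) tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompt = "Explain in two sentences why small language models matter."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Greedy decoding keeps the example deterministic; enable sampling as needed.
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```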
References:
- GQA: Training generalized multi-query transformer models from multi-head checkpoints. In Proceedings of EMNLP.
- PaLM 2 technical report.
- Qwen technical report.
- Pythia: A suite for analyzing large language models across training and scaling. In Proceedings of ICML.
- PIQA: Reasoning about physical commonsense in natural language. In Proceedings of AAAI.
- Language models are few-shot learners. In Proceedings of NeurIPS.
- INSTRUCTEVAL: Towards holistic evaluation of instruction-tuned large language models. CoRR, abs/2306.04757.
- PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311.
- BoolQ: Exploring the surprising difficulty of natural yes/no questions. In Proceedings of NAACL.
- Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. arXiv preprint arXiv:1803.05457.
- Dao, T. (2023). FlashAttention-2: Faster attention with better parallelism and work partitioning. arXiv preprint arXiv:2307.08691.
- DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proceedings of NAACL.
- A framework for few-shot language model evaluation.
- Measuring massive multitask language understanding. In Proceedings of ICLR.
- Training compute-optimal large language models. In Proceedings of NeurIPS.
- Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.
- xFormers: A modular and hackable transformer modelling library. https://github.com/facebookresearch/xformers.
- StarCoder: May the source be with you! Transactions on Machine Learning Research.
- Decoupled weight decay regularization. In Proceedings of ICLR.
- Can a suit of armor conduct electricity? a new dataset for open book question answering. In Proceedings of EMNLP.
- Scaling data-constrained language models. In Proceedings of NeurIPS.
- OpenAI (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774.
- WinoGrande: An adversarial Winograd schema challenge at scale. Communications of the ACM, 64(9):99–106.
- Shazeer, N. (2020). GLU variants improve transformer. CoRR, abs/2002.05202.
- SlimPajama: A 627B token cleaned and deduplicated version of RedPajama.
- Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615.
- RoFormer: Enhanced transformer with rotary position embedding. arXiv preprint arXiv:2104.09864.
- Challenging BIG-bench tasks and whether chain-of-thought can solve them. In Findings of ACL.
- Thaddée, Y. T. (2023). Chinchilla’s death. https://espadrine.github.io/blog/posts/chinchilla-s-death.html.
- Together Computer (2023). RedPajama: An open dataset for training large language models.
- LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- Attention is all you need. In Proceedings of NeurIPS.
- Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of NeurIPS.
- HellaSwag: Can a machine really finish your sentence? In Proceedings of the ACL.
- Root mean square layer normalization. In Proceedings of NeurIPS.
- OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068.
- CodeGeeX: A pre-trained model for code generation with multilingual benchmarking on HumanEval-X. In Proceedings of KDD.