TinyStories: How Small Can Language Models Be and Still Speak Coherent English? (2305.07759v2)

Published 12 May 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Language models (LMs) are powerful tools for natural language processing, but they often struggle to produce coherent and fluent text when they are small. Models with around 125M parameters such as GPT-Neo (small) or GPT-2 (small) can rarely generate coherent and consistent English text beyond a few words even after extensive training. This raises the question of whether the emergence of the ability to produce coherent English text only occurs at larger scales (with hundreds of millions of parameters or more) and complex architectures (with many layers of global attention). In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-old usually understands, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters), or have much simpler architectures (with only one transformer block), yet still produce fluent and consistent stories of several paragraphs that are diverse, have almost perfect grammar, and demonstrate reasoning capabilities. We also introduce a new paradigm for the evaluation of LMs: we suggest a framework which uses GPT-4 to grade the content generated by these models as if the stories were written by students and graded by a (human) teacher. This new paradigm overcomes the flaws of standard benchmarks, which often require the model's output to be very structured, and moreover provides a multidimensional score for the model, with separate scores for capabilities such as grammar, creativity, and consistency. We hope that TinyStories can facilitate the development, analysis and research of LMs, especially for low-resource or specialized domains, and shed light on the emergence of language capabilities in LMs.

Exploring the Limits of Language Model Size for Coherent Text Generation

The paper, TinyStories: How Small Can Language Models Be and Still Speak Coherent English?, by Ronen Eldan and Yuanzhi Li, examines a critical question in NLP: can small language models (SLMs) generate coherent and fluent text, or are large models with complex architectures indispensable? By introducing a synthetic dataset named TinyStories, the paper addresses the challenges of scalability, coherence, and reasoning in language models with significantly fewer parameters.

Key Contributions

  1. TinyStories Dataset: The authors present a novel synthetic dataset of short stories composed of words typically understood by young children, created via GPT-3.5 and GPT-4. The dataset is designed to capture the essential elements of language (grammar, vocabulary, and basic reasoning) while keeping breadth and diversity deliberately limited. This dataset enables the training of models with fewer than 10 million parameters that still achieve coherent text generation.
  2. Scaling and Evaluation: SLMs were evaluated using a new paradigm involving GPT-4 as a grader, a departure from traditional benchmarks. This method provides a multidimensional analysis of model output, scoring grammar, creativity, and adherence to instructions without necessitating structured responses (a minimal sketch of this grading loop follows the list). The findings reveal that even models trained with limited computational resources exhibit behaviors typical of larger models, including scaling laws and trade-offs between model width and depth.
  3. Interpretable Model Behaviors: The paper highlights that smaller models are considerably more interpretable. It examines attention patterns and neuron activations, showing that even minimal architectures develop components with distinct functions, such as handling semantic roles and managing local versus global attention.
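
To make the GPT-4-as-grader paradigm concrete, here is a minimal sketch of such a grading loop. It assumes the OpenAI Python SDK (v1+); the rubric wording and the simple score parsing are illustrative placeholders rather than the authors' exact prompt.

```python
# Minimal sketch of the GPT-4-as-grader evaluation paradigm.
# The rubric prompt and score parsing are illustrative, not the authors' setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "The following story was written by a student. Grade it as a teacher would, "
    "giving a score from 1 to 10 for each of: grammar, creativity, and "
    "consistency with the story's beginning. Answer with one line per "
    "dimension in the form 'grammar: <score>'."
)

def grade_completion(story_beginning: str, model_completion: str) -> dict:
    """Ask GPT-4 to grade a small model's completion along several axes."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": story_beginning + model_completion},
        ],
    )
    scores = {}
    for line in response.choices[0].message.content.splitlines():
        if ":" in line:
            name, value = line.split(":", 1)
            try:
                scores[name.strip().lower()] = float(value.strip())
            except ValueError:
                pass  # skip any free-form commentary the grader adds
    return scores
```

Averaged over many story beginnings, such scores give a per-dimension picture of a model's grammar, creativity, and consistency rather than a single aggregate benchmark number.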

Results and Implications

  • The paper demonstrates that SLMs, when trained on an appropriately designed dataset like TinyStories, can produce coherent narratives with a diversity that rivals much larger models, challenging the common assumption that coherent text generation requires large-scale models.
  • By exploring models with a single transformer block, the research offers insights into the architectural and functional demands of NLP tasks, suggesting that significant contextual and syntactic understanding can emerge from minimalistic designs (a sketch of such a one-block model follows this list).
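
For concreteness, the following is a minimal sketch of a one-block causal language model built from standard PyTorch modules. The vocabulary size, width, head count, and context length are illustrative assumptions, not the authors' configuration, but the resulting model lands in the same sub-10M-parameter regime the paper studies.

```python
# Sketch of a single-transformer-block causal language model.
# Hyperparameters are illustrative; total size is roughly 6M parameters.
import torch
import torch.nn as nn

class OneBlockLM(nn.Module):
    def __init__(self, vocab_size=10_000, d_model=256, n_heads=8, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        # One transformer block: multi-head self-attention + feed-forward.
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, tokens):                        # tokens: (batch, seq)
        seq_len = tokens.size(1)
        positions = torch.arange(seq_len, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(positions)
        # Causal mask: each position may only attend to earlier positions.
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device),
            diagonal=1,
        )
        x = self.block(x, src_mask=causal_mask)
        return self.lm_head(x)                        # (batch, seq, vocab)

model = OneBlockLM()
logits = model(torch.randint(0, 10_000, (1, 32)))
print(sum(p.numel() for p in model.parameters()))    # roughly 6M parameters
```

The paper's claim is that models in roughly this size range, trained with an ordinary next-token objective on TinyStories, already produce fluent multi-paragraph stories.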

Future Directions

The findings pave the way for numerous advancements in NLP and AI:

  • Specialized and Low-Resource Domains: TinyStories provides a foundational tool for developing models tailored to niche areas, opening paths for practical applications where large datasets are impractical.
  • Dataset Synthesis: By demonstrating the impact of a carefully refined dataset, the work motivates future research on synthesizing corpora that maximize learning efficiency across diverse applications (a sketch of such a generation loop follows this list).
  • Understanding Model Creativity: Although the models exhibit basic reasoning and factual knowledge, exploring the depth of creativity and true understanding in generated content could further refine model utility and versatility.
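
As a rough illustration of this kind of corpus synthesis, the sketch below mirrors the TinyStories approach of prompting an instruction-following model for toddler-vocabulary stories while injecting randomly chosen words to diversify the output. The word lists, prompt wording, and model name are assumptions made for illustration, not the authors' exact recipe.

```python
# Sketch of a TinyStories-style synthetic data loop: ask an instruction-tuned
# model for simple-vocabulary stories that must contain randomly drawn words.
# Word lists, prompt wording, and model choice are illustrative assumptions.
import random
from openai import OpenAI

client = OpenAI()

NOUNS = ["dog", "ball", "tree", "cake", "boat"]
VERBS = ["jump", "find", "share", "paint", "sleep"]
ADJECTIVES = ["happy", "tiny", "brave", "loud", "shiny"]

def generate_story() -> str:
    """Request one short story constrained to a small vocabulary."""
    noun = random.choice(NOUNS)
    verb = random.choice(VERBS)
    adjective = random.choice(ADJECTIVES)
    prompt = (
        "Write a short story (3-4 paragraphs) that a 3-year-old could "
        "understand, using only very simple words. The story must use the "
        f"noun '{noun}', the verb '{verb}', and the adjective '{adjective}'."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

corpus = [generate_story() for _ in range(3)]  # scale the count up for a real corpus
```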

While the paper primarily tackles the foundational question of model size and coherence, its findings have broader implications for developing efficient, interpretable, and scalable NLP solutions. The work sets a precedent for using synthetic datasets to maximize the capabilities of language models trained under constrained resources.

Authors (2)
  1. Ronen Eldan (60 papers)
  2. Yuanzhi Li (119 papers)
Citations (183)