
Distilling Mathematical Reasoning Capabilities into Small Language Models (2401.11864v5)

Published 22 Jan 2024 in cs.CL and cs.AI

Abstract: This work addresses the challenge of democratizing advanced LLMs by compressing their mathematical reasoning capabilities into sub-billion parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process into equation-based representations to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to enhance the reasoning performance of SLMs. This involves creating a reasoning dataset with multiple thought processes, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and using it for fine-tuning. Our experimental findings demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.

Introduction

The paper introduces new approaches to refining Small Language Models (SLMs) with the goal of democratizing advanced LLMs. The authors propose Equation-of-Thought Distillation (EoTD) and Ensemble Thoughts Distillation (ETD), specifically designed to enhance mathematical reasoning in SLMs. These approaches represent a notable advance for NLP, granting SLMs sophisticated capabilities previously reserved for computationally intensive LLMs, which are often impractical for widespread use because of their size.

Mathematical Reasoning

Mathematical reasoning remains an Achilles' heel for most AI systems. Recent studies acknowledge the effectiveness of Chain-of-Thought (CoT) prompting in coaxing LLMs to produce intermediate steps for complex problems. Nonetheless, SLMs struggle to match this proficiency because of their limited size, which underscores the need for efficient distillation techniques. The authors' EoTD framework instills mathematical reasoning in SLMs by translating problems into solvable equations, avoiding the high computational cost that typically accompanies LLMs.
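
To make the idea concrete, here is a minimal sketch of how an Equation-of-Thought style rationale can be offloaded to an external symbolic solver; the word problem and equation format are illustrative assumptions, not the paper's exact prompt template.

```python
# Minimal sketch: an Equation-of-Thought (EoT) rationale offloaded to an
# external solver (sympy). Problem text and equation layout are illustrative.
import sympy as sp

# Word problem: "Alice has 3 times as many apples as Bob. Together they
# have 24 apples. How many apples does Bob have?"
a, b = sp.symbols("a b")
eot_equations = [
    sp.Eq(a, 3 * b),    # Alice has 3 times as many apples as Bob
    sp.Eq(a + b, 24),   # together they have 24 apples
]

# The SLM only has to emit the equations; the numeric work is delegated to a
# deterministic solver, which removes arithmetic slips from the final answer.
solution = sp.solve(eot_equations, (a, b))
print(solution[b])  # -> 6
```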

Knowledge Distillation and Methodology

Knowledge Distillation, the process of transferring expertise from massive LLMs to more manageable SLMs, underpins the proposed frameworks. EoTD differentiates itself by offloading calculation to an external solver, reducing errors and simplifying problem-solving for SLMs. To further amplify reasoning capabilities, the paper introduces ETD, a synthesis of reasoning datasets spanning CoT, PoT, and EoT that exposes SLMs to a wider range of reasoning strategies. This diversified dataset enriches the SLMs' reasoning knowledge base and sharpens their mathematical problem-solving skills.
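
As a rough illustration of the ETD setup, the sketch below pools teacher-generated CoT, PoT, and EoT rationales into a single fine-tuning set, keeping only rationales whose final answers match the reference answer. The helpers query_teacher and execute_rationale are hypothetical placeholders standing in for the teacher LLM call and the answer-extraction/solver step, not the authors' actual pipeline.

```python
# Hedged sketch of assembling an Ensemble Thoughts Distillation (ETD) dataset:
# collect CoT, PoT, and EoT rationales from a teacher model, filter out those
# whose final answer is wrong, and pool the survivors for SLM fine-tuning.
from dataclasses import dataclass

@dataclass
class Sample:
    question: str
    rationale: str   # the thought process: CoT text, PoT program, or EoT equations
    style: str       # "cot" | "pot" | "eot"

def build_etd_dataset(problems, query_teacher, execute_rationale):
    """Pool answer-verified rationales of all three thought styles."""
    dataset = []
    for question, gold_answer in problems:
        for style in ("cot", "pot", "eot"):
            rationale = query_teacher(question, style)       # teacher LLM call (assumed interface)
            predicted = execute_rationale(rationale, style)  # run program/solver or parse CoT answer
            if predicted == gold_answer:                     # keep only answer-consistent rationales
                dataset.append(Sample(question, rationale, style))
    return dataset
```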

Experimental Findings

The research evaluates EoTD and ETD across multiple SLM variants on diverse mathematical reasoning datasets. The results are compelling: EoTD markedly improves SLM reasoning, with accuracy gains of up to 18.87% on certain datasets. ETD delivers an even larger improvement, reaching up to 42.45% accuracy, roughly a 20% edge over EoTD. The paper also confirms the hypothesis that a greater volume of heterogeneous reasoning paths correlates with stronger SLM reasoning performance.

Conclusion

These frameworks mark a paradigm shift in harnessing the vast problem-solving prowess of LLMs within a pragmatic, significantly less resource-intensive format. The paper's findings directly contribute to the objective of democratizing AI, presenting a compelling case for SLMs capable of advanced reasoning tasks within constrained computational environments. The inherent scalability of these methods suggests a bright future for SLMs, making advanced NLP technologies more universally adoptable.

Authors (5)
  1. Xunyu Zhu
  2. Jian Li
  3. Yong Liu
  4. Can Ma
  5. Weiping Wang