Introduction
The paper introduces approaches for distilling advanced reasoning into small language models (SLMs), with the goal of democratizing the capabilities of large LLMs. The authors propose Equation-of-Thought Distillation (EoTD) and Mix Thoughts Distillation (MTD), both designed to enhance mathematical reasoning in SLMs. These approaches represent a notable step forward in NLP, granting SLMs sophisticated capabilities previously reserved for computationally intensive LLMs, which are often impractical for widespread use because of their sheer size.
Mathematical Reasoning
Mathematical reasoning remains an Achilles' heel for most AI systems. Recent studies have shown the effectiveness of Chain-of-Thought (CoT) prompting in coaxing LLMs to produce intermediate steps for complex problems. Nonetheless, SLMs struggle to match this proficiency because of their inherent size limitations, which underscores the need for efficient distillation techniques. The authors' EoTD framework instills mathematical reasoning in SLMs by having them translate problems into solvable equations, avoiding the high computational cost that usually accompanies LLM inference.
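To make the equation-then-solve idea concrete, here is a minimal sketch of such a pipeline, assuming SymPy as the external solver. The prompt format, the 'lhs = rhs' output convention, and the variable names are illustrative assumptions, not the paper's exact protocol.

```python
# A minimal sketch of an Equation-of-Thought (EoT) style pipeline.
# Assumptions: the SLM emits equations as 'lhs = rhs' text, and SymPy
# serves as the external solver; names and formats here are illustrative.
from sympy import Eq, solve, symbols
from sympy.parsing.sympy_parser import parse_expr

def solve_equations(equation_texts, unknowns):
    """Parse 'lhs = rhs' strings and offload the arithmetic to SymPy."""
    syms = symbols(unknowns)
    system = []
    for text in equation_texts:
        lhs, rhs = text.split("=")
        system.append(Eq(parse_expr(lhs), parse_expr(rhs)))
    return solve(system, syms, dict=True)

# Word problem: "Tom has twice as many apples as Jane; together they have 12."
# Hypothetical SLM output, as equations rather than a prose chain of thought:
generated = ["t = 2*j", "t + j = 12"]
print(solve_equations(generated, "t j"))  # -> [{j: 4, t: 8}]
```

Because the SLM only has to produce the equations, the error-prone arithmetic is delegated entirely to the solver.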
Knowledge Distillation and Methodology
Knowledge distillation, the process of transferring expertise from massive LLMs to more manageable SLMs, underpins the authors' proposed frameworks. EoTD differentiates itself by offloading calculation to an external solver, reducing arithmetic errors and simplifying problem-solving for the SLM. To further amplify reasoning capability, the paper introduces MTD, a synthesis of multiple reasoning datasets that equips SLMs with a wider range of reasoning strategies. This diversified dataset enriches the SLMs' reasoning knowledge base and significantly sharpens their mathematical problem-solving skills.
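The following sketch shows how such a mixed distillation set might be assembled. It assumes two rationale formats (CoT prose and EoT equations) produced by a teacher LLM and kept only when the teacher's final answer matches the gold label; the field names and the helpers `teacher_generate` and `check_answer` are hypothetical, not the paper's API.

```python
# A minimal sketch of assembling a mixed-thoughts distillation dataset.
# Assumptions: `teacher_generate(question, fmt)` returns a (rationale,
# answer) pair from a teacher LLM, and `check_answer` compares the
# predicted answer to the gold label; both are hypothetical helpers.
import random

def build_mtd_dataset(problems, teacher_generate, check_answer, paths_per_format=4):
    """Collect correct teacher rationales in multiple formats per problem."""
    dataset = []
    for prob in problems:
        for fmt in ("cot", "eot"):  # one prompt template per reasoning format
            for _ in range(paths_per_format):
                rationale, answer = teacher_generate(prob["question"], fmt)
                if check_answer(answer, prob["gold_answer"]):
                    dataset.append({
                        "question": prob["question"],
                        "format": fmt,
                        "rationale": rationale,
                        "answer": answer,
                    })
    random.shuffle(dataset)  # interleave formats before fine-tuning
    return dataset
```

The SLM would then be fine-tuned on this mixed set with a standard language-modeling loss over the rationale and answer, conditioned on the question, so that a single student learns several reasoning styles at once.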
Experimental Findings
The research rigorously evaluates EoTD and MTD across multiple SLM variants on diverse mathematical reasoning datasets. The results are compelling: EoTD markedly improves SLM reasoning, with accuracy gains of up to 18.87% on certain datasets, while MTD delivers an even larger improvement, reaching up to 42.45%, roughly a 20-point edge over EoTD. The paper also confirms the hypothesis that a greater volume of heterogeneous reasoning paths correlates with stronger SLM reasoning performance.
Conclusion
These frameworks show how the vast problem-solving prowess of LLMs can be harnessed in a pragmatic, far less resource-intensive form. The paper's findings contribute directly to the goal of democratizing AI, making a compelling case for SLMs capable of advanced reasoning within constrained computational environments. The scalability of these methods suggests a promising future for SLMs, making advanced NLP technologies more widely adoptable.