
Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch (2410.18693v2)

Published 24 Oct 2024 in cs.CL and cs.AI

Abstract: Improving the mathematical reasoning capabilities of LLMs is critical for advancing artificial intelligence. However, access to extensive, diverse, and high-quality reasoning datasets remains a significant challenge, particularly for the open-source community. In this paper, we propose ScaleQuest, a novel, scalable, and cost-effective data synthesis method that enables the generation of large-scale mathematical reasoning datasets using lightweight 7B-scale models. ScaleQuest introduces a two-stage question-tuning process comprising Question Fine-Tuning (QFT) and Question Preference Optimization (QPO) to unlock the question generation capabilities of problem-solving models. By generating diverse questions from scratch -- without relying on powerful proprietary models or seed data -- we produce a dataset of 1 million problem-solution pairs. Our experiments demonstrate that models trained on our data outperform existing open-source datasets in both in-domain and out-of-domain evaluations. Furthermore, our approach shows continued performance improvement as the volume of training data increases, highlighting its potential for ongoing data scaling. The extensive improvements observed in code reasoning tasks demonstrate the generalization capabilities of our proposed method. Our work provides the open-source community with a practical solution to enhance the mathematical reasoning abilities of LLMs.


Summary

  • The paper introduces ScaleQuest, a scalable and cost-effective method using smaller models to synthesize high-quality mathematical reasoning data, unlike approaches relying on large proprietary models.
  • Using the ScaleQuest dataset to fine-tune open-source LLMs resulted in substantial performance improvements (29.2-46.4%) on the MATH benchmark, even surpassing proprietary models.
  • ScaleQuest offers a significant advancement for the open-source AI community by providing a cost-effective way to generate large-scale reasoning datasets, enabling performance gains without vast resources.

Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

The paper "Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch" presents a paper focused on enhancing the reasoning capabilities of LLMs through innovative data synthesis techniques, specifically targeting mathematical reasoning tasks. The authors introduce a novel methodology, termed ScaleQuest, which is aimed at addressing the deficiencies in the availability of high-quality, large-scale reasoning datasets, which are crucial for the effective instruction tuning of LLMs.

Methodology

The central contribution of this research is ScaleQuest, a scalable and cost-efficient method for generating high-quality reasoning data with smaller, open-source models. Unlike techniques that depend heavily on powerful proprietary models such as GPT-4 for data generation, ScaleQuest harnesses models at the 7B-parameter scale, keeping costs manageable.

The method involves several key steps:

  1. Question Generation from Scratch: The method exploits the ability of causal LLMs to continue from an empty, context-free prompt, so the question generator samples brand-new questions autonomously rather than rewriting seed questions (see the first sketch after this list).
  2. Question Fine-Tuning (QFT): A problem-solving model is first fine-tuned on a small set of existing questions, enough to activate its question-generation ability without overfitting to the seed distribution.
  3. Question Preference Optimization (QPO): The QFT model is then optimized on question preference pairs to improve the solvability and difficulty of the questions it generates. An external model such as GPT-4o-mini refines questions for clarity and appropriate difficulty, and the refined versions serve as the preferred examples (see the second sketch after this list).
  4. Filtering and Response Generation: Generated questions pass through solvability and language filters, and responses to the surviving questions are selected with a reward-model-based filtering process so that only high-quality answers are retained (see the third sketch after this list).
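
To make step 1 concrete, here is a minimal sketch of question generation from scratch, assuming a question generator that has already been through QFT and QPO. The checkpoint path, chat-template prefix, and sampling settings are illustrative assumptions rather than the authors' released artifacts; the key point is that the prompt carries no seed question.

```python
from vllm import LLM, SamplingParams

# Hypothetical checkpoint: a 7B problem-solving model after QFT + QPO.
llm = LLM(model="path/to/question-generator-7b")

# High-temperature nucleus sampling encourages question diversity; n is
# the number of independent questions drawn per prompt.
params = SamplingParams(temperature=1.0, top_p=0.99, max_tokens=512, n=8)

# Each prompt is only a chat-template prefix (assumed format), with no
# seed question: the model completes it with a brand-new question.
prefix = "<|im_start|>user\n"  # substitute the generator's actual template
outputs = llm.generate([prefix] * 1000, params)

questions = [c.text.strip() for req in outputs for c in req.outputs]
```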
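Step 3 can be read as standard preference optimization applied to questions rather than answers. The sketch below, using TRL's DPOTrainer (assuming the TRL >= 0.12 API), supposes preference pairs in which a GPT-4o-mini-refined question is "chosen" and the raw QFT-model question is "rejected"; paths, pair contents, and hyperparameters are placeholders.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Toy preference pairs: the refined question (e.g., rewritten by
# GPT-4o-mini for solvability and difficulty) is preferred over the
# raw question sampled from the QFT model. Contents are placeholders.
pairs = Dataset.from_dict({
    "prompt":   ["<|im_start|>user\n"],  # same empty-style prompt as generation
    "chosen":   ["Refined, solvable question ..."],
    "rejected": ["Raw QFT-model question ..."],
})

model_path = "path/to/qft-model"  # hypothetical QFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

args = DPOConfig(output_dir="qpo-model", beta=0.1,
                 per_device_train_batch_size=2, num_train_epochs=1)
trainer = DPOTrainer(model=model, args=args, train_dataset=pairs,
                     processing_class=tokenizer)
trainer.train()
```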
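For the reward-model-based answer selection in step 4, each candidate solution can be scored and the best one kept. The sketch below substitutes a generic open-source reward model purely for illustration; the paper's pipeline also applies solvability and language filters before this stage.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A generic off-the-shelf reward model, used here as a stand-in.
RM = "OpenAssistant/reward-model-deberta-v3-large-v2"
rm_tokenizer = AutoTokenizer.from_pretrained(RM)
rm = AutoModelForSequenceClassification.from_pretrained(RM).eval()

@torch.no_grad()
def score(question: str, answer: str) -> float:
    # The model emits a single logit scoring the (prompt, response) pair.
    inputs = rm_tokenizer(question, answer, return_tensors="pt", truncation=True)
    return rm(**inputs).logits[0].item()

def best_response(question: str, candidates: list[str]) -> str:
    # Keep the highest-reward candidate among the sampled solutions.
    return max(candidates, key=lambda ans: score(question, ans))
```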

Results

Their approach yielded a dataset of 1 million high-quality problem-solution pairs. When this dataset was used to fine-tune mainstream open-source models such as Mistral and Llama3, significant performance improvements were observed, ranging from 29.2% to 46.4% over existing datasets on the MATH benchmark.
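
As an illustration of how such a dataset would be consumed, the following is a minimal supervised fine-tuning sketch with TRL's SFTTrainer. The dataset identifier and column names are assumptions; substitute the actual released data and schema.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed dataset path and column names; adjust to the released data.
data = load_dataset("dyyyyyyyy/ScaleQuest-Math", split="train")

def to_text(ex):
    # Flatten each problem-solution pair into a single training string.
    return {"text": f"Question: {ex['query']}\n\nAnswer: {ex['response']}"}

data = data.map(to_text)

args = SFTConfig(output_dir="scalequest-sft",
                 per_device_train_batch_size=4,
                 num_train_epochs=3)
trainer = SFTTrainer(model="mistralai/Mistral-7B-v0.1",  # one of the paper's base models
                     args=args, train_dataset=data)
trainer.train()
```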

Notably, the paper reports that Qwen2-Math-7B-Base fine-tuned on the ScaleQuest dataset surpassed well-known proprietary models, including GPT-4-Turbo and Claude-3.5-Sonnet, even though it underwent none of the preference-optimization stages those proprietary models typically receive.

Implications and Future Directions

The paper underscores the potential of smaller models, combined with a strategic synthesis approach, to create robust reasoning datasets that raise the baseline performance of LLMs. The ability to generate such high-quality data cost-effectively represents a significant advance for the open-source community, which often lacks access to the extensive resources available to proprietary model developers.

Looking ahead, the ScaleQuest methodology could be adapted to a broader range of reasoning tasks beyond mathematical problem-solving, such as scientific reasoning or competitive programming. Such an extension would encompass tasks requiring diverse reasoning and solution paths, potentially broadening the applicability of advanced LLMs to multiple complex, domain-specific scenarios.

Moreover, iterative refinement of the data generation and filtering processes could further improve the quality and diversity of the datasets, ultimately enhancing the self-improvement capabilities of LLMs. Future work could also explore the integration of broader and more diverse data sources, improving the adaptability and robustness of models on complex and nuanced reasoning tasks.

In conclusion, the research presented in this paper delivers a scalable, cost-effective framework for reasoning data synthesis, which stands to significantly benefit open-source AI development and the wider AI community's efforts in advancing LLMs' capabilities.