
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning (2410.01044v1)

Published 1 Oct 2024 in cs.AI and cs.CL

Abstract: The reasoning steps generated by LLMs might be incomplete, as they mimic logical leaps common in everyday communication found in their pre-training data: underlying rationales are frequently left implicit (unstated). To address this challenge, we introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training on a vast collection of rationale annotations extracted from unlabeled data. We extract 79k rationales from a web-scale unlabelled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention. This web-scale pre-training for reasoning allows RATIONALYST to consistently generalize across diverse reasoning tasks, including mathematical, commonsense, scientific, and logical reasoning. Fine-tuned from LLaMa-3-8B, RATIONALYST improves the accuracy of reasoning by an average of 3.9% on 7 representative reasoning benchmarks. It also demonstrates superior performance compared to significantly larger verifiers like GPT-4 and similarly sized models fine-tuned on matching training sets.

Rationalyst: Enhancing LLM Reasoning via Pre-trained Process Supervision

The paper introduces Rationalyst, a model designed to enhance the reasoning capabilities of LLMs through pre-training on implicit rationales extracted from web-scale datasets. The authors aim to address the incomplete reasoning steps produced by current LLMs by recovering the implicit logic that unannotated text frequently leaves unstated.

Methodology

The authors propose a systematic approach for extracting rationales from vast unlabelled datasets with minimal human intervention. The process has three main stages:

  1. Rationale Extraction: Using the Pile, a large-scale unlabelled corpus, together with reasoning datasets such as GSM8K and ECQA, Rationalyst extracts approximately 79,000 rationales that make explicit the logical connections often left implicit in text.
  2. Rationale Filtration: A filtering step retains only those rationales that improve the model's prediction of the text that follows, focusing training on high-quality annotations (see the extraction-and-filtration sketch after this list).
  3. Inference Integration: During inference, Rationalyst provides process supervision by generating rationales that guide chain-of-thought reasoning, filling in omitted logical steps as the model works through a problem (see the decoding sketch below).
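
The extraction-and-filtration idea can be pictured concretely. The following is a minimal sketch, not the paper's implementation: `lm` stands for any causal language model wrapper, and its `generate` and `log_prob` methods are hypothetical helpers. The filtering rule shown, keeping a candidate rationale only if conditioning on it raises the model's estimated probability of the subsequent text, is one plausible reading of "enhancing future prediction".

```python
# Sketch of rationale extraction + filtration. `lm` is assumed to expose
# two hypothetical methods: lm.generate(prompt) -> str, and
# lm.log_prob(target, context) -> float (log-probability of `target`
# continuing `context`). Neither is the paper's actual API.

from dataclasses import dataclass

@dataclass
class Candidate:
    context: str    # text preceding the implicit reasoning gap
    rationale: str  # proposed implicit rationale
    future: str     # text following the gap

def extract_candidates(document: str, lm) -> list[Candidate]:
    """Prompt an LLM to propose implicit rationales between consecutive
    chunks of a document (simplified here: split on newlines)."""
    chunks = [c for c in document.split("\n") if c.strip()]
    candidates = []
    for i in range(len(chunks) - 1):
        context, future = chunks[i], chunks[i + 1]
        prompt = (
            "Text so far:\n" + context +
            "\nState the unstated reasoning that connects this "
            "to what comes next:\n"
        )
        candidates.append(Candidate(context, lm.generate(prompt), future))
    return candidates

def filter_candidates(candidates: list[Candidate], lm,
                      margin: float = 0.0) -> list[Candidate]:
    """Keep a rationale only if conditioning on it makes the subsequent
    text more likely than conditioning on the context alone."""
    kept = []
    for c in candidates:
        with_r = lm.log_prob(c.future, c.context + "\n" + c.rationale)
        without_r = lm.log_prob(c.future, c.context)
        if with_r - without_r > margin:
            kept.append(c)
    return kept
```

At inference time, the supervision loop is similarly simple in outline. Below is a hedged sketch under assumed interfaces: `agent.propose_steps` (samples k candidate next reasoning steps), `rationalyst.generate_rationale`, and `rationalyst.log_prob` are hypothetical wrappers, and the stopping rule on an "Answer:" prefix is illustrative only. Scoring candidate steps by their likelihood given the generated rationale is one plausible way to realize the process supervision the summary describes.

```python
# Sketch of rationale-guided chain-of-thought decoding. All interfaces
# (agent, rationalyst) are hypothetical stand-ins, not the paper's API.

def rationale_guided_reasoning(question: str, agent, rationalyst,
                               max_steps: int = 10, k: int = 4) -> list[str]:
    trajectory = [question]
    for _ in range(max_steps):
        history = "\n".join(trajectory)
        # Rationalyst supplies the implicit rationale for the next step.
        rationale = rationalyst.generate_rationale(history)
        # Sample k candidate next steps from the reasoning agent.
        candidates = agent.propose_steps(history, k)
        # Prefer the candidate most consistent with the rationale.
        best = max(
            candidates,
            key=lambda s: rationalyst.log_prob(s, history, rationale),
        )
        trajectory.append(best)
        if best.strip().startswith("Answer:"):
            break
    return trajectory
```

A design note on this sketch: because the rationale is regenerated at every step, the supervision adapts to the trajectory so far rather than being fixed up front, which is what lets it fill in omitted logical steps mid-reasoning.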

Experimental Results

The results show that Rationalyst, fine-tuned from LLaMa-3-8B, improves reasoning accuracy by an average of 3.9% across seven representative benchmarks spanning mathematical, commonsense, scientific, and logical reasoning. Notably, it outperforms significantly larger verifiers such as GPT-4, as well as similarly sized models fine-tuned on matching training sets, suggesting that targeted rationale supervision can be more effective than simply scaling model parameters.

Implications and Future Directions

The Rationalyst approach carries several key implications:

  • Enhanced Interpretability: By generating human-readable rationales, Rationalyst not only improves performance but also provides insights into the model’s reasoning process, particularly beneficial in domains like mathematics and science.
  • Scalability: The use of web-scale datasets without intensive human annotation suggests a scalable approach, potentially applicable across varied domains and tasks.
  • Model Adaptability: Future adaptations could include using stronger models for rationale extraction and experimenting with larger, more diverse datasets to further refine rationale quality and applicability.

Conclusion

Rationalyst represents a notable advance in process supervision for LLMs, demonstrating the value of extracting and using implicit rationales. The approach improves reasoning performance while remaining interpretable and scalable, laying a foundation for further research on enhancing reasoning in LLMs.

Authors (8)
  1. Dongwei Jiang (16 papers)
  2. Guoxuan Wang (4 papers)
  3. Yining Lu (8 papers)
  4. Andrew Wang (42 papers)
  5. Jingyu Zhang (40 papers)
  6. Chuyu Liu (3 papers)
  7. Benjamin Van Durme (173 papers)
  8. Daniel Khashabi (83 papers)