ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection (2505.16475v1)

Published 22 May 2025 in cs.AI

Abstract: We present a novel pipeline, ReflectEvo, to demonstrate that small LLMs (SLMs) can enhance meta introspection through reflection learning. This process iteratively generates self-reflection for self-training, fostering a continuous and self-evolving process. Leveraging this pipeline, we construct ReflectEvo-460k, a large-scale, comprehensive, self-generated reflection dataset with broadened instructions and diverse multi-domain tasks. Building upon this dataset, we demonstrate the effectiveness of reflection learning to improve SLMs' reasoning abilities using SFT and DPO with remarkable performance, substantially boosting Llama-3 from 52.4% to 71.2% and Mistral from 44.4% to 71.1%. It validates that ReflectEvo can rival or even surpass the reasoning capability of the three prominent open-sourced models on BIG-bench without distillation from superior models or fine-grained human annotation. We further conduct a deeper analysis of the high quality of self-generated reflections and their impact on error localization and correction. Our work highlights the potential of continuously enhancing the reasoning performance of SLMs through iterative reflection learning in the long run.

ReflectEvo: Enhancing Meta Introspection in Small LLMs through Self-Reflection Learning

The paper "ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection" by Jiaqi Li et al. presents ReflectEvo, a pipeline designed to improve the reasoning abilities of small LLMs (SLMs) by leveraging a novel self-training method called reflection learning. This research primarily addresses the challenge of enhancing the introspective and reasoning capabilities of SLMs without relying on large-scale human annotations or performance distillation from more capable models.

Key Contributions and Methodology

ReflectEvo introduces a systematic approach for generating and utilizing self-reflection data, enabling SLMs to iteratively improve their reasoning skills. The core contribution is the creation of a large-scale dataset, ReflectEvo-460k, which consists of 460,000 self-generated reflection samples derived from 17 diverse source datasets covering ten tasks and domains. This dataset provides a comprehensive resource for training models to enhance their self-reflection and correction abilities.
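
As a concrete illustration, a single self-generated reflection sample might be shaped roughly as follows; the field names here are assumptions for exposition, not the released schema of ReflectEvo-460k:

    # Hypothetical shape of one reflection sample (illustrative fields only).
    sample = {
        "task": "logical_deduction",      # one of the multi-domain source tasks
        "question": "...",                # the original problem or instruction
        "first_answer": "...",            # Generator's initial (incorrect) attempt
        "reasoning_path": "...",          # reasoning behind that attempt
        "reflection": "...",              # Reflector's introspective critique
        "revised_answer": "...",          # corrected answer after reflection
    }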

The methodological framework implemented in ReflectEvo can be outlined as follows:

  1. Data Generation: The ReflectEvo pipeline generates self-reflection data through a two-part system: a Generator and a Reflector. The Generator first produces answers with reasoning paths; the Reflector then refines incorrect outputs through introspection and correction, using varied instructional prompts to enrich the reflection process (a sketch of this loop follows the list).
  2. Self-Training through Reflection Learning: Reflection learning is carried out with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). Four training paradigms are explored, including:
    • One-stage and two-stage training that align model outputs with high-quality self-generated reflections.
    • Preference-based training that contrasts correct (positive) with incorrect (negative) reflections to maximize learning (see the DPO sketch below).
  3. Evaluation and Analysis: The effectiveness of reflection learning is empirically validated, showing substantial performance gains across various models and tasks. Significant improvements were observed in models like Llama-3 and Mistral, with reflection-enhanced models outperforming their larger counterparts on certain benchmarks.
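
The following minimal Python sketch illustrates the Generator-Reflector loop described in step 1. The helpers generate(), reflect(), and is_correct() are hypothetical stand-ins for the model calls and the answer checker, not the paper's released code:

    # Minimal sketch of the Generator-Reflector data-generation loop.
    # generate(), reflect(), and is_correct() are assumed helpers.
    def build_reflection_data(tasks, generate, reflect, is_correct):
        positives, negatives = [], []  # pools reused for preference training
        for task in tasks:
            answer, rationale = generate(task.question)
            if is_correct(answer, task.gold):
                continue  # only incorrect first attempts trigger reflection
            reflection = reflect(task.question, rationale, answer)
            revised, _ = generate(task.question, hint=reflection)
            sample = {
                "question": task.question,
                "first_answer": answer,
                "reflection": reflection,
                "revised_answer": revised,
            }
            # A reflection is "positive" if it leads to a correct revision.
            pool = positives if is_correct(revised, task.gold) else negatives
            pool.append(sample)
        return positives, negatives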

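For the preference-based paradigm, positive and negative reflections map naturally onto the chosen/rejected pairs of the standard DPO objective. The sketch below is the generic DPO loss, not the authors' implementation; beta and the log-probability inputs are the usual DPO quantities:

    import torch.nn.functional as F

    def dpo_loss(pol_pos, pol_neg, ref_pos, ref_neg, beta=0.1):
        # Each argument is the summed token log-probability of a positive or
        # negative reflection under the policy or the frozen reference model.
        margin = (pol_pos - ref_pos) - (pol_neg - ref_neg)
        # Maximizing log-sigmoid of the scaled margin trains the policy to
        # prefer reflections that led to correct revisions.
        return -F.logsigmoid(beta * margin).mean()
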
Numerical Results and Observations

  • For the Llama-3 model, reflection learning improved performance from 52.4% to 71.2% on the BIG-bench benchmark, signifying a marked enhancement in reasoning capabilities.
  • The Mistral model's accuracy on logical reasoning datasets increased by over 20% following reflection training.
  • The experiments demonstrated that self-generated reflection data generalizes effectively across tasks, underscoring the adaptability of ReflectEvo.

Implications and Future Directions

The research highlights the potential of reflection learning to serve as a plug-and-play enhancement for various reasoning methodologies, mimicking human-like introspection. By reducing dependency on extensive labeled data and human intervention, ReflectEvo offers a scalable solution for improving LLMs' reasoning prowess.

The implications of this work are multifaceted:

  • Theoretical Advances: It provides insights into the self-supervised learning paradigm, advocating for the internal reflection mechanism as a means to promote autonomous model improvement.
  • Practical Applications: ReflectEvo could be instrumental in deploying cost-effective AI systems in environments constrained by resources, by using smaller, introspectively enhanced models.

Future research might focus on further optimizing reflection-induced learning across different model architectures and on extending this paradigm to more complex tasks requiring nuanced reasoning and contextual understanding. Investigating the iterative, independent evolution of self-reflection capabilities may also yield deeper insight into how models can autonomously develop richer cognitive abilities.

Authors (9)
  1. Jiaqi Li
  2. Xinyi Dong
  3. Yang Liu
  4. Zhizhuo Yang
  5. Quansen Wang
  6. Xiaobo Wang
  7. SongChun Zhu
  8. Zixia Jia
  9. Zilong Zheng