- The paper introduces the GEM method, applying maximum entropy regularization with reverse KL divergence to reduce overfitting during supervised fine-tuning.
- Experiments show that GEM lowers perplexity, improves instruction-following, and boosts diversity in tasks like creative writing, math reasoning, and code generation.
- The findings imply that GEM can enhance model robustness and versatility, with potential applications in RLHF pipelines and synthetic data generation.
Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity
Introduction
Large language models (LLMs) [openai2023gpt4, touvron2023llama, team2024gemma] are prominent tools across many applications, achieving notable success through pre-training, where they develop a strong ability to predict the next token given a preceding text sequence. Despite extensive pre-training, these models often underperform on specific tasks and require additional fine-tuning to follow instructions and produce satisfactory responses. Supervised Fine-Tuning (SFT) is commonly used for this refinement, typically with the Cross Entropy (CE) loss to maximize the likelihood of labeled data. However, this objective frequently leads to overfitting and reduced output diversity, limiting the models' usefulness in applications that call for varied and creative outputs.
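Concretely, with a dataset $\mathcal{D}$ of prompt-response pairs $(x, y)$ and $\pi_\theta$ denoting the model's conditional token distribution (generic notation, not necessarily the paper's symbols), the standard CE objective for SFT is

$$
\mathcal{L}_{\mathrm{CE}}(\theta) \;=\; -\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\sum_{t=1}^{|y|} \log \pi_\theta\!\left(y_t \mid x, y_{<t}\right)\right].
$$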
Contributions and Methodology
The paper introduces a novel distribution matching method named Generative Entropy-regularized Matching (GEM) to address the limitations of the CE loss in SFT. GEM applies the maximum entropy principle: the model is trained to match the data distribution while keeping its own output distribution as flat (high-entropy) as the data allows, thereby mitigating overfitting and preserving output diversity. The approach is formulated as an optimization problem that minimizes the reverse Kullback-Leibler (KL) divergence with an entropy regularization term.
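A schematic way to write such an objective, with $q$ standing for the data distribution and $\beta > 0$ for an entropy-regularization weight (placeholder symbols; the paper's exact formulation and its training estimator may differ in detail), is

$$
\min_{\theta}\;\; \mathrm{KL}\!\left(\pi_\theta(\cdot\mid x)\,\middle\|\,q(\cdot\mid x)\right)\;-\;\beta\,\mathcal{H}\!\left(\pi_\theta(\cdot\mid x)\right).
$$

The reverse KL term pulls the model toward the data distribution, while the entropy term $\mathcal{H}(\pi_\theta)$ keeps the learned distribution from collapsing onto individual training responses.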
Key Elements of GEM:
- Generative Approach to Distribution Matching: Unlike the CE loss, which focuses solely on imitating supervised data, GEM encourages models to learn from both correct responses and their own generated mistakes.
- Entropy Regularization: The entropy term discourages over-memorization of individual training samples, reducing overfitting and improving the diversity of generated outputs (a minimal loss sketch follows this list).
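To make the second element concrete, below is a minimal sketch of an entropy-regularized token-level loss in PyTorch. It is an illustration only, not the paper's GEM estimator: it omits the generative component that learns from the model's own samples, and the weight `beta` and the masking convention are assumptions.

```python
import torch
import torch.nn.functional as F

def entropy_regularized_sft_loss(logits, target_ids, beta=0.1, ignore_index=-100):
    """Sketch: data-likelihood term plus an entropy bonus on the model's
    per-token distribution (beta=0 recovers the standard CE loss)."""
    vocab_size = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)         # (batch, seq, vocab)
    probs = log_probs.exp()

    # Data term: negative log-likelihood of the reference tokens.
    nll = F.nll_loss(
        log_probs.reshape(-1, vocab_size),
        target_ids.reshape(-1),
        ignore_index=ignore_index,
        reduction="mean",
    )

    # Entropy bonus: reward flatter per-token distributions on non-padding positions.
    mask = (target_ids != ignore_index).float()       # (batch, seq)
    token_entropy = -(probs * log_probs).sum(dim=-1)  # (batch, seq)
    mean_entropy = (token_entropy * mask).sum() / mask.sum().clamp_min(1.0)

    return nll - beta * mean_entropy
```

Setting `beta` to zero recovers plain CE training; larger values trade a small amount of data likelihood for flatter, more diverse output distributions.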
Experiments and Results
GEM was evaluated against the traditional CE loss on several metrics, showing improvements in both general-purpose and specialized settings.
General Instruction Following
When Llama-3-8B models were fine-tuned on the UltraFeedback dataset, GEM demonstrated superior performance to CE in several respects:
- Reduced Perplexity: GEM-trained models exhibited lower evaluation perplexity, suggesting less overfitting (the metric is defined below).
- Enhanced Instruction-Following Performance: When tested on the IFEval benchmark, GEM outperformed CE, showing better adherence to provided instructions.
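For reference, evaluation perplexity is the exponentiated average negative log-likelihood per token on held-out data, so lower values mean the model assigns higher probability to unseen responses:

$$
\mathrm{PPL} \;=\; \exp\!\left(-\frac{1}{N}\sum_{t=1}^{N} \log \pi_\theta\!\left(y_t \mid x, y_{<t}\right)\right),
$$

where $N$ is the total number of evaluated tokens.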
Output Diversity and Creativity
In tasks requiring creative outputs, such as poem and story writing, GEM-trained models achieved significantly higher diversity. This was measured using:
- N-Gram Diversity
- Self-BLEU Diversity
- Sentence-BERT Diversity
These metrics indicate a broader and more varied generation capability, enhancing the models' usefulness in applications where creativity and flexibility are paramount; the first two are sketched below.
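Below is a rough illustration of how the n-gram and Self-BLEU diversity metrics can be computed over a set of generations. It is a minimal sketch: whitespace tokenization and the BLEU settings are assumptions, and the paper's implementation may differ.

```python
from collections import Counter
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def distinct_n(texts, n=2):
    """N-gram diversity: fraction of unique n-grams across all generations
    (higher means more diverse)."""
    ngrams = Counter()
    for text in texts:
        tokens = text.split()  # simple whitespace tokenization (an assumption)
        ngrams.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

def self_bleu(texts, max_n=4):
    """Self-BLEU: average BLEU of each generation against all the others
    (lower means more diverse)."""
    if len(texts) < 2:
        return 0.0
    smooth = SmoothingFunction().method1
    weights = tuple(1.0 / max_n for _ in range(max_n))
    scores = []
    for i, hypothesis in enumerate(texts):
        references = [texts[j].split() for j in range(len(texts)) if j != i]
        scores.append(sentence_bleu(references, hypothesis.split(),
                                    weights=weights, smoothing_function=smooth))
    return sum(scores) / len(scores)
```

Sentence-BERT diversity is computed analogously from pairwise similarities of sentence embeddings and requires an embedding model, so it is omitted here.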
Specialized Tasks: Math Reasoning and Code Generation
When fine-tuned for domain-specific tasks, GEM maintained its advantages:
- Math Reasoning: On datasets such as GSM8K and MATH, GEM improved performance as measured by Majority Voting (MV) and Best-of-N (BON) sampling.
- Code Generation: On benchmarks such as HumanEval and MBPP, GEM achieved higher pass rates across sampled solutions, demonstrating its effectiveness at generating correct and varied programs (these sampling-based metrics are sketched below).
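For concreteness, here is a minimal sketch of the sampling-based metrics referenced above. The answer representation, the scores used for Best-of-N (e.g., from a hypothetical verifier or reward model), and the pass@k-style estimator for code pass rates are generic assumptions rather than the paper's exact evaluation protocol.

```python
import math
from collections import Counter

def majority_vote(answers):
    """Majority Voting (MV): return the most frequent final answer among N samples."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(answers, scores):
    """Best-of-N (BON): return the answer with the highest score
    (e.g., from a hypothetical verifier or reward model)."""
    return max(zip(answers, scores), key=lambda pair: pair[1])[0]

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: probability that at least one of k samples,
    drawn from n generations of which c are correct, passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)
```

For example, `majority_vote(["42", "41", "42"])` returns `"42"`, and `pass_at_k(n=20, c=3, k=1)` evaluates to 0.15, the expected single-sample pass rate.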
Implications and Future Directions
GEM points to meaningful improvements in LLM fine-tuning, with implications for both theory and practice: reducing overfitting while preserving diversity can yield more robust and versatile models, applicable in areas ranging from creative writing to technical problem-solving.
Future work may explore the integration of GEM-trained models into Reinforcement Learning from Human Feedback (RLHF) pipelines, potentially reducing the preference collapse issue and improving alignment with human values. Additionally, GEM's enhanced diversity may prove beneficial in self-distillation practices and synthetic data generation, paving the way for more sophisticated and autonomous AI systems.