- The paper introduces a dual-phase Coarse-to-Fine Actor method that dynamically extends the model's output length via Continuous Maximization to enhance analytical reasoning.
- It employs a Knowledge Residue Merger to refine the Coarse Actor's outputs, reducing redundancy while folding the enhanced reasoning into an existing Instruction model.
- Experimental results demonstrate superior performance over similar-scale and larger LLMs across 11 language tasks and dialogue evaluations.
Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs
The paper "Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs" presents an innovative methodology designed to address the limitations of smaller-scale LLMs such as Mistral. Despite their notable performance in general benchmarks, these smaller LLMs often struggle with producing in-depth and coherent dialogues. The proposed Mistral-C2F model employs a two-step Coarse-to-Fine Actor approach, leveraging Reinforcement Learning from Human Feedback (RLHF) to enhance their conversational and analytical capabilities.
Methodology
Coarse Actor: Continuous Maximization Strategy
The first step, termed the Policy-based Coarse Actor, introduces a strategy called Continuous Maximization (CM). Rather than training for a fixed number of iterations, CM dynamically extends the model's output length limit based on training signals such as reward model scores and critic value-function losses. By preventing premature stopping, the strategy lets the model generate longer, more detailed analytical and reasoning content.
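To make the idea concrete, here is a minimal sketch of how such an adaptive length schedule might look inside a PPO-style training loop. The function name, window size, thresholds, and extension step below are illustrative assumptions, not the paper's exact formulation:

```python
# Illustrative sketch of a Continuous Maximization (CM) style length schedule.
# All names, window sizes, and step sizes are assumptions for exposition;
# the paper's exact criterion may differ.

def update_max_length(max_new_tokens: int,
                      reward_history: list[float],
                      critic_loss_history: list[float],
                      step: int = 256,
                      ceiling: int = 4096,
                      window: int = 10) -> int:
    """Extend the generation length limit while training signals keep improving."""
    if len(reward_history) < 2 * window:
        return max_new_tokens  # not enough history to judge a trend yet

    recent_reward = sum(reward_history[-window:]) / window
    prior_reward = sum(reward_history[-2 * window:-window]) / window
    recent_loss = sum(critic_loss_history[-window:]) / window
    prior_loss = sum(critic_loss_history[-2 * window:-window]) / window

    # If the reward model still scores the longer outputs higher and the
    # critic's value loss is not diverging, keep extending rather than
    # stopping at a fixed budget.
    if recent_reward > prior_reward and recent_loss <= prior_loss:
        return min(max_new_tokens + step, ceiling)
    return max_new_tokens
```

The key design choice is that the stopping condition is driven by the reward and critic signals themselves rather than by a fixed iteration count.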
Fine Actor: Knowledge Residue Merger
The second step, the Fine Actor, refines the output of the Coarse Actor, which tends to generate excessive redundant content. It applies a Knowledge Residue Merger (KRM) to fold the Coarse Actor's enhanced reasoning into an existing Instruction model, improving quality and correctness while trimming redundancy.
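The summary describes KRM only at a high level; one common way to realize such a merger is weighted parameter interpolation between the two models. The sketch below is an assumption along those lines; the interpolation form and the ratio `alpha` are hypothetical, not the paper's published recipe:

```python
import torch

# Hypothetical KRM-style merge: blend the Coarse Actor's weights into the
# Instruction model via per-parameter linear interpolation. The paper's
# actual merging rule may differ from this sketch.
def knowledge_residue_merge(instruct_state: dict[str, torch.Tensor],
                            coarse_state: dict[str, torch.Tensor],
                            alpha: float = 0.5) -> dict[str, torch.Tensor]:
    """alpha controls how much of the Coarse Actor's reasoning is retained."""
    merged = {}
    for name, w_instruct in instruct_state.items():
        w_coarse = coarse_state[name]  # assumes identical architectures
        merged[name] = (1.0 - alpha) * w_instruct + alpha * w_coarse
    return merged

# Usage (models assumed to share the same architecture and tokenizer):
# merged_state = knowledge_residue_merge(instruct_model.state_dict(),
#                                        coarse_model.state_dict(),
#                                        alpha=0.5)
# instruct_model.load_state_dict(merged_state)
```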
Experimental Setup and Results
The model was evaluated across 11 general language tasks and the MT-Bench dialogue task, allowing comprehensive testing of its capabilities. Training used PyTorch with the AdamW optimizer and DeepSpeed with bfloat16 precision. Mistral-C2F demonstrated exceptional performance, often outperforming both similar-scale models and larger models with 13B and 30B parameters.
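For readers who want to reproduce a comparable setup, a minimal DeepSpeed initialization with AdamW and bfloat16 might look like the following. The hyperparameter values are placeholders, not the paper's reported settings:

```python
import deepspeed

# Placeholder hyperparameters; the paper's exact values may differ.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},          # bfloat16 mixed precision
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 1e-6, "betas": [0.9, 0.95], "weight_decay": 0.1},
    },
    "zero_optimization": {"stage": 2},  # a common choice at 7B scale
}

# `model` is assumed to be a loaded Mistral-scale policy (e.g. from
# transformers); deepspeed.initialize wraps it for distributed training:
# engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)
```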
General Task Evaluation
The evaluation on general tasks highlighted significant improvements:
- MMLU: Mistral-C2F scored 58.13%, surpassing previous state-of-the-art models.
- AGIEval and BBH: Achieved superior scores of 39.50% and 57.38%, respectively.
- ARC-E, ARC-C, HellaSWAG, Winogrande, RACE-M, RACE-H, GSM8K: Consistently outperformed other models, showcasing the robust analytical and reasoning enhancements introduced by the Coarse-to-Fine methodology.
MT-Bench Dialogue
In dialogue tasks, Mistral-C2F also excelled, scoring 7.29 on the MT-Bench evaluation, significantly higher than comparable models such as Mistral-Plus and Mistral-Instruct. This underscores the effectiveness of the dual-phase approach in refining conversational ability.
Implications and Future Directions
Practically, the Mistral-C2F model offers a robust solution for applications requiring detailed analytical reasoning and coherent dialogue generation. Theoretically, it advances the field of LLM alignment by introducing a novel Coarse-to-Fine strategy within RLHF frameworks. This approach could inspire further research into dynamic output length extensions and refined content merging techniques.
Future developments in this area could focus on fine-tuning the merger ratios in the Fine Actor phase, exploring different adaptive strategies for the Coarse Actor, and expanding the application of this methodology to other LLMs. Additionally, further empirical studies could assess the impact of this approach on more diverse and complex tasks, potentially improving the broader applicability of smaller-scale LLMs in various domains.
Conclusion
In summary, the Mistral-C2F model offers a novel and effective solution for enhancing the conversational and analytical capabilities of smaller-scale LLMs. The Coarse-to-Fine Actor approach, characterized by Continuous Maximization and Knowledge Residue Merger strategies, has demonstrated significant improvements in both general language tasks and dialogue generation. This paper makes a substantial contribution to the field of natural language processing, particularly in the alignment and enhancement of LLMs using RLHF.