OpenChat: Advancing Open-source Language Models with Mixed-Quality Data (2309.11235v2)

Published 20 Sep 2023 in cs.CL

Abstract: Nowadays, open-source LLMs like LLaMA have emerged. Recent developments have incorporated supervised fine-tuning (SFT) and reinforcement learning fine-tuning (RLFT) to align these models with human goals. However, SFT methods treat all training data with mixed quality equally, while RLFT methods require high-quality pairwise or ranking-based preference data. In this study, we present a novel framework, named OpenChat, to advance open-source LLMs with mixed-quality data. Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels. We propose the C(onditioned)-RLFT, which regards different data sources as coarse-grained reward labels and learns a class-conditioned policy to leverage complementary data quality information. Interestingly, the optimal policy in C-RLFT can be easily solved through single-stage, RL-free supervised learning, which is lightweight and avoids costly human preference labeling. Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13b open-source LLMs. Moreover, we use AGIEval to validate the model generalization performance, in which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat. Our code, data, and models are publicly available at https://github.com/imoneoi/openchat and https://huggingface.co/openchat.

Advancing Open-source LLMs with Mixed-Quality Data: A Review of OpenChat

The paper "OpenChat: Advancing Open-source LLMs with Mixed-Quality Data" discusses an innovative approach to enhance the performance of open-source LLMs, specifically targeting scenarios where training data is of mixed quality. The authors introduce OpenChat, emphasizing a new framework called Conditioned-Reinforcement Learning Fine-tuning (C-RLFT) to effectively utilize mixed-quality datasets without the necessity of fine-grained preference labels.

Key Contributions and Methods

Problem Scope

The paper targets a shortcoming of supervised fine-tuning (SFT): it treats all training data equally, even though real datasets mix high-quality and sub-optimal examples. Reinforcement learning fine-tuning (RLFT), on the other hand, requires pairwise or ranking-based preference data that is expensive to collect. The authors bridge this gap with a method that exploits mixed-quality data directly, needing only a coarse label for each example's source (see the illustrative layout below).
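
To make the setting concrete, the sketch below shows the kind of data C-RLFT consumes: a small expert slice mixed with a much larger sub-optimal slice (in the paper, GPT-4 versus GPT-3.5 conversations in ShareGPT-style data), each carrying only a coarse source tag and no pairwise preferences. The field names here are illustrative assumptions, not the paper's actual schema.

```python
# Illustrative layout of a mixed-quality SFT dataset in the paper's setting:
# only a coarse per-example source label is available, no preference pairs.
mixed_dataset = [
    {"prompt": "Explain KL divergence.", "response": "...",
     "source": "expert"},      # e.g. GPT-4 conversations (small share)
    {"prompt": "Explain KL divergence.", "response": "...",
     "source": "suboptimal"},  # e.g. GPT-3.5 conversations (large share)
]
```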

Conditioned-RLFT Framework

The proposed C-RLFT framework addresses these limitations by introducing a class-conditioned policy that distinguishes data sources via coarse-grained reward labels. An overview of the method follows (with a training-loss sketch after the list):

  1. Class-Conditioned Dataset and Rewards:
    • The authors classify the data into expert data and sub-optimal data, encoding rewards of 1 for expert and a lower value (α < 1) for sub-optimal data.
  2. Policy Optimization:
    • The policy is conditioned on the data-source class and optimized under a KL-regularized RL objective. Instead of regularizing toward the base model, C-RLFT regularizes toward a class-conditioned reference policy, which preserves the quality information carried by each source; this construction is what lets the optimal policy be recovered by single-stage, RL-free supervised learning.
  3. Model Inference:
    • At inference time, OpenChat conditions generation on the prompt format associated with the expert class during training, steering outputs toward the response patterns of the high-quality data.
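
To make the single-stage reduction concrete, here is a minimal sketch of the training signal, assuming the reading of C-RLFT as reward-weighted supervised learning over class-conditioned data. The helper name `c_rlft_loss`, the specific reward values, and the toy tensors are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

# Coarse-grained rewards per data source (illustrative values; the paper
# assigns expert data reward 1 and sub-optimal data a lower value).
CLASS_REWARD = {"expert": 1.0, "suboptimal": 0.1}

def c_rlft_loss(logits, targets, sources, pad_id=-100):
    """Reward-weighted token-level cross-entropy (hypothetical helper).

    logits:  (batch, seq_len, vocab) model outputs
    targets: (batch, seq_len) next-token labels, pad_id where masked
    sources: per-example data-source labels ("expert" / "suboptimal")
    """
    batch, seq_len, vocab = logits.shape
    # Per-token negative log-likelihood, kept per example.
    nll = F.cross_entropy(
        logits.reshape(-1, vocab), targets.reshape(-1),
        ignore_index=pad_id, reduction="none",
    ).reshape(batch, seq_len)
    mask = (targets != pad_id).float()
    per_example = (nll * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
    # Weight each example's log-likelihood by its coarse class reward,
    # so expert examples pull on the policy harder than sub-optimal ones.
    weights = torch.tensor([CLASS_REWARD[s] for s in sources],
                           device=logits.device)
    return (weights * per_example).mean()

# Smoke test with random tensors standing in for a real LLM forward pass.
logits = torch.randn(2, 8, 32)
targets = torch.randint(0, 32, (2, 8))
print(c_rlft_loss(logits, targets, ["expert", "suboptimal"]).item())
```

In the full method, the class condition is also injected into the input itself (for example, distinct prompt templates per source), which is what makes conditioning on the expert class at inference time meaningful.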

Experimental Validation and Implications

The authors validated OpenChat on several benchmarks, including AlpacaEval, MT-bench, and Vicuna-bench for instruction-following abilities, and AGIEval to assess model generalization. The OpenChat-13b model consistently demonstrated superior performance:

  • AlpacaEval and MT-bench:
    • OpenChat-13b achieved the highest win rate among all 13b open-source models, outperforming even gpt-3.5-turbo in several instances.
  • AGIEval:
    • The model surpassed base llama-2-13b in generalization tasks, indicating robustness against overfitting and maintaining accuracy across diverse tasks.

These results are significant because they show that the OpenChat framework can exploit mixed-quality datasets effectively, offering a practical recipe for training and deploying LLMs in applications where data quality is not uniformly high.

Future Directions

The paper opens several avenues for future research:

  1. Fine-grained Reward Tuning:
    • While the coarse-grained reward system used in OpenChat is efficient, exploring more nuanced reward structures could further improve model performance.
  2. Extended Applications:
    • Extending the C-RLFT framework to enhance reasoning abilities alongside instruction-following capabilities could broaden the practical applications of LLMs in complex task scenarios.
  3. Data Source Quality Metrics:
    • Developing metrics to better quantify and utilize the quality of data sources could improve the robustness of models trained on mixed-quality datasets.

Conclusion

The authors of "OpenChat: Advancing Open-source Language Models with Mixed-Quality Data" present a compelling framework for training LLMs on data of heterogeneous quality. Their approach combines class-conditioned policies with a reward-aware optimization strategy that reduces to lightweight supervised learning, achieving strong performance and robustness. This work is a significant contribution, offering a practical, effective recipe for enhancing open-source LLMs, and the principles and methodologies it introduces are likely to inspire further innovations and applications.

Authors (6)
  1. Guan Wang (51 papers)
  2. Sijie Cheng (23 papers)
  3. Xianyuan Zhan (47 papers)
  4. Xiangang Li (46 papers)
  5. Sen Song (24 papers)
  6. Yang Liu (2253 papers)
Citations (196)