Enhancing Model Alignment via Collective Intelligence of Open-Source LLMs
The paper "Improving Model Alignment Through Collective Intelligence of Open-Source LLMs" explores an innovative approach to model alignment, a crucial component for refining the performance of LLMs to ensure their outputs are both helpful and harmless. Traditionally, model alignment relies heavily on supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF), each requiring substantial amounts of high-quality human-labeled data. This dependency poses challenges in terms of cost, scalability, diversity, and generalization capabilities.
To address these limitations, the authors introduce Mixture of Agents Alignment (MoAA), a methodology that leverages the collective intelligence of multiple open-source LLMs to generate synthetic data for model alignment. The approach consists of two main stages (a minimal code sketch of both follows the list):
- MoAA-SFT Stage: Several open-source models collaboratively produce high-quality synthetic data for supervised fine-tuning. Unlike traditional pipelines that distill data from a single model, MoAA aggregates outputs from multiple models, enriching data diversity and improving the fine-tuned model's instruction-following ability.
- MoAA-DPO Stage: The second stage applies Direct Preference Optimization (DPO), with the MoAA ensemble acting as the reward model. Responses sampled from the fine-tuned model are scored by a combination of LLMs to produce preference pairs, further aligning the model with human-like judgments without training a separate reward model.
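The paper does not tie MoAA to a particular implementation, but the two stages can be pictured roughly as below. This is a minimal sketch, not the authors' code: `generate`, the proposer, aggregator, and judge model names, and the prompts are all placeholder assumptions, and the dummy scoring merely stands in for parsing a real judge verdict.

```python
from typing import List, Tuple

def generate(model: str, prompt: str) -> str:
    # Stand-in for a real inference call (e.g., a hosted endpoint or local server);
    # it returns a dummy string so this sketch runs end to end without dependencies.
    return f"[{model}] response to: {prompt[:40]}"

PROPOSERS = ["open-model-a", "open-model-b", "open-model-c"]  # assumed proposer pool
AGGREGATOR = "open-model-aggregator"                          # assumed aggregator model
JUDGES = ["open-judge-a", "open-judge-b"]                     # assumed judge ensemble

def moaa_sft_sample(instruction: str) -> str:
    """Stage 1 (MoAA-SFT): several open-source LLMs propose answers, and an
    aggregator model synthesizes them into one target response for SFT."""
    proposals = [generate(m, instruction) for m in PROPOSERS]
    agg_prompt = (
        "Synthesize the candidate responses into a single best answer.\n"
        f"Instruction: {instruction}\n"
        + "\n".join(f"Candidate {i + 1}: {p}" for i, p in enumerate(proposals))
    )
    return generate(AGGREGATOR, agg_prompt)

def moaa_dpo_pair(instruction: str, policy_samples: List[str]) -> Tuple[str, str]:
    """Stage 2 (MoAA-DPO): the LLM ensemble acts as the reward model, scoring the
    fine-tuned policy's own samples to select a (chosen, rejected) pair for DPO."""
    def ensemble_score(response: str) -> float:
        scores = []
        for judge in JUDGES:
            verdict = generate(judge, f"Rate 1-10.\nInstruction: {instruction}\nResponse: {response}")
            scores.append(float(len(verdict) % 10))  # placeholder for parsing a real numeric score
        return sum(scores) / len(scores)

    ranked = sorted(policy_samples, key=ensemble_score, reverse=True)
    return ranked[0], ranked[-1]  # best and worst samples become the DPO preference pair

if __name__ == "__main__":
    prompt = "Explain gradient checkpointing in one paragraph."
    print(moaa_sft_sample(prompt))
    print(moaa_dpo_pair(prompt, [f"draft answer {i}" for i in range(4)]))
```

In a real pipeline, `generate` would call an inference backend, and the preference pairs returned by `moaa_dpo_pair` would feed directly into a standard DPO training loop.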
The paper presents empirical evidence that MoAA significantly boosts model performance on benchmarks such as Arena-Hard and AlpacaEval 2. Models fine-tuned on MoAA synthetic data showed substantial gains in win rate: LLaMA-3.1-8B-Instruct improved from 19.5 to 48.3 on Arena-Hard and from 22.33 to 57.23 on AlpacaEval 2. This leap demonstrates MoAA's effectiveness at overcoming the data limitations commonly associated with model alignment.
Furthermore, the paper illustrates that MoAA can create a self-improvement pipeline, where models fine-tuned iteratively with MoAA-generated data surpass their own initial capabilities, effectively advancing the frontier of open-source LLMs without relying on stronger external supervision.
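For intuition, such a self-improvement loop could be organized along the following lines. This is again only a sketch: `build_moaa_dataset` and `finetune` are hypothetical helpers standing in for the full data-generation and training stages, not functions from the paper.

```python
from typing import Dict, List

def build_moaa_dataset(model_pool: List[str]) -> List[Dict]:
    # Stand-in for running both MoAA stages over a prompt set with the current pool.
    return [{"generated_by": list(model_pool)}]

def finetune(model: str, dataset: List[Dict]) -> str:
    # Stand-in for SFT + DPO training; here it only tags the model name.
    return model + "+moaa"

def self_improvement_loop(target: str, pool: List[str], rounds: int = 2) -> str:
    """Each round: the current pool generates MoAA data, the target model is
    fine-tuned on it, and the improved model rejoins the pool as a generator."""
    for _ in range(rounds):
        data = build_moaa_dataset(pool + [target])
        target = finetune(target, data)
        pool = pool + [target]  # the stronger model now helps produce the next round's data
    return target

print(self_improvement_loop("llama-3.1-8b-instruct", ["open-model-a", "open-model-b"]))
```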
The implications of this research are multifaceted. Practically, MoAA offers a more scalable and cost-efficient strategy for model alignment, reducing reliance on expensive human-labeled datasets and proprietary black-box models. Theoretically, it opens new avenues for exploring collaborative intelligence among LLMs and for designing alignment algorithms and architectures built around model ensembles. Looking ahead, MoAA could encourage a shift toward ensemble approaches in AI development, with collective model insights feeding into stronger reasoning and decision-making systems.
Overall, the authors present a compelling approach to aligning LLMs through collective intelligence, one that improves both the quality and the cost-efficiency of the alignment process.