MetaGPT: Merging LLMs Using Model Exclusive Task Arithmetic
The paper "MetaGPT: Merging LLMs Using Model Exclusive Task Arithmetic" introduces a novel approach for merging LLMs to efficiently endow a single model with multi-task learning (MTL) capabilities. The method, termed MetaGPT, combines multiple task-specific fine-tuned models derived from a common pre-trained base, aiming to preserve per-task performance while reducing computational overhead and maintaining data privacy.
Key Contributions
- Formalization and Theoretical Bounds: The authors give a mathematical formulation of the task-arithmetic optimization objective and a theoretical analysis of the performance bounds of such merging procedures. The goal is to minimize the average loss difference between the merged model and each individual task model (formalized in the sketch after this list).
- Task Arithmetic Approach: MetaGPT leverages two properties of LLMs: the local linearity of the loss around the pre-trained weights and the (near-)orthogonality of task vectors. These properties let the data-dependent term be decoupled from the scaling coefficients, which the authors address through model-exclusive task arithmetic. The result is a closed-form solution for the scaling coefficients that removes the need for additional task-specific data (see the merging sketch after this list).
- Data Agnosticism and Computational Efficiency: Unlike traditional multi-task learning methods that require extensive computational resources and access to multi-task training data, MetaGPT operates independently of any additional data and avoids cumbersome hyperparameter searches. Its implementation is cost-effective and straightforward, scaling efficiently to models of GPT-3 caliber and beyond.
- Integration with Existing Methods: The paper argues that MetaGPT is orthogonal to existing task-vector-improving methods (e.g., Ties-Merging and DARE), so it can be used in conjunction with them to potentially reach even higher performance (a sketch of such a combination also follows below).
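To make the first two bullets concrete, here is a minimal sketch in standard task-arithmetic notation, where $\theta_0$ denotes the pre-trained weights, $\theta_i$ the weights fine-tuned on task $i$, and $\mathcal{L}_i$ the loss on task $i$. The norm-proportional closed form shown for $\lambda_i$ is the solution that the local-linearity and orthogonality assumptions lead to; treat it as an illustrative reading of the paper's result rather than its exact statement.

$$
\tau_i = \theta_i - \theta_0, \qquad \theta_{\mathrm{merged}} = \theta_0 + \sum_{i=1}^{n} \lambda_i \,\tau_i,
$$

$$
\min_{\lambda_1,\dots,\lambda_n} \; \frac{1}{n} \sum_{i=1}^{n} \Big[ \mathcal{L}_i(\theta_{\mathrm{merged}}) - \mathcal{L}_i(\theta_i) \Big]
\;\;\Longrightarrow\;\;
\lambda_i = \frac{\lVert \tau_i \rVert_2^2}{\sum_{j=1}^{n} \lVert \tau_j \rVert_2^2}.
$$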
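A short, runnable sketch of the merging procedure under that closed form, assuming `base_state` and each entry of `task_states` are PyTorch state dicts with identical keys and floating-point parameters; `metagpt_merge` is a hypothetical helper name, not the paper's reference implementation.

```python
import torch

def metagpt_merge(base_state, task_states):
    """Data-free merge of task-specific checkpoints into one model.

    base_state  : state_dict of the shared pre-trained model (theta_0)
    task_states : list of state_dicts fine-tuned from base_state (theta_i)
    """
    # Task vectors: difference between each fine-tuned model and the base.
    task_vectors = [
        {k: task[k] - base_state[k] for k in base_state}
        for task in task_states
    ]

    # Squared L2 norm of each (flattened) task vector.
    sq_norms = [
        sum(v.float().pow(2).sum().item() for v in tv.values())
        for tv in task_vectors
    ]
    total = sum(sq_norms)

    # Closed-form, norm-proportional scaling coefficients:
    # no extra data, no hyperparameter search.
    lambdas = [n / total for n in sq_norms]

    # theta_merged = theta_0 + sum_i lambda_i * tau_i
    merged = {k: v.clone() for k, v in base_state.items()}
    for lam, tv in zip(lambdas, task_vectors):
        for k in merged:
            merged[k] += lam * tv[k]
    return merged, lambdas
```

Because the coefficients depend only on the task vectors themselves, the whole procedure is a single pass over the checkpoints and needs no evaluation data.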
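And a sketch of the kind of combination the last bullet points at: applying a DARE-style drop-and-rescale step to each task vector before computing the coefficients and merging. `dare_sparsify` and the `drop_prob` value are illustrative assumptions, not the exact recipe from either paper.

```python
import torch

def dare_sparsify(task_vector, drop_prob=0.9):
    """DARE-style preprocessing of a task vector: randomly drop a fraction
    of its entries and rescale the survivors by 1 / (1 - drop_prob) so the
    expected task vector is unchanged."""
    keep = 1.0 - drop_prob
    return {
        k: v * (torch.rand_like(v.float()) > drop_prob) / keep
        for k, v in task_vector.items()
    }

# Example combination (hypothetical usage): sparsify the task vectors first,
# then apply the same norm-proportional merge as above.
# sparse_vectors = [dare_sparsify(tv) for tv in task_vectors]
```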
Numerical Results and Evaluations
The paper provides extensive experimental validation of MetaGPT on state-of-the-art LLMs such as LLaMA-2 and Mistral across diverse tasks, demonstrating its efficacy at improving task arithmetic. MetaGPT consistently outperforms existing merging methods in average performance across multiple datasets, establishing new state-of-the-art results for this setting.
Practical and Theoretical Implications
Practical Implications: MetaGPT's ability to merge models without additional training data or hyperparameter searches has significant implications for deploying LLMs. It can reduce the infrastructure cost of maintaining multiple task-specific models, enhancing the scalability and accessibility of powerful language technologies.
Theoretical Implications: The formal analysis provided by the authors highlights the utility of exploiting intrinsic model properties like linearity and orthogonality in task vectors. This work contributes to the broader understanding of efficient model combination strategies within the MTL paradigm.
Speculation on Future Developments
Future research could extend the MetaGPT framework to explore the impact of applying task arithmetic in domains beyond natural language processing, such as in vision or multi-modal model merging. Additionally, integrating MetaGPT with adaptive learning rate strategies could further optimize its convergence and application to novel tasks in unseen domains.
Overall, the paper presents a compelling method for efficient multi-task model merging, broadening the range of applications of LLMs in both academia and industry. It marks a significant step toward democratizing access to sophisticated AI systems while maintaining computational practicality.