AdaMerging: Adaptive Model Merging for Multi-Task Learning (2310.02575v2)

Published 4 Oct 2023 in cs.LG and cs.CV

Abstract: Multi-task learning (MTL) aims to empower a model to tackle multiple tasks simultaneously. A recent development known as task arithmetic has revealed that several models, each fine-tuned for distinct tasks, can be directly merged into a single model to execute MTL without necessitating a retraining process using the initial training data. Nevertheless, this direct addition of models often leads to a significant deterioration in the overall performance of the merged model. This decline occurs due to potential conflicts and intricate correlations among the multiple tasks. Consequently, the challenge emerges of how to merge pre-trained models more effectively without using their original training data. This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging). This approach aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Specifically, our AdaMerging method operates as an automatic, unsupervised task arithmetic scheme. It leverages entropy minimization on unlabeled test samples from the multi-task setup as a surrogate objective function to iteratively refine the merging coefficients of the multiple models. Our experimental findings across eight tasks demonstrate the efficacy of the AdaMerging scheme we put forth. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance. Notably, AdaMerging also exhibits superior generalization capabilities when applied to unseen downstream tasks. Furthermore, it displays a significantly enhanced robustness to data distribution shifts that may occur during the testing phase.

AdaMerging: Enhancing Model Merging for Multi-Task Learning

The paper introduces a novel approach to model merging within the multi-task learning (MTL) paradigm, titled "AdaMerging: Adaptive Model Merging for Multi-Task Learning." The primary objective is to improve the performance of merged models in MTL setups without relying on the original training data. This is particularly relevant in the current landscape, where reusing models fine-tuned from the same pre-trained backbone for distinct tasks is often preferable to joint training on large datasets because of computational and privacy constraints.

Key Contributions and Methodology

  1. Task Arithmetic and Model Merging Challenges: Traditional methods such as task arithmetic merge models by a simple summation of task vectors, where each task vector is the difference between a task's fine-tuned weights and the shared pre-trained weights, scaled by a merging coefficient and added back to the pre-trained model. However, these methods suffer performance degradation because they are highly sensitive to the merging coefficients, indicating that task conflicts and inter-task correlations need to be addressed.
  2. AdaMerging Technique: The AdaMerging approach addresses the limitations of existing task vector-based techniques. It introduces a mechanism for autonomously learning the merging coefficients in either a task-wise or layer-wise manner, independent of the original training data. This unsupervised process minimizes the entropy of predictions on unlabeled test samples, a surrogate objective borrowed from test-time adaptation strategies (a minimal sketch follows this list).
  3. Layer-wise Adaptability: The paper explores layer-wise adaptability, assigning a separate merging coefficient to each layer of each task vector to accommodate the fact that different layers capture features ranging from general to task-specific.
  4. Empirical Evaluation Against State-of-the-Art Methods: The authors provide extensive empirical evidence through their experiments across eight diverse tasks, utilizing models like ViT-B/32, ViT-B/16, and ViT-L/14 as pre-trained architectures. AdaMerging consistently outperforms existing task vector-based methods, showcasing significant improvements in average accuracy and robustness to data distribution shifts.

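Taken together, the first three items above describe a simple recipe: build one task vector per task, attach a learnable coefficient to each (task, layer) pair, and tune those coefficients by minimizing prediction entropy on unlabeled test data. The code below is a minimal, illustrative PyTorch sketch of that recipe, not the authors' implementation; the toy backbone, the helper names `task_vectors`, `merge`, and `mean_entropy`, and the use of `torch.func.functional_call` are assumptions made for illustration.

```python
import torch
from torch.func import functional_call  # requires PyTorch >= 2.0

def task_vectors(pretrained_sd, finetuned_sds):
    # One task vector per task: fine-tuned weights minus pre-trained weights.
    return [{k: ft[k] - pretrained_sd[k] for k in pretrained_sd} for ft in finetuned_sds]

def merge(pretrained_sd, vectors, coeffs):
    # Layer-wise merge: merged[l] = pretrained[l] + sum_k coeffs[k, l] * vectors[k][l].
    # In this sketch each state-dict entry stands in for a "layer".
    merged = {}
    for l, key in enumerate(pretrained_sd):
        merged[key] = pretrained_sd[key] + sum(
            coeffs[k, l] * vectors[k][key] for k in range(len(vectors))
        )
    return merged

def mean_entropy(logits):
    # Shannon entropy of the softmax predictions, averaged over the batch.
    logp = torch.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1).mean()

# Toy stand-ins so the sketch runs end-to-end: a linear classifier instead of a
# ViT backbone, random perturbations instead of real fine-tuned checkpoints,
# and random tensors instead of unlabeled multi-task test batches.
torch.manual_seed(0)
model = torch.nn.Linear(16, 10)
pretrained_sd = {k: v.clone() for k, v in model.state_dict().items()}
finetuned_sds = [
    {k: v + 0.01 * torch.randn_like(v) for k, v in pretrained_sd.items()}
    for _ in range(8)  # eight tasks, matching the paper's benchmark
]
unlabeled_test_loader = [torch.randn(32, 16) for _ in range(10)]

vectors = task_vectors(pretrained_sd, finetuned_sds)
# One learnable coefficient per (task, layer), initialized to a small constant
# (0.3 here) so optimization starts close to a plain task-arithmetic merge.
coeffs = torch.nn.Parameter(torch.full((len(vectors), len(pretrained_sd)), 0.3))
opt = torch.optim.Adam([coeffs], lr=1e-3)

for batch in unlabeled_test_loader:
    merged_sd = merge(pretrained_sd, vectors, coeffs)      # differentiable in coeffs
    logits = functional_call(model, merged_sd, (batch,))   # frozen backbone, merged weights
    loss = mean_entropy(logits)                            # entropy-minimization surrogate
    opt.zero_grad()
    loss.backward()                                        # gradients reach only coeffs
    opt.step()
```

Because the merged weights are rebuilt from the learnable coefficients at every step, gradients of the entropy objective flow only into the coefficients; the pre-trained and fine-tuned weights stay frozen, which is what lets the scheme operate without the original training data or labels.
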
Empirical Results

  • Performance Improvement: AdaMerging demonstrates improvements of up to 11% in average accuracy over traditional task arithmetic approaches, indicating stronger efficacy in handling task interference and optimizing multi-task performance.
  • Generalization Capabilities: On unseen downstream tasks, AdaMerging adapts better and retains more of its performance than existing methods, evidencing its capability to generalize across tasks without prior task-specific knowledge.
  • Robustness: The paper tests AdaMerging under data-corruption scenarios to evaluate robustness against distribution shifts at test time. The method maintains superior performance compared to Task Arithmetic and Ties-Merging, further highlighting its reliability in real-world applications.

Implications and Future Directions

The introduction of AdaMerging for model merging in MTL opens numerous research directions. Practically, it alleviates the dependency on large datasets and computationally intensive joint training, providing a flexible framework suitable for diverse applications in computer vision, NLP, and beyond. Theoretically, it contributes to the understanding of adaptive coefficient learning in neural network models, potentially influencing how models are structured and merged without retraining on the original data.

Future investigations may delve into refining entropy minimization processes and exploring additional proxy objectives. Additionally, the application of AdaMerging in architectures beyond those explored could widen its utility. This research lays a foundation for further advancements in adaptive learning algorithms tailored to model merging and MTL dynamics.

The paper presents a significant stride toward enhancing model merging methodologies, demonstrating both practical and theoretical advancements in multi-task learning frameworks.

Authors (7)
  1. Enneng Yang (24 papers)
  2. Zhenyi Wang (27 papers)
  3. Li Shen (363 papers)
  4. Shiwei Liu (76 papers)
  5. Guibing Guo (35 papers)
  6. Xingwei Wang (35 papers)
  7. Dacheng Tao (829 papers)
Citations (49)