- The paper introduces EMR-Merging, a tuning-free method that merges multiple models fine-tuned from a shared pre-trained backbone by electing a unified task vector that retains the most important task-specific weights.
- Lightweight task-specific masks and rescalers restore each task's sign alignment and weight magnitude, minimizing performance degradation.
- Empirical tests across vision, NLP, and multi-modal tasks show that EMR-Merging achieves performance on par with individually fine-tuned models.
Overview of "EMR-Merging: Tuning-Free High-Performance Model Merging"
The paper "EMR-Merging: Tuning-Free High-Performance Model Merging" addresses the burgeoning challenge in the field of deep learning, specifically the efficient merging of pre-trained and fine-tuned models for multi-task capability without performance degradation. The emergence of a plethora of model weights due to the pretrain-finetune paradigm demands methods that enable models to operate across multiple tasks efficiently. The research proposes a novel method termed Elect, Mask, and Rescale (EMR-Merging), which aims to facilitate this objective sans additional data tuning or training demands.
Methodological Distinction
EMR-Merging departs from conventional approaches that merge everything into a single set of weights, which often incur significant performance loss or require data-dependent tuning. Its core idea is an election step that aggregates the task vectors of the different tasks into one unified task vector, supplemented by lightweight task-specific modulators, namely masks and rescalers, that restore each task's behavior. The approach consists of the following key steps:
- Electing a Unified Task Vector: For each parameter, the election first determines the dominant sign across the task vectors and then keeps the largest-magnitude value that agrees with that sign. This retains the most salient task-specific features while minimizing sign interference between tasks.
- Task-Specific Modulators: After forming the unified vector, lightweight per-task modulators are generated:
  - Masks keep only the entries of the unified vector whose sign agrees with the corresponding task vector, restoring directional alignment with the original task model.
  - Rescalers are per-task scalars that restore each task's weight magnitude, compensating for differences in parameter scale.
Together, these steps approximate each individually fine-tuned model from a single unified vector plus lightweight modulators, which is what makes the merge both accurate and storage-efficient; a minimal code sketch is given below.
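The following PyTorch sketch illustrates the three steps for a single weight tensor. The function name `emr_merge`, the tensor layout, and the exact tie-breaking in the election are assumptions based on the description above, not the authors' reference implementation.

```python
# Minimal sketch of the Elect-Mask-Rescale idea for one weight tensor.
import torch

def emr_merge(pretrained: torch.Tensor, finetuned: list[torch.Tensor]):
    # Task vectors: difference between each fine-tuned weight and the pre-trained weight.
    task_vectors = [w - pretrained for w in finetuned]
    stacked = torch.stack(task_vectors)                  # (num_tasks, *shape)

    # Elect: dominant sign across tasks, then the largest magnitude
    # among task-vector entries that agree with that sign.
    elected_sign = torch.sign(stacked.sum(dim=0))
    agree = torch.sign(stacked) == elected_sign
    magnitude = (stacked.abs() * agree).amax(dim=0)
    unified = elected_sign * magnitude                   # unified task vector

    # Mask + Rescale: per-task binary mask (sign agreement with the unified
    # vector) and a scalar that restores the task's average magnitude.
    masks, rescalers = [], []
    for tv in task_vectors:
        mask = (tv * unified) > 0
        masked = mask * unified
        scale = tv.abs().sum() / masked.abs().sum().clamp_min(1e-12)
        masks.append(mask)
        rescalers.append(scale)
    return unified, masks, rescalers

def reconstruct(pretrained, unified, mask, rescaler):
    # Approximate the fine-tuned weight for one task at inference time.
    return pretrained + rescaler * (mask * unified)
```

Applied per parameter tensor, only the unified vector, one 1-bit mask per task, and one scalar per task need to be stored; the weight for the requested task is reconstructed on the fly at inference.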
Empirical Verification
The paper validates the efficacy of the proposed EMR-Merging method through extensive experimentation under various settings. Key highlights include:
- When applied to vision models such as ViT-Base and ViT-Large across multiple tasks, EMR-Merging clearly outperformed existing merging approaches and in some cases came close to traditional multi-task learning.
- The method's robustness was further evidenced when scaling to 30 task-specific models, where it maintained high accuracy while other methods faltered considerably.
- In the domain of NLP, using RoBERTa and (IA)3 models, EMR-Merging again demonstrated superior adaptability and task performance without the need for dataset-dependent adjustment.
- Furthermore, the approach was applied successfully to multi-modal models like BEiT3, showcasing its versatility across diverse datasets including tasks like ImageNet classification and COCO captioning.
Theoretical Underpinning
The paper also grounds EMR-Merging theoretically. Its analysis shows that the masking and rescaling steps reduce the distance between each reconstructed task model and its original fine-tuned counterpart, compared with applying the unified task vector alone, which explains why the merged weights track the individual models so closely.
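Concretely, the reconstruction that this analysis reasons about can be written as follows. The notation here is ours, inferred from the method description above rather than copied from the paper: $\tau_i$ is task $i$'s task vector, $\tau_{\mathrm{uni}}$ the elected unified vector, $M_i$ the binary mask, and $\lambda_i$ the rescaler.

```latex
\hat{W}_i = W_{\mathrm{pre}} + \lambda_i \,\bigl(M_i \odot \tau_{\mathrm{uni}}\bigr),
\qquad
M_i = \mathbf{1}\!\bigl[\tau_i \odot \tau_{\mathrm{uni}} > 0\bigr],
\qquad
\lambda_i = \frac{\sum_j \lvert \tau_{i,j}\rvert}{\sum_j \bigl\lvert (M_i \odot \tau_{\mathrm{uni}})_j \bigr\rvert}.
```

The mask removes sign-conflicting entries and the rescaler matches the average magnitude, which is the mechanism the distance-reduction argument relies on.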
Implications and Future Directions
EMR-Merging has implications for both practice and theory in model merging. Practically, it enables more efficient, scalable multi-task models without sacrificing performance or requiring parameter tuning. Theoretically, it prompts a re-evaluation of how task vectors are used in merging and opens avenues for merging models with different architectures or training paradigms.
However, the method introduces a modest memory overhead, since a binary mask and a rescaler must be stored for every task alongside the unified vector; future work could reduce this further (a rough back-of-envelope comparison is sketched below). Even so, the ability to merge linguistically and visually distinct task models under one framework, without any prerequisite data tuning, is a promising step for multi-task AI applications. Future exploration could extend the method to training paradigms beyond the pretrain-finetune standard and improve compatibility across models with substantially different configurations.
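To make the memory trade-off concrete, here is a rough comparison; the parameter count, number of tasks, and precision are illustrative assumptions, not figures from the paper.

```python
# Back-of-envelope storage comparison under illustrative assumptions.
params, tasks, fp32 = 86_000_000, 8, 4           # ~ViT-Base scale, 8 tasks, fp32 bytes

separate = tasks * params * fp32                 # keep every fine-tuned model
emr = (2 * params * fp32                         # pre-trained weights + unified task vector
       + tasks * params / 8                      # one 1-bit mask per parameter per task
       + tasks * fp32)                           # one scalar rescaler per task

print(f"separate: {separate / 1e9:.2f} GB  vs  EMR-Merging: {emr / 1e9:.2f} GB")
# ~2.75 GB vs ~0.77 GB under these assumptions; the masks account for the
# extra ~86 MB over a single merged model.
```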
In conclusion, EMR-Merging is a significant contribution: a tuning-free, high-performance approach to model merging that keeps merged performance close to individually fine-tuned models across vision, NLP, and multi-modal settings.