TIES-Merging: Resolving Interference When Merging Models

(2306.01708)
Published Jun 2, 2023 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract

Transfer learning - i.e., further fine-tuning a pre-trained model on a downstream task - can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. These advantages have led to a proliferation of task-specific fine-tuned models, which typically can only perform a single task and do not benefit from one another. Recently, model merging techniques have emerged as a solution to combine multiple task-specific models into a single multitask model without performing additional training. However, existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models. In this paper, we demonstrate that prior merging techniques inadvertently lose valuable information due to two major sources of interference: (a) interference due to redundant parameter values and (b) disagreement on the sign of a given parameter's values across models. To address this, we propose our method, TRIM, ELECT SIGN & MERGE (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign. We find that TIES-Merging outperforms several existing methods in diverse settings covering a range of modalities, domains, number of tasks, model sizes, architectures, and fine-tuning settings. We further analyze the impact of different types of interference on model parameters, and highlight the importance of resolving sign interference. Our code is available at https://github.com/prateeky2806/ties-merging

Overview

  • TIES-MERGING is introduced as a method to effectively merge models by addressing interference issues in parameter values.

  • This new approach involves trimming task vectors, electing parameter signs, and disjoint merging to improve multitask model performance.

  • The method outperforms previous merging techniques on tasks across multiple domains, including NLP and computer vision.

  • Analyses show that resolving sign interference is crucial when merging models; with correctly resolved signs, merged models can approach the performance of simultaneous multitask training.

  • TIES-MERGING is efficient, works with fixed hyperparameters, and is a step forward in practical multitask model management.

Introduction to Model Merging and its Challenges

Transfer learning has been instrumental in enhancing AI model performance, especially when adapting pre-trained models (PTMs) to specific applications. By fine-tuning PTMs on downstream tasks, practitioners obtain tailored models with improved accuracy, faster convergence, and better sample efficiency. The resulting proliferation of fine-tuned models, however, creates challenges around storage, deployment, and managing a sprawling number of variants. Multitask learning offers an elegant alternative, but it is computationally expensive and requires access to data for all tasks simultaneously, which is not always feasible.

Model merging has emerged as a practical alternative, creating a multitask model by combining multiple task-specific models without further training. While this saves computational resources, current merging methods often overlook how the parameters of different models might interfere with each other when combined. This oversight can lead to degraded performance due to two primary types of interference: one resulting from redundant parameter values and the other from sign disagreements across models.
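As a concrete illustration of sign interference, consider two fine-tuned models whose updates to the same parameter point in opposite directions: simple averaging largely cancels both. The toy example below (the numbers are hypothetical, chosen only to show the effect) makes this explicit.

```python
import numpy as np

# Hypothetical task-vector values (fine-tuned weight minus pre-trained weight)
# for the same three parameters in two task-specific models.
task_a = np.array([ 0.9, -0.6, 0.02])
task_b = np.array([-0.8,  0.7, 0.03])

# Simple averaging: the opposing signs nearly cancel, so neither task's
# large update survives, while the small (redundant) third entry is left
# to shape the merged parameter.
print((task_a + task_b) / 2)   # [0.05  0.05  0.025]
```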

Addressing Interference with TIES-MERGING

To counter these issues, a novel method known as TIES-MERGING (TRIM, ELECT SIGN & MERGE) introduces a three-step approach to model merging. The approach begins by trimming each task vector (the difference between fine-tuned and pre-trained parameters), discarding the redundant, low-magnitude changes that contribute little to performance. Next, it elects a sign for each parameter by resolving disagreements between models: instead of averaging conflicting updates, it chooses the direction supported by the largest total magnitude of updates across models. The final step, dubbed "disjoint merging," averages only the parameter updates whose sign agrees with the elected one.
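The three steps can be sketched directly on flattened task vectors. The snippet below is a minimal NumPy sketch of the procedure as described above, not the official implementation from the linked repository; the top-k fraction `k` and the scaling factor `lam` are illustrative choices.

```python
import numpy as np

def ties_merge(task_vectors, k=0.2, lam=1.0):
    """Minimal sketch of TIES-Merging on flattened task vectors.

    task_vectors: list of 1-D arrays, each (theta_finetuned - theta_pretrained).
    k: fraction of largest-magnitude entries kept per task vector (trim step).
    lam: scaling applied to the merged task vector before adding it back.
    """
    tvs = np.stack(task_vectors)                       # (num_tasks, num_params)

    # 1) Trim: keep only the top-k fraction of entries by magnitude per task.
    num_keep = max(1, int(k * tvs.shape[1]))
    trimmed = np.zeros_like(tvs)
    for i, tv in enumerate(tvs):
        keep_idx = np.argsort(np.abs(tv))[-num_keep:]
        trimmed[i, keep_idx] = tv[keep_idx]

    # 2) Elect: pick each parameter's sign from the total mass across tasks.
    elected_sign = np.sign(trimmed.sum(axis=0))

    # 3) Disjoint merge: average only the entries that agree with the elected sign.
    agree = (np.sign(trimmed) == elected_sign) & (trimmed != 0)
    counts = np.maximum(agree.sum(axis=0), 1)          # avoid division by zero
    merged_tv = (trimmed * agree).sum(axis=0) / counts

    return lam * merged_tv                              # add this to the pre-trained weights

# Usage (hypothetical): merged = pretrained + ties_merge([tv1, tv2, tv3], k=0.2)
```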

Empirical Validation across Diverse Scenarios

The effectiveness of TIES-MERGING is established across numerous scenarios, encompassing different model sizes, architectures, fine-tuning approaches, and domains such as NLP and computer vision. The method outperforms several existing merging techniques, not only on in-domain tasks but also in out-of-domain generalization tests.

Insights into Model Parameters and Signs

Analyses further reveal the impact of interference on model parameters and underscore the significance of correctly resolving sign interference. Experiments show that accurate resolution of parameter signs can nearly match the performance of a model trained on all tasks simultaneously, potentially closing the gap between merging and multitask training.

Conclusion and Future Directions

TIES-MERGING offers an efficient and effective means for combining fine-tuned models into a single multitask model. It successfully mitigates task interference and can be applied with fixed hyperparameters, providing a promising path forward for model management. This approach expands the scope for practical multitask model deployment, laying the groundwork for future advancements in efficient model merging practices.
