TIES-Merging: Resolving Interference When Merging Models

(2306.01708)
Published Jun 2, 2023 in cs.LG, cs.AI, cs.CL, and cs.CV

Abstract

Transfer learning - i.e., further fine-tuning a pre-trained model on a downstream task - can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. These advantages have led to a proliferation of task-specific fine-tuned models, which typically can only perform a single task and do not benefit from one another. Recently, model merging techniques have emerged as a solution to combine multiple task-specific models into a single multitask model without performing additional training. However, existing merging methods often ignore the interference between parameters of different models, resulting in large performance drops when merging multiple models. In this paper, we demonstrate that prior merging techniques inadvertently lose valuable information due to two major sources of interference: (a) interference due to redundant parameter values and (b) disagreement on the sign of a given parameter's values across models. To address this, we propose our method, TRIM, ELECT SIGN & MERGE (TIES-Merging), which introduces three novel steps when merging models: (1) resetting parameters that only changed a small amount during fine-tuning, (2) resolving sign conflicts, and (3) merging only the parameters that are in alignment with the final agreed-upon sign. We find that TIES-Merging outperforms several existing methods in diverse settings covering a range of modalities, domains, number of tasks, model sizes, architectures, and fine-tuning settings. We further analyze the impact of different types of interference on model parameters, and highlight the importance of resolving sign interference. Our code is available at https://github.com/prateeky2806/ties-merging

Overview

  • TIES-MERGING is introduced as a method to effectively merge models by addressing interference issues in parameter values.

  • This new approach involves trimming task vectors, electing parameter signs, and disjoint merging to improve multitask model performance.

  • The method outperforms previous merging techniques on tasks across multiple domains, including NLP and computer vision.

  • Analyses show that resolving sign interference is crucial when merging models; with correctly resolved signs, merged models can approach the performance of simultaneous multitask training.

  • TIES-MERGING is efficient, works with fixed hyperparameters, and is a step forward in practical multitask model management.

Introduction to Model Merging and its Challenges

Transfer learning has been instrumental in enhancing AI model performance, especially when adapting pre-trained models (PTMs) to specific applications. By fine-tuning PTMs on downstream tasks, practitioners obtain tailored models with improved accuracy, faster convergence, and better sample efficiency. The resulting proliferation of fine-tuned models, however, creates challenges around storage, deployment, and managing a sprawling number of variants. Multitask learning offers an elegant alternative, but it is computationally expensive and requires access to data for all tasks simultaneously, which is not always feasible.

Model merging has emerged as a practical alternative, creating a multitask model by combining multiple task-specific models without further training. While this saves computational resources, current merging methods often overlook how the parameters of different models might interfere with each other when combined. This oversight can lead to degraded performance due to two primary types of interference: one resulting from redundant parameter values and the other from sign disagreements across models.
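As a concrete illustration of sign interference, consider two fine-tuned models whose updates to the same parameter point in opposite directions: simple averaging largely cancels both. The toy example below (the numbers are hypothetical, chosen only to show the effect) makes this explicit.

```python
import numpy as np

# Hypothetical task-vector values (fine-tuned weight minus pre-trained weight)
# for the same three parameters in two task-specific models.
task_a = np.array([ 0.9, -0.6, 0.02])
task_b = np.array([-0.8,  0.7, 0.03])

# Simple averaging: the opposing signs nearly cancel, so neither task's
# large update survives, while the small (redundant) third entry is left
# to shape the merged parameter.
print((task_a + task_b) / 2)   # [0.05  0.05  0.025]
```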

Addressing Interference with TIES-MERGING

To counter these issues, a novel method known as TIES-MERGING (TRIM, ELECT SIGN & MERGE) introduces a three-step approach to model merging. The approach begins by trimming each task vector (the difference between fine-tuned and pre-trained parameters), discarding the redundant, low-magnitude changes that contribute little to performance. Next, it elects a sign for each parameter by resolving disagreements between models: instead of averaging conflicting updates, it chooses the direction supported by the largest total magnitude of updates across models. The final step, dubbed "disjoint merging," averages only the parameter updates whose sign agrees with the elected one.
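The three steps can be sketched directly on flattened task vectors. The snippet below is a minimal NumPy sketch of the procedure as described above, not the official implementation from the linked repository; the top-k fraction `k` and the scaling factor `lam` are illustrative choices.

```python
import numpy as np

def ties_merge(task_vectors, k=0.2, lam=1.0):
    """Minimal sketch of TIES-Merging on flattened task vectors.

    task_vectors: list of 1-D arrays, each (theta_finetuned - theta_pretrained).
    k: fraction of largest-magnitude entries kept per task vector (trim step).
    lam: scaling applied to the merged task vector before adding it back.
    """
    tvs = np.stack(task_vectors)                       # (num_tasks, num_params)

    # 1) Trim: keep only the top-k fraction of entries by magnitude per task.
    num_keep = max(1, int(k * tvs.shape[1]))
    trimmed = np.zeros_like(tvs)
    for i, tv in enumerate(tvs):
        keep_idx = np.argsort(np.abs(tv))[-num_keep:]
        trimmed[i, keep_idx] = tv[keep_idx]

    # 2) Elect: pick each parameter's sign from the total mass across tasks.
    elected_sign = np.sign(trimmed.sum(axis=0))

    # 3) Disjoint merge: average only the entries that agree with the elected sign.
    agree = (np.sign(trimmed) == elected_sign) & (trimmed != 0)
    counts = np.maximum(agree.sum(axis=0), 1)          # avoid division by zero
    merged_tv = (trimmed * agree).sum(axis=0) / counts

    return lam * merged_tv                              # add this to the pre-trained weights

# Usage (hypothetical): merged = pretrained + ties_merge([tv1, tv2, tv3], k=0.2)
```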

Empirical Validation across Diverse Scenarios

The effectiveness of TIES-MERGING is established across numerous scenarios, encompassing different model sizes, architectures, fine-tuning approaches, and domains such as NLP and computer vision. The method outperforms several existing merging techniques, not only on in-domain tasks but also in out-of-domain generalization tests.

Insights into Model Parameters and Signs

Analyses further reveal the impact of interference on model parameters and underscore the significance of correctly resolving sign interference. Experiments show that accurate resolution of parameter signs can nearly match the performance of a model trained on all tasks simultaneously, potentially closing the gap between merging and multitask training.

Conclusion and Future Directions

TIES-MERGING offers an efficient and effective means for combining fine-tuned models into a single multitask model. It successfully mitigates task interference and can be applied with fixed hyperparameters, providing a promising path forward for model management. This approach expands the scope for practical multitask model deployment, laying the groundwork for future advancements in efficient model merging practices.
