An Exploration of Isotropic Model Merging with Common and Task-Specific Subspaces
In multi-task deep learning, a central challenge is how to merge models fine-tuned from a shared pretrained checkpoint on distinct tasks into a single multi-task model. The paper "No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces" addresses this problem with a novel isotropic merging framework that aims to close the performance gap between individual task-specific models and the merged multi-task model. Notably, the method improves merged-model performance without any additional training.
The core proposition of the paper is an analysis of the subspace structure of task matrices (the weight-update matrices obtained by subtracting the pretrained weights from each task's fine-tuned weights). The paper shows that the alignment between the singular components of these task matrices and those of the merged matrix correlates strongly with performance improvement over the pretrained model. This insight drives the development of two new merging techniques: Isotropic Merging in Common Subspace (Iso-C) and Isotropic Merging in Common and Task-Specific Subspaces (Iso-CTS).
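As a concrete illustration, a task matrix is simply the per-layer difference between fine-tuned and pretrained weights. The sketch below builds a few synthetic task matrices and inspects the singular components of their task-arithmetic sum; all shapes and values are made up for illustration and are not the paper's actual models:

```python
import numpy as np

# Hypothetical per-layer weights: W0 is the pretrained matrix, each W_t the
# weights after fine-tuning on task t (synthetic stand-ins here).
rng = np.random.default_rng(0)
W0 = rng.standard_normal((64, 64))
task_weights = [W0 + 0.1 * rng.standard_normal((64, 64)) for _ in range(3)]

# Task matrices: the weight updates applied on top of the pretrained model.
task_matrices = [W_t - W0 for W_t in task_weights]

# A simple merged update via task arithmetic: the sum of the task matrices.
merged = sum(task_matrices)

# Singular components of the merged update; the paper studies how these
# align with each task matrix's own singular components.
U_m, S_m, Vt_m = np.linalg.svd(merged, full_matrices=False)
print(S_m[:5])  # dominant singular values of the merged update
```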
Isotropic Merging Frameworks and Methodologies
- Iso-C: The Iso-C framework flattens the singular value spectrum of the merged task matrix to enhance subspace alignment. It computes the Singular Value Decomposition (SVD) of the summed task matrices and replaces the spectrum with a uniform (isotropic) one, spreading the merged update's energy evenly across the common directions shared by the tasks. This isotropic scaling yields significant performance gains by effectively integrating diverse task-specific knowledge into a common subspace.
- Iso-CTS: While Iso-C works well when tasks overlap substantially, Iso-CTS extends the technique by incorporating task-specific subspaces. It iteratively replaces the least significant singular directions of the common subspace with task-specific directions, retaining critical per-task features alongside shared knowledge. This matters most as the number of tasks grows, when a purely common subspace can no longer capture every task's unique structure.
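The core of Iso-C described above can be sketched in a few lines: sum the task matrices, take the SVD, and replace the singular values with their mean. This is a minimal sketch; the paper additionally applies a scaling coefficient when adding the merged update back to the pretrained weights, which is omitted here:

```python
import numpy as np

def iso_c_merge(task_matrices):
    """Sketch of Iso-C: merge task matrices by task arithmetic, then
    flatten the singular value spectrum of the merged update."""
    merged = np.sum(task_matrices, axis=0)           # task-arithmetic sum
    U, S, Vt = np.linalg.svd(merged, full_matrices=False)
    S_iso = np.full_like(S, S.mean())                # uniform (isotropic) spectrum
    return U @ np.diag(S_iso) @ Vt

# Synthetic task matrices standing in for real weight updates.
rng = np.random.default_rng(1)
deltas = [rng.standard_normal((32, 32)) for _ in range(4)]
merged_iso = iso_c_merge(deltas)

# After isotropic scaling, every singular value of the merged update
# equals the mean of the original spectrum.
print(np.linalg.svd(merged_iso, compute_uv=False)[:3])
```

Because the reconstruction uses a uniform spectrum, no single dominant direction (typically the one favored by whichever task has the largest update) can crowd out the others.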
Empirical Evaluation and Performance
The authors conduct extensive empirical evaluations across visual encoders including ViT-B/32, ViT-B/16, and ViT-L/14, on benchmarks scaling from 8 to 20 tasks. Models merged with their approach achieve state-of-the-art results, surpassing previous methods such as Task Arithmetic, TIES, Consensus Merging, and TSV-M by a notable margin. Iso-CTS in particular proves the most robust and effective, maintaining strong performance even as the number and complexity of tasks grow.
A considerable part of the research underscores how isotropic merging leverages strong alignment between task and merged subspaces to substantially improve multi-task model performance. Key experiments show that improved alignment, measured by the Subspace Alignment Ratio (SAR), translates directly into improved task performance, quantified by Normalized Accuracy Improvement (NAI).
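An alignment score of this kind can be illustrated as the fraction of a task update's energy captured by the top-k singular subspaces of the merged update. The sketch below is an illustrative variant; the paper's exact SAR definition and normalization may differ, and all matrices here are synthetic:

```python
import numpy as np

def subspace_alignment_ratio(delta_task, delta_merged, k):
    """Illustrative alignment score: Frobenius norm of a task update
    projected onto the top-k singular subspaces of the merged update,
    relative to the task update's own norm (values in [0, 1])."""
    U, _, Vt = np.linalg.svd(delta_merged, full_matrices=False)
    Uk, Vk = U[:, :k], Vt[:k, :]
    projected = Uk @ Uk.T @ delta_task @ Vk.T @ Vk
    return np.linalg.norm(projected) / np.linalg.norm(delta_task)

rng = np.random.default_rng(2)
merged = rng.standard_normal((32, 32))
task_like = merged + 0.1 * rng.standard_normal((32, 32))  # well-aligned update
unrelated = rng.standard_normal((32, 32))                 # unrelated update

print(subspace_alignment_ratio(task_like, merged, k=8))   # typically high
print(subspace_alignment_ratio(unrelated, merged, k=8))   # typically lower
```

A task update that shares the merged matrix's dominant directions scores near 1, while an unrelated update scores much lower; the paper's finding is that higher alignment of this kind predicts higher normalized accuracy for that task after merging.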
Implications and Future Directions
This paper provides a new lens through which model merging can be viewed, advocating for the enhancement of model subspace alignment as a key to bridging performance gaps in multi-task models. The implications of such work are both practical and theoretical, promising improved methodologies in various applications where models must efficiently operate across multiple tasks.
From a research perspective, future work could further optimize the common-subspace identification process or apply these methods to domains beyond visual tasks, such as natural language processing, thereby broadening the scope and utility of isotropic model merging.