- The paper introduces ZipIt!, a novel model merging strategy that leverages feature merging and partial zipping to combine models from diverse tasks without retraining.
- It demonstrates up to a 60% improvement in merged-model accuracy over prior permutation-based merging methods, validated through experiments on CIFAR and ImageNet subsets.
- The approach formalizes intra-model redundancies and their relation to linear mode connectivity, paving the way for versatile multi-task learning and broader AI applications.
Insights on ZipIt! for Model Merging
The research paper introduces a methodology termed "ZipIt!" that merges deep learning models trained on separate tasks into a single model capable of addressing multiple tasks without retraining. Prior merging methods such as Git Re-Basin and REPAIR were largely confined to models trained on the same task; when applied to models trained on distinct tasks, they suffered substantial accuracy declines because they did not account for task-specific differences in learned features. ZipIt! addresses these shortcomings with a feature merging technique that exploits redundancies both within and across the constituent models.
Key Methodological Innovations
- Feature Merging: ZipIt! introduces a generalized “zip” operation that matches and merges features both within a single model and across the two models, improving flexibility and accuracy when the models were trained on disparate tasks. This goes beyond permutation-based approaches, which are restricted to one-to-one feature matching across models and therefore disregard intra-model redundancies (see the first sketch after this list).
- Partial Zipping: The paper also proposes partial zipping, in which merging stops at a specified layer and the remaining layers of each network are kept as separate task-specific heads, yielding a multi-head model (see the second sketch below). This allows different degrees of integration, sharing early layers while keeping later ones distinct, and significantly improves joint accuracy compared with merging the models completely.
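First, to make the zip operation concrete, here is a minimal sketch of correlation-based feature pairing under stated assumptions; the function name, greedy matching rule, and 50/50 averaging weights are illustrative choices and not the authors' implementation.

```python
import numpy as np


def zip_features(feats_a, feats_b):
    """Greedy correlation-based feature pairing -- a minimal sketch of a
    ZipIt!-style "zip" step (hypothetical helper, not the authors' code).

    feats_a, feats_b: activations of the same layer from two models, each of
    shape (num_samples, num_features). Returns a merge matrix M of shape
    (num_features, 2 * num_features) that averages matched feature pairs.
    """
    feats = np.concatenate([feats_a, feats_b], axis=1)   # (N, 2F)
    n_total = feats.shape[1]
    n_out = n_total // 2

    # Correlation between every pair of concatenated features; self-matches
    # are forbidden by masking the diagonal.
    corr = np.corrcoef(feats, rowvar=False)              # (2F, 2F)
    np.fill_diagonal(corr, -np.inf)

    merge = np.zeros((n_out, n_total))
    unmatched = set(range(n_total))
    for row in range(n_out):
        # Greedily take the most correlated remaining pair. Unlike a pure
        # permutation, both features of a pair may come from the same model.
        cands = sorted(unmatched)
        sub = corr[np.ix_(cands, cands)]
        i_loc, j_loc = np.unravel_index(np.argmax(sub), sub.shape)
        i, j = cands[i_loc], cands[j_loc]
        merge[row, i] = merge[row, j] = 0.5               # average the pair
        unmatched -= {i, j}
    return merge
```

In a full pipeline, the resulting merge matrix (together with a corresponding "unmerge" projection) would be applied to the layer weights of both models; that step is omitted here for brevity.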
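Second, partial zipping can be pictured as a multi-head wrapper around an already-merged trunk. The PyTorch skeleton below is purely illustrative (class and attribute names are hypothetical), and it omits the unmerge projection that ZipIt! applies before the unmerged layers.

```python
import torch.nn as nn


class PartiallyZippedModel(nn.Module):
    """Multi-head wrapper around an already-merged trunk.

    `merged_trunk` stands for the layers zipped up to the chosen stopping
    point; `head_a` / `head_b` are the remaining, unmerged layers of each
    original model. Names are hypothetical sketches, not the authors' API.
    """

    def __init__(self, merged_trunk: nn.Module, head_a: nn.Module, head_b: nn.Module):
        super().__init__()
        self.trunk = merged_trunk   # shared, zipped layers
        self.head_a = head_a        # task-A-specific tail of model A
        self.head_b = head_b        # task-B-specific tail of model B

    def forward(self, x):
        shared = self.trunk(x)      # one forward pass through the trunk serves both tasks
        return self.head_a(shared), self.head_b(shared)
```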
Empirical Evaluations
ZipIt! shows up to a 60% improvement in merged-model accuracy over earlier methodologies, particularly when the constituent models are trained on entirely disjoint sets of categories, such as disjoint subsets of CIFAR and ImageNet classes. The improvements are quantified through extensive experiments across different architectures and tasks, demonstrating its utility in scenarios where traditional merging techniques falter. In particular, experiments on CIFAR-10 and CIFAR-100, with the categories split between two models, provide compelling evidence of ZipIt!'s effectiveness relative to baselines such as Git Re-Basin and direct weight averaging.
Theoretical Implications
ZipIt! theoretically formalizes intra-model redundancies and relates them to linear mode connectivity in order to explain the observed empirical advantages. It posits that models exhibit inherent redundancy that makes merging features within a single model possible, and argues that exploiting this redundancy yields a tighter bound on merged-model quality than permutation-based matching alone.
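For reference, linear mode connectivity is commonly quantified through the loss barrier along the straight-line path between two parameter vectors. The formulation below is the standard one from the mode connectivity literature, with illustrative symbols rather than the paper's own notation.

```latex
% Loss barrier between parameter vectors \theta_A and \theta_B along the
% linear interpolation path (standard definition; symbols illustrative).
B(\theta_A, \theta_B) \;=\; \max_{\alpha \in [0,1]}
  \Bigl[ \mathcal{L}\bigl(\alpha\,\theta_A + (1-\alpha)\,\theta_B\bigr)
         \;-\; \bigl(\alpha\,\mathcal{L}(\theta_A) + (1-\alpha)\,\mathcal{L}(\theta_B)\bigr) \Bigr]
% Two models are said to be linearly mode connected when this barrier is near zero.
```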
Future Directions and Speculations
The strategies introduced by ZipIt! point to promising applications beyond vision recognition tasks. Its merging capabilities suggest potential uses in machine learning settings that require multi-task handling, in federated learning, and in incremental learning environments. Further research might extend the methodology to domains requiring integration across differing feature spaces or diverging model architectures, aiming for improved generalization across tasks without any retraining.
This paper represents a significant step toward advancing model integration strategies within deep learning paradigms, providing pathways for constructing versatile AI systems capable of multi-task learning with enhanced efficiency and minimal computational overhead.