Model Stock: All we need is just a few fine-tuned models

(2403.19522)
Published Mar 28, 2024 in cs.LG and cs.CV

Abstract

This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance. Breaking away from traditional practices that require a multitude of fine-tuned models for averaging, our approach employs significantly fewer models to achieve the final weights yet yields superior accuracy. Drawing on key insights from the weight space of fine-tuned weights, we uncover a strong link between performance and proximity to the center of the weight space. Based on this, we introduce a method that approximates a center-close weight using only two fine-tuned models, applicable during or after training. Our innovative layer-wise weight averaging technique surpasses state-of-the-art model averaging methods such as Model Soup while utilizing only two fine-tuned models. This strategy can be aptly coined Model Stock, highlighting its reliance on a minimal number of models to derive a better-averaged final model. We demonstrate the efficacy of Model Stock with fine-tuned models based upon pre-trained CLIP architectures, achieving remarkable performance on both ID and OOD tasks on standard benchmarks, all while incurring minimal additional computational demands. Our code and pre-trained models are available at https://github.com/naver-ai/model-stock.

Model Stock outperforms fine-tuned models in accuracy on ImageNet and distribution shift benchmarks.

Overview

  • Model Stock introduces a novel fine-tuning method that significantly decreases the need for multiple fine-tuned models while enhancing model performance on both in-distribution (ID) and out-of-distribution (OOD) tasks.

  • The method is based on insights into the geometric properties of fine-tuned weights in weight space, where fine-tuned models with different initializations tend to cluster within a thin shell, and proximity to the center of this shell correlates with improved performance.

  • Model Stock uses a simple layer-wise weight averaging technique, leveraging the pre-trained model weight as an anchor point and calculating an optimal interpolation ratio based on the angle between fine-tuned models, which efficiently approximates an optimal weight center with minimal computational resources.

  • Experimental validation demonstrates that Model Stock achieves comparable or superior performance to more computationally intensive methods across standard benchmarks, indicating its potential to transform model optimization practices.

Model Stock: Achieving Superior Model Performance with Minimal Fine-Tuning

Introduction to Model Stock

In recent advancements in deep learning, the pre-train and fine-tune paradigm has become a cornerstone for achieving high-performing models across various tasks. However, the traditional methodology often relies on an ensemble of fine-tuned models to reach peak performance, incurring excessive computational costs. This paper introduces Model Stock, a novel fine-tuning method that significantly reduces the need for numerous fine-tuned models while enhancing both in-distribution (ID) and out-of-distribution (OOD) performance. By exploiting the geometric properties within the weight space of fine-tuned models, Model Stock efficiently approximates an optimal weight center with as few as two fine-tuned models.

Geometric Insights into Fine-tuned Weights

The initial exploration into the dynamics of fine-tuned weights uncovers two pivotal insights: fine-tuned weights with different random seeds tend to reside on a very thin shell layer-wise in weight space, and their proximity to the center of this shell is strongly linked to improved performance. Empirical observations and analyses reveal that the angle and norm of fine-tuned weights exhibit remarkably consistent values across layers and different configurations, suggesting a Gaussian-like distribution in the weight space. Further investigations demonstrate that models averaged closer to this weight center outperform those that are farther away, corroborating the importance of center-proximity for optimal model performance.
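To make this layer-wise observation concrete, the sketch below measures, for each layer, the norms of the two fine-tuning deltas and the angle between them relative to the pre-trained weights, i.e., the quantities the analysis reports as nearly constant across layers and random seeds. This is an illustrative sketch, not the authors' released code; the variables `pretrained_sd`, `ft1_sd`, and `ft2_sd` are assumed to be PyTorch state_dicts of the pre-trained model and two fine-tuned runs that differ only in random seed.

```python
import torch

def layer_geometry(pretrained_sd, ft1_sd, ft2_sd):
    """Per layer: norms of the two fine-tuning deltas and the angle between them."""
    stats = {}
    for name, w0 in pretrained_sd.items():
        # Deltas of each fine-tuned model relative to the pre-trained anchor.
        d1 = (ft1_sd[name] - w0).flatten().float()
        d2 = (ft2_sd[name] - w0).flatten().float()
        cos = torch.dot(d1, d2) / (d1.norm() * d2.norm() + 1e-12)
        stats[name] = {
            "norm_1": d1.norm().item(),
            "norm_2": d2.norm().item(),
            "angle_deg": torch.rad2deg(torch.acos(cos.clamp(-1.0, 1.0))).item(),
        }
    return stats
```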

Model Stock Methodology

Leveraging these insights, Model Stock employs a simple yet effective layer-wise weight averaging technique, significantly outperforming state-of-the-art methods with only two fine-tuned models. It deviates from traditional model averaging practices by introducing an innovative approach that uses the pre-trained model weight as an anchor point. The method calculates an optimal interpolation ratio solely based on the angle between fine-tuned models, eliminating the need for additional fine-tuning or heuristic parameter settings. This approach not only streamlines the model optimization process but also underscores the practicability of Model Stock in achieving superior performance with minimal additional computational demands.
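As a concrete illustration of this procedure, the sketch below merges two fine-tuned state_dicts using the pre-trained weights as the anchor, with a layer-wise interpolation ratio computed from the angle between the two fine-tuning deltas. The ratio t = 2·cos θ / (1 + cos θ) follows the paper's two-model formulation as summarized above; the code itself and its variable names are an illustrative sketch, not the authors' released implementation.

```python
import torch

def model_stock_merge(pretrained_sd, ft1_sd, ft2_sd):
    """Layer-wise merge of two fine-tuned models anchored at the pre-trained weights."""
    merged = {}
    for name, w0 in pretrained_sd.items():
        if not torch.is_floating_point(w0):
            # Skip integer buffers (e.g., step counters); copy them unchanged.
            merged[name] = w0.clone()
            continue
        d1 = (ft1_sd[name] - w0).flatten().float()
        d2 = (ft2_sd[name] - w0).flatten().float()
        cos = (torch.dot(d1, d2) / (d1.norm() * d2.norm() + 1e-12)).clamp(-1.0, 1.0)
        # Interpolation ratio determined solely by the angle between the two models:
        # t = 2*cos(theta) / (1 + cos(theta)) for the two-model case.
        t = 2.0 * cos / (1.0 + cos)
        w_avg = (ft1_sd[name] + ft2_sd[name]) / 2.0  # plain average of the fine-tuned weights
        merged[name] = t * w_avg + (1.0 - t) * w0    # pull the average toward the anchor
    return merged
```

Note how the ratio behaves at the extremes: when the two fine-tuned models agree (θ → 0), t → 1 and the result is their plain average; as they become orthogonal (θ → 90°), t → 0 and the merged weights fall back toward the pre-trained anchor.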

Experimental Validation and Implications

Extensive experiments validate the efficacy of Model Stock, demonstrating comparable or superior performance to more resource-intensive methods such as Model Soup across standard benchmarks for both ID and OOD tasks. The strategy delivers notable gains, most visibly 87.8% ImageNet top-1 accuracy and an average of 74.9% across five distribution shift benchmarks with the ViT-L/14 model. These results indicate the potential of Model Stock to redefine efficiency and effectiveness within the pre-train/fine-tune paradigm, offering insights into future developments in model optimization techniques.

Conclusion and Future Directions

Model Stock introduces a groundbreaking perspective on maximizing model performance through fine-tuning, paving the way for more efficient and effective model optimization strategies. Its ability to achieve state-of-the-art performance with a fraction of the computational cost poses significant implications for both practical applications and theoretical understanding of model fine-tuning. Future research could explore the extension of this methodology to a broader range of models and tasks, further advancing the frontier of efficient machine learning practices.
