Model Stock: All we need is just a few fine-tuned models (2403.19522v1)

Published 28 Mar 2024 in cs.LG and cs.CV

Abstract: This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance. Breaking away from traditional practices that need a multitude of fine-tuned models for averaging, our approach employs significantly fewer models to achieve final weights yet yield superior accuracy. Drawing from key insights in the weight space of fine-tuned weights, we uncover a strong link between the performance and proximity to the center of weight space. Based on this, we introduce a method that approximates a center-close weight using only two fine-tuned models, applicable during or after training. Our innovative layer-wise weight averaging technique surpasses state-of-the-art model methods such as Model Soup, utilizing only two fine-tuned models. This strategy can be aptly coined Model Stock, highlighting its reliance on selecting a minimal number of models to draw a more optimized-averaged model. We demonstrate the efficacy of Model Stock with fine-tuned models based upon pre-trained CLIP architectures, achieving remarkable performance on both ID and OOD tasks on the standard benchmarks, all while barely bringing extra computational demands. Our code and pre-trained models are available at https://github.com/naver-ai/model-stock.

Authors (3)
  1. Dong-Hwan Jang (2 papers)
  2. Sangdoo Yun (71 papers)
  3. Dongyoon Han (50 papers)

Summary

  • The paper presents a novel fine-tuning strategy named Model Stock that achieves high performance using only two fine-tuned models.
  • It leverages geometric properties of the weight space by merging weights layer-wise, with an interpolation ratio determined solely by the angle between the fine-tuned models' weights, eliminating extra tuning.
  • Experiments demonstrate that this approach attains competitive benchmarks, including 87.8% ImageNet top-1 accuracy and enhanced OOD performance.

Model Stock: Achieving Superior Model Performance with Minimal Fine-Tuning

Introduction to Model Stock

In recent advancements in deep learning, the pre-train and fine-tune paradigm has become a cornerstone for achieving high-performing models across various tasks. However, the traditional methodology often relies on an ensemble of fine-tuned models to reach peak performance, incurring excessive computational costs. This paper introduces Model Stock, a novel fine-tuning method that significantly reduces the need for numerous fine-tuned models while enhancing both in-distribution (ID) and out-of-distribution (OOD) performance. By exploiting the geometric properties within the weight space of fine-tuned models, Model Stock efficiently approximates an optimal weight center with as few as two fine-tuned models.

Geometric Insights into Fine-tuned Weights

The initial exploration into the dynamics of fine-tuned weights uncovers two pivotal insights: fine-tuned weights obtained with different random seeds tend to lie, layer by layer, on a very thin shell in weight space, and proximity to the center of this shell is strongly linked to better performance. Empirical analyses show that the norm of each fine-tuned weight's deviation from the pre-trained weight, and the angle between such deviations, are remarkably consistent across layers and training configurations, suggesting a Gaussian-like distribution of fine-tuned weights. Further experiments demonstrate that averaged models lying closer to this weight center outperform those farther away, corroborating the importance of center proximity for optimal performance.
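To make this analysis concrete, the sketch below (PyTorch-style) measures, for each layer, the norm of each fine-tuned weight's deviation from the pre-trained weight and the angle between the deviations of two runs with different seeds. Function and variable names here are illustrative assumptions, not taken from the paper's released code.

```python
import torch

def layerwise_deviation_stats(pretrained_sd, finetuned_sd_a, finetuned_sd_b):
    """Per layer: angle (degrees) between the two fine-tuned deviations from
    the pre-trained weight, and the norm of each deviation."""
    stats = {}
    for name, w0 in pretrained_sd.items():
        if not torch.is_floating_point(w0):
            continue  # skip integer buffers such as step counters
        d_a = (finetuned_sd_a[name] - w0).flatten().float()
        d_b = (finetuned_sd_b[name] - w0).flatten().float()
        cos = torch.nn.functional.cosine_similarity(d_a, d_b, dim=0)
        angle = torch.rad2deg(torch.acos(cos.clamp(-1.0, 1.0)))
        stats[name] = (angle.item(), d_a.norm().item(), d_b.norm().item())
    return stats
```

Per the paper's observation, such a script should report angles and norms that vary little across seeds within each layer, which is what motivates treating the layer-wise center as the target of averaging.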

Model Stock Methodology

Leveraging these insights, Model Stock employs a simple yet effective layer-wise weight averaging technique that outperforms state-of-the-art methods while using only two fine-tuned models. It departs from traditional model averaging by using the pre-trained weights as an anchor point: for each layer, the method computes an optimal interpolation ratio based solely on the angle between the fine-tuned models, eliminating the need for additional fine-tuning or heuristic hyperparameter settings. This not only streamlines model optimization but also underscores the practicality of Model Stock in achieving superior performance with minimal additional computation.
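The sketch below illustrates this merging rule for the two-model case. The per-layer ratio t = 2·cos(θ)/(1 + cos(θ)), where θ is the angle between the two fine-tuned deviations from the pre-trained anchor, follows the paper's two-model derivation; the function name and surrounding structure are illustrative assumptions rather than the reference implementation.

```python
import torch

def model_stock_merge(pretrained_sd, finetuned_sd_a, finetuned_sd_b):
    """Merge two fine-tuned state dicts, layer by layer, with the pre-trained
    weights acting as the anchor point."""
    merged = {}
    for name, w0 in pretrained_sd.items():
        w1, w2 = finetuned_sd_a[name], finetuned_sd_b[name]
        if not torch.is_floating_point(w0):
            merged[name] = w0.clone()  # copy non-float buffers unchanged
            continue
        d1 = (w1 - w0).flatten().float()
        d2 = (w2 - w0).flatten().float()
        cos = torch.nn.functional.cosine_similarity(d1, d2, dim=0)
        cos = cos.clamp(min=-1.0 + 1e-6, max=1.0)   # guard the division below
        t = 2.0 * cos / (1.0 + cos)                  # ratio from the angle only
        w_avg = (w1.float() + w2.float()) / 2.0      # midpoint of the fine-tuned weights
        merged[name] = (t * w_avg + (1.0 - t) * w0.float()).to(w0.dtype)
    return merged
```

In practice, the merged state dict can be loaded back into the pre-trained architecture with `load_state_dict` and evaluated directly; as the abstract notes, the same interpolation can also be applied periodically during training rather than only after it.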

Experimental Validation and Implications

Extensive experiments validate the efficacy of Model Stock, demonstrating performance comparable or superior to more resource-intensive methods such as Model Soup across standard benchmarks for both ID and OOD tasks. Notably, with the ViT-L/14 model it reaches 87.8% ImageNet top-1 accuracy and an average of 74.9% across five distribution-shift benchmarks. These results indicate the potential of Model Stock to redefine efficiency and effectiveness within the pre-train/fine-tune paradigm, offering insights into future developments in model optimization techniques.

Conclusion and Future Directions

Model Stock introduces a groundbreaking perspective on maximizing model performance through fine-tuning, paving the way for more efficient and effective model optimization strategies. Its ability to achieve state-of-the-art performance with a fraction of the computational cost poses significant implications for both practical applications and theoretical understanding of model fine-tuning. Future research could explore the extension of this methodology to a broader range of models and tasks, further advancing the frontier of efficient machine learning practices.
