- The paper presents a novel fine-tuning strategy named Model Stock that achieves high performance using only two fine-tuned models.
- It leverages geometric properties of the weight space by merging weights layer-wise, with an interpolation ratio determined solely by the angle between fine-tuned weights, eliminating extra hyperparameter tuning.
- Experiments demonstrate that this approach attains competitive results on standard benchmarks, including 87.8% ImageNet top-1 accuracy and enhanced OOD performance.
Model Stock: Achieving Superior Model Performance with Minimal Fine-Tuning
Introduction to Model Stock
In recent advancements in deep learning, the pre-train and fine-tune paradigm has become a cornerstone for achieving high-performing models across various tasks. However, the traditional methodology often relies on averaging or ensembling many fine-tuned models to reach peak performance, incurring substantial computational cost. This paper introduces Model Stock, a novel fine-tuning method that significantly reduces the number of fine-tuned models required while enhancing both in-distribution (ID) and out-of-distribution (OOD) performance. By exploiting the geometric properties of the weight space of fine-tuned models, Model Stock efficiently approximates an optimal weight center with as few as two fine-tuned models.
Geometric Insights into Fine-tuned Weights
The initial exploration into the dynamics of fine-tuned weights uncovers two pivotal insights: fine-tuned weights obtained with different random seeds tend to reside, layer-wise, on a very thin shell in weight space, and their proximity to the center of this shell is strongly linked to improved performance. Empirical observations show that both the norms of the fine-tuned weights and the angle between them, measured relative to the pre-trained weights, remain remarkably consistent across layers and configurations, suggesting a Gaussian-like distribution in weight space. Further analysis demonstrates that averaged models lying closer to this weight center outperform those farther away, corroborating the importance of proximity to the center for optimal model performance.
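To make these observations concrete, the layer-wise quantities can be measured directly from model checkpoints. The sketch below is illustrative rather than taken from the paper's code: it assumes two fine-tuned PyTorch state_dicts (`ft1_sd`, `ft2_sd`) and the pre-trained one (`pretrained_sd`) sharing the same keys, and reports per layer the norms of the fine-tuned deltas from the pre-trained anchor and the angle between them, the values observed to be nearly constant across layers and seeds.

```python
import torch

def layer_geometry(pretrained_sd, ft1_sd, ft2_sd):
    """Per-layer norms of fine-tuned deltas from the pre-trained anchor
    and the angle between the two deltas (illustrative measurement)."""
    stats = {}
    for name, w0 in pretrained_sd.items():
        d1 = (ft1_sd[name] - w0).flatten().float()
        d2 = (ft2_sd[name] - w0).flatten().float()
        if d1.norm() == 0 or d2.norm() == 0:
            continue  # skip layers/buffers that did not move during fine-tuning
        cos = torch.dot(d1, d2) / (d1.norm() * d2.norm())
        stats[name] = {
            "norm_1": d1.norm().item(),
            "norm_2": d2.norm().item(),
            "angle_deg": torch.rad2deg(torch.acos(cos.clamp(-1.0, 1.0))).item(),
        }
    return stats
```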
Model Stock Methodology
Leveraging these insights, Model Stock employs a simple yet effective layer-wise weight averaging technique, matching or surpassing state-of-the-art weight-averaging methods with only two fine-tuned models. Unlike conventional model averaging, it uses the pre-trained weights as an anchor point and computes an interpolation ratio solely from the angle between the fine-tuned models, eliminating the need for additional fine-tuning or heuristic parameter settings. This approach not only streamlines the model optimization process but also underscores the practicality of Model Stock in achieving superior performance with minimal additional computational demands.
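A minimal sketch of this angle-based merge is shown below, again assuming two fine-tuned PyTorch state_dicts and the pre-trained weights as the anchor. The per-layer ratio t = 2·cos(θ) / (1 + cos(θ)) follows the two-model case derived in the Model Stock paper, but the snippet should be read as an illustration of the idea, not the reference implementation.

```python
import torch

def model_stock_merge(pretrained_sd, ft1_sd, ft2_sd):
    """Layer-wise merge of two fine-tuned models toward the estimated weight
    center, anchored at the pre-trained weights (illustrative sketch)."""
    merged = {}
    for name, w0 in pretrained_sd.items():
        w1, w2 = ft1_sd[name], ft2_sd[name]
        d1 = (w1 - w0).flatten().float()
        d2 = (w2 - w0).flatten().float()
        if d1.norm() == 0 or d2.norm() == 0:
            merged[name] = w0.clone()  # layer unchanged by fine-tuning
            continue
        # Angle between the two fine-tuned deltas (typically well below 90 degrees).
        cos = (torch.dot(d1, d2) / (d1.norm() * d2.norm())).clamp(-1.0, 1.0)
        t = 2.0 * cos / (1.0 + cos)            # interpolation ratio from the angle alone
        w_avg = (w1.float() + w2.float()) / 2  # layer-wise average of the fine-tuned pair
        merged[name] = (t * w_avg + (1.0 - t) * w0.float()).to(w0.dtype)
    return merged
```

The merged state_dict can then be loaded into the model architecture as usual (e.g. `model.load_state_dict(merged)`), with no further tuning of the interpolation ratio.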
Experimental Validation and Implications
Extensive experiments validate the efficacy of Model Stock, demonstrating comparable or superior performance to more resource-intensive methods such as Model Soup across standard benchmarks for both ID and OOD tasks. The strategy shows notable gains, achieving 87.8% ImageNet top-1 accuracy and an average of 74.9% across five distribution-shift benchmarks with the ViT-L/14 model. These results indicate the potential of Model Stock to redefine efficiency and effectiveness within the pre-train/fine-tune paradigm, offering insights for future developments in model optimization techniques.
Conclusion and Future Directions
Model Stock offers a fresh perspective on maximizing model performance through fine-tuning, paving the way for more efficient and effective model optimization strategies. Its ability to approach state-of-the-art performance at a fraction of the computational cost has significant implications for both practical applications and the theoretical understanding of fine-tuning. Future research could extend this methodology to a broader range of models and tasks, further advancing efficient machine learning practice.