- The paper presents the SSF method, which adjusts feature scales and shifts to balance performance and computational efficiency.
- It utilizes only about 0.3M tunable parameters while outperforming alternatives like Adapter and VPT with significant accuracy gains.
- The method employs a re-parameterization strategy that absorbs the learnable parameters into the pre-trained weights after training, eliminating extra inference cost.
An Analysis of Scaling and Shifting Features for Efficient Model Tuning
The paper presents a novel fine-tuning method, termed Scaling and Shifting Features (SSF), designed to address the limitations of existing parameter-efficient fine-tuning techniques. SSF adapts a pre-trained model by modulating the deep features it extracts rather than updating its weights. The core objective is to strike a balance between performance and computational efficiency by using a minimal number of learnable parameters for model adaptation.
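The modulation SSF describes is a per-dimension linear transform of the features, y = γ ⊙ x + β, where γ (scale) and β (shift) are the only learnable parameters. A minimal sketch follows; the function name `ssf_modulate` and the toy values are illustrative, not taken from the paper:

```python
import numpy as np

def ssf_modulate(x, gamma, beta):
    """Scale-and-shift modulation: y = gamma * x + beta,
    broadcast over the feature (last) dimension."""
    return x * gamma + beta

# Toy example: a batch of 2 tokens with 4-dimensional features.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.5, 0.5, 0.5, 0.5]])
gamma = np.array([2.0, 1.0, 1.0, 0.5])   # learnable scale (typically initialized near 1)
beta = np.array([0.0, 0.1, -0.1, 0.0])   # learnable shift (typically initialized near 0)

y = ssf_modulate(x, gamma, beta)
# Each feature channel is scaled and shifted independently,
# so the parameter count grows with feature width, not model depth or weight count.
```

Because only γ and β are trained, the parameter count per layer equals twice the feature dimension, which is how the total stays around 0.3M for a ViT-scale backbone.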
Key Contributions
- Innovative Fine-Tuning Approach: SSF aims to overcome the trade-offs inherent in existing strategies, namely full fine-tuning, which requires updating all model parameters, and linear probing, which suffers a substantial accuracy drop. SSF reduces adaptation to adjusting the scales and shifts of deep features, achieving near-full fine-tuning performance with far fewer tunable parameters.
- Parameter Efficiency: The method introduces only about 0.3 million trainable parameters, a tiny fraction of the size of large pre-trained backbones such as ViT-G/14 and CoAtNet. By modulating features rather than the model weights themselves, SSF outperforms other parameter-efficient techniques such as Adapter and VPT while adding zero extra parameters or computation at inference.
- Performance Gains: Empirical evaluations show that SSF improves Top-1 accuracy by 2.46% and 11.48% on the FGVC and VTAB-1k benchmarks, respectively, over full fine-tuning, underscoring the method's ability to match or exceed full-tuning performance with far fewer parameter updates.
- Re-Parameterization Strategy: The learnable scale and shift parameters introduced during training can be absorbed into the pre-trained weights via re-parameterization, so inference incurs no additional computational overhead. This makes SSF well suited to deployment on edge devices where computational resources are constrained.
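The re-parameterization idea in the last bullet rests on a simple identity: when the scale-and-shift follows a linear layer, γ ⊙ (Wx + b) + β = (γ ⊙ W)x + (γ ⊙ b + β), so γ and β can be folded into that layer's weights once training ends. A minimal sketch under that assumption (the helper `merge_ssf` is hypothetical, not the paper's code):

```python
import numpy as np

def merge_ssf(W, b, gamma, beta):
    """Fold SSF parameters into the preceding linear layer so that
    gamma * (W @ x + b) + beta == W_new @ x + b_new for every x."""
    W_new = gamma[:, None] * W   # scale each output row of W
    b_new = gamma * b + beta     # scale the bias, then shift
    return W_new, b_new

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
b = rng.standard_normal(3)
gamma = rng.standard_normal(3)
beta = rng.standard_normal(3)
x = rng.standard_normal(4)

W_new, b_new = merge_ssf(W, b, gamma, beta)
# The merged layer reproduces the train-time computation exactly,
# which is why inference adds no parameters or FLOPs.
assert np.allclose(gamma * (W @ x + b) + beta, W_new @ x + b_new)
```

The same folding applies per layer throughout the network, so the deployed model has exactly the original architecture and cost.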
Implications and Future Directions
The implications of SSF extend to both theoretical and practical realms within model tuning strategies. Theoretically, the approach challenges the prevailing assumption that better adaptation performance must come from greater parameter complexity or computational demand. Practically, SSF lays the groundwork for efficient model adaptation, especially valuable in scenarios demanding rapid deployment across heterogeneous data distributions and in environments with stringent resource limitations.
The notion of modulating features, rather than the architecture, opens the door to adaptive techniques that dynamically engage with task-specific characteristics. Future research could benefit from synergy with task-agnostic exploration methodologies, potentially integrating task-specific modulation as a core concept. Investigating the interplay between SSF and task interdependencies, perhaps extending into multi-task learning, also presents significant potential.
Conclusion
In summary, the SSF method offers a compelling framework for parameter-efficient model tuning, deftly balancing performance and computational thrift. By scaling and shifting deep features, it shifts the focus of the fine-tuning narrative towards more efficient and practical application, making it a valuable methodology for AI researchers and practitioners alike. As the field continues to probe efficient learning methods, SSF's approach will likely inspire new directions in how models are adapted and deployed across diverse, real-world scenarios.