Towards Efficient Visual Adaptation via Structural Re-parameterization
The paper "Towards Efficient Visual Adaptation via Structural Re-parameterization" introduces a novel approach to parameter-efficient transfer learning (PETL) in the domain of visual models, specifically aimed at optimizing the adaptation of large-scale pre-trained vision models to downstream tasks. The core contribution of this work lies in the development of RepAdapter, a parameter-efficient and computationally friendly adapter for giant vision models.
The authors critique existing PETL methods, noting that many still incur substantial inference latency even though they reduce tuning costs by updating only a small number of parameters. To address this, they propose RepAdapter, which leverages structural re-parameterization: because the adaptation modules are purely linear, they can be merged into the frozen backbone weights after training, so the deployed model has exactly the architecture and inference cost of the original network.
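To make the merging step concrete, here is a minimal PyTorch sketch of the idea: a dense linear bottleneck adapter placed before a projection layer, folded into that projection's weights after training. The names (LinearAdapter, merge_into) and the hidden width are illustrative assumptions, not the paper's actual API; the group-wise variant RepAdapter actually uses is sketched further below.

```python
import torch
import torch.nn as nn

class LinearAdapter(nn.Module):
    """Dense linear adapter sketch: x + W_u (W_d x + b_d) + b_u.

    Purely linear (no activation), which is what makes the
    re-parameterization below exact.
    """
    def __init__(self, dim: int, hidden: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, hidden)  # W_d, b_d
        self.up = nn.Linear(hidden, dim)    # W_u, b_u

    def forward(self, x):
        return x + self.up(self.down(x))

@torch.no_grad()
def merge_into(adapter: LinearAdapter, layer: nn.Linear) -> None:
    """Fold the adapter into the linear layer that follows it.

    y = W0 (x + W_u (W_d x + b_d) + b_u) + b0
      = [W0 (I + W_u W_d)] x + [W0 (W_u b_d + b_u) + b0]

    After merging, inference runs the original architecture with
    updated weights and zero extra latency.
    """
    W0, b0 = layer.weight, layer.bias
    Wd, bd = adapter.down.weight, adapter.down.bias
    Wu, bu = adapter.up.weight, adapter.up.bias
    eye = torch.eye(layer.in_features, device=W0.device)
    layer.weight.copy_(W0 @ (eye + Wu @ Wd))
    layer.bias.copy_(W0 @ (Wu @ bd + bu) + b0)

# Sanity check: outputs match before and after re-parameterization.
dim = 768
adapter, proj = LinearAdapter(dim), nn.Linear(dim, dim)
x = torch.randn(2, 16, dim)
y_before = proj(adapter(x))
merge_into(adapter, proj)
assert torch.allclose(y_before, proj(x), atol=1e-4)
```

The sanity check at the end is the essential property: training-time and merged inference-time outputs are numerically identical, so the adapter's cost disappears at deployment.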
RepAdapter is designed to offer efficient inference without the computational overhead typically associated with adapter-based tuning. It demonstrates strong performance and efficiency across a comprehensive suite of 27 benchmark datasets spanning image classification, video classification, and semantic segmentation. Notably, RepAdapter outperforms full fine-tuning by an average of 7.2% on the VTAB-1K benchmark while also reducing training time by up to 25%, saving 20% in GPU memory, and cutting storage costs by 94.6% for ViT-B/16. Such results indicate its potential for widespread adoption in resource-constrained applications.
Key to RepAdapter’s architecture are two design choices: first, a sparse adapter structure, which improves parameter efficiency (see the sketch below); and second, a careful placement of the adapter within the model, which improves performance. The paper hypothesizes that structural re-parameterization can enhance intrinsic network capacity without incurring additional computational burdens, a claim the authors verify empirically.
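As a hedged illustration of the sparse design, the sketch below implements the adapter's up-projection as a block-diagonal (group-wise) linear map via a 1x1 grouped convolution. This follows the paper's group-wise formulation in spirit, but the module name and hyperparameters are assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class GroupWiseAdapter(nn.Module):
    """Sketch of a sparse adapter with a group-wise up-projection.

    A 1x1 grouped convolution acts as a block-diagonal linear map,
    dividing the up-projection's parameter count by `groups` while
    remaining purely linear.
    """
    def __init__(self, dim: int = 768, hidden: int = 8, groups: int = 2):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        # Block-diagonal up-projection over the channel dimension.
        self.up = nn.Conv1d(hidden, dim, kernel_size=1, groups=groups)

    def forward(self, x):               # x: (batch, tokens, dim)
        h = self.down(x)                # (batch, tokens, hidden)
        h = self.up(h.transpose(1, 2))  # grouped 1x1 conv over channels
        return x + h.transpose(1, 2)
```

Because a block-diagonal map is still a linear map, the same merging arithmetic shown earlier applies unchanged; the grouping only reduces the number of trainable parameters, not the ability to re-parameterize.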
Comparisons against state-of-the-art PETL methods further highlight the robustness of RepAdapter, which consistently outperforms existing techniques both quantitatively and qualitatively. Across experiments, RepAdapter not only reduces resource requirements but also generalizes well across a range of vision models, including ConvNeXt, ViT, Swin Transformer, and CLIP.
The authors also provide a thorough evaluation of RepAdapter in few-shot learning and domain adaptation scenarios, suggesting that its robustness extends beyond standard PETL settings. The results underscore RepAdapter's potential in practical applications, such as smart devices, where computational resources and energy budgets are limited.
Moreover, the paper discusses the theoretical and practical implications of RepAdapter's structural re-parameterization framework. The findings suggest promising avenues for further exploration in model compression and the enhancement of transfer learning methodologies across various AI-based applications. By providing an open-source implementation, the authors encourage future research and adoption of this approach, potentially influencing the development of efficient neural architectures.
This work contributes significantly to the existing literature by demonstrating how structural re-parameterization can serve as a powerful tool in creating adaptable and efficient vision models. As AI continues to permeate various fields, methods like RepAdapter that offer significant reductions in computational overhead without sacrificing performance are likely to play a critical role in the evolution of intelligent systems.