Automated Progressive Learning for Efficient Training of Vision Transformers
To tackle the computational demands of training Vision Transformers (ViTs), the paper "Automated Progressive Learning for Efficient Training of Vision Transformers" introduces a methodology that automates and optimizes progressive learning. The authors propose Automated Progressive Learning (AutoProg), a technique that improves training efficiency by growing the model's capacity incrementally during training. The strategy addresses the growing need for sustainable computing practices, given the substantial cost of training ViTs on large-scale datasets such as ImageNet.
Summary of Contributions
The paper makes several contributions to the efficient training of deep learning models, and of ViTs in particular:
- Manual and Automated Progressive Learning:
  - The authors establish a strong manual baseline for progressive learning of ViTs and introduce Momentum Growth (MoGrow), a growth operator designed to mitigate the disruption caused by adding new layers (a sketch of this growth step follows the list).
  - AutoProg then automates the search for a growth schedule during training, optimizing both where and when the model is scaled up so as to minimize computational cost.
- Elastic Supernet for Search Optimization:
  - A key component of AutoProg is an Elastic Supernet, in which candidate sub-networks share parameters and are adapted across training stages, so that each candidate's performance can be estimated with minimal retraining overhead (a combined sketch of the supernet and the schedule selection follows the list).
- Experimental Validation:
  - The method accelerates training by up to 85.1% on models such as VOLO-D1 without a significant drop in performance. Comparisons across architectures and training schedules underline AutoProg's adaptability and robustness.
- Implications for Broader Applications:
  - Although designed primarily for ViTs, the framework could also make training other architectures, such as convolutional neural networks (CNNs), more efficient, offering a general recipe for resource-intensive models across disciplines.
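To make the growth step concrete, the sketch below illustrates one plausible reading of Momentum Growth: when new transformer blocks are added, they are initialized from an exponential-moving-average (EMA) copy of the existing network rather than from random weights, so the optimization trajectory is disturbed as little as possible. The helper names (`momentum_update`, `mogrow_depth`) are illustrative assumptions, not the paper's API, and the exact growth rule in the paper may differ.

```python
import copy

import torch
import torch.nn as nn


@torch.no_grad()
def momentum_update(online: nn.Module, ema: nn.Module, m: float = 0.999) -> None:
    """Maintain an exponential moving average (EMA) copy of the online network."""
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.mul_(m).add_(p, alpha=1.0 - m)


def mogrow_depth(online_blocks: nn.ModuleList,
                 ema_blocks: nn.ModuleList,
                 num_new: int) -> nn.ModuleList:
    """Grow the network by `num_new` blocks, initializing each new block from
    the EMA copy of the last existing block instead of from random weights,
    so the loss landscape is perturbed as little as possible at the growth step."""
    grown = nn.ModuleList(list(online_blocks))
    for _ in range(num_new):
        grown.append(copy.deepcopy(ema_blocks[-1]))
    return grown
```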
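The Elastic Supernet and the schedule search can be sketched together, since the search is driven by the supernet's sub-network scores. In this minimal sketch, assuming depth-only elasticity, every candidate is a prefix of the largest model, so all candidates share weights and none has to be retrained before it is scored; the scoring rule (validation loss weighted by an assumed per-step cost function `step_cost`) is a deliberately simplified stand-in for the paper's schedule-search objective.

```python
import torch
import torch.nn as nn


class ElasticViT(nn.Module):
    """Toy elastic supernet: every candidate sub-network is a prefix of the
    largest model, so all growth candidates share parameters."""

    def __init__(self, max_depth: int, dim: int, num_classes: int, num_heads: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
            for _ in range(max_depth)
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor, depth: int) -> torch.Tensor:
        # Realize a sub-network by running only the first `depth` blocks.
        for block in self.blocks[:depth]:
            x = block(x)
        return self.head(x.mean(dim=1))  # mean-pool tokens, then classify


@torch.no_grad()
def pick_next_depth(supernet, val_batch, candidate_depths, step_cost):
    """Score each candidate sub-network on one held-out batch and return the
    depth with the best loss-per-cost trade-off.  `step_cost(d)` is an assumed
    callable giving the relative per-iteration training cost of depth `d`."""
    x, y = val_batch
    criterion = nn.CrossEntropyLoss()
    supernet.eval()
    best_depth, best_score = None, float("inf")
    for d in candidate_depths:
        loss = criterion(supernet(x, depth=d), y).item()
        score = loss * step_cost(d)  # lower is better
        if score < best_score:
            best_depth, best_score = d, score
    return best_depth
```

In a full training loop, such a selection step would presumably be repeated at the boundary of each growth stage, after briefly training the shared supernet weights, so the chosen sub-network defines the model used for the next stage.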
Implications and Future Directions
The work has both theoretical and practical implications for sustainable AI. By training smaller model configurations during the early phases, AutoProg's automated schedules conserve computational resources and reduce the associated carbon footprint. The paper also opens opportunities for applying adaptive growth schedules to model families beyond ViTs.
Potential future developments could focus on:
- Extending Automated Progressive Learning:
Future research could explore whether AutoProg generalizes to other large models, such as language transformers and generative networks, adapting to different task objectives and architectural nuances.
- Incorporating Fine-Grained Optimization:
Refining the automated selection of sub-network parameters to cover finer-grained characteristics, such as the number of attention heads or intermediate embedding dimensions, could improve the fidelity and applicability of AutoProg.
- Exploring Hybrid Architectures:
As hybrid ViT-CNN architectures become increasingly prevalent, studying AutoProg's applicability to such models may yield further performance and efficiency gains.
Overall, the paper provides meaningful insight into optimizing deep learning training by balancing model accuracy against computational efficiency, contributing to the development of environmentally sustainable neural networks.