- The paper proposes identifying Early-Bird tickets early in training to bypass resource-intensive train-prune-retrain cycles.
- It leverages a mask distance metric to detect when a ticket emerges during low-cost training, using techniques such as early stopping and low-precision computation.
- Experiments across architectures show up to 10.7x energy savings while maintaining high accuracy, underscoring the method’s efficiency.
Efficient Training of Neural Networks via Early-Bird Tickets
The research paper "Drawing early-bird tickets: Towards more efficient training of deep networks" explores an approach for efficiently training deep neural networks (DNNs) by leveraging a concept known as Early-Bird (EB) tickets. The authors focus on reducing the computational and energy costs of training by identifying critical subnetworks, or tickets, early in the training process, eliminating the costly train-prune-retrain cycle traditionally required to find "winning tickets."
Overview of Early-Bird Tickets
The paper builds upon the Lottery Ticket Hypothesis, which posits that within large, randomly initialized dense networks there exist small subnetworks (winning tickets) that can be trained in isolation to match the accuracy of the full network. The authors address the inefficiency inherent in this method, which relies on fully training and pruning a model before any winning ticket can be identified. They propose instead to identify these subnetworks at an early stage of training, using low-cost training schemes such as early stopping and low-precision training.
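To make the ticket-drawing step concrete, the following is a minimal NumPy sketch of channel pruning driven by batch-normalization scaling factors (a network-slimming-style criterion; the function name, the global-threshold choice, and the exact ranking details are illustrative assumptions rather than the paper's exact implementation):

```python
import numpy as np

def draw_mask(bn_scales, prune_ratio):
    """Draw a binary channel mask by keeping the largest BN scaling factors.

    bn_scales: list of 1-D arrays, one per layer, holding each layer's
               batch-norm gamma values (one per channel).
    prune_ratio: fraction of channels to remove, ranked globally by |gamma|.
    """
    flat = np.concatenate([s.ravel() for s in bn_scales])
    k = int(len(flat) * prune_ratio)            # channels to prune away
    threshold = np.sort(np.abs(flat))[k]        # global magnitude cutoff
    # Keep a channel iff its scaling factor clears the cutoff.
    return [(np.abs(s) >= threshold).astype(np.uint8) for s in bn_scales]
```

Because the mask depends only on the current BN parameters, it can be drawn cheaply at every epoch, which is what makes epoch-to-epoch comparison of masks practical.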
To detect EB tickets, the authors introduce a mask distance metric, which measures the distance between the pruning masks of subnetworks drawn at different stages of training. When this distance stabilizes, the critical subnetwork has effectively emerged and can be extracted for further training. Because the metric compares masks from successive epochs rather than against a fully trained reference model, EB tickets can be identified on the fly, yielding significant reductions in training time and energy consumption.
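The detection logic described above can be sketched as follows (a minimal NumPy sketch: the normalized Hamming distance, the window length, and the threshold `eps` are illustrative assumptions, not the paper's exact values):

```python
import numpy as np
from collections import deque

def mask_distance(mask_a, mask_b):
    """Normalized Hamming distance between two binary pruning masks,
    each given as a list of per-layer arrays."""
    a = np.concatenate([m.ravel() for m in mask_a])
    b = np.concatenate([m.ravel() for m in mask_b])
    return float(np.mean(a != b))

def eb_ticket_found(recent_distances, eps=0.1):
    """Stop early once the mask distance between consecutive epochs has
    stayed below eps for a full window of recent epochs."""
    return (len(recent_distances) == recent_distances.maxlen
            and all(d < eps for d in recent_distances))
```

In a training loop, one would keep a fixed-length `deque` of the last few consecutive-epoch distances and halt full-network training as soon as `eb_ticket_found` returns `True`, then prune and retrain only the identified subnetwork.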
Experimental Evaluation
The research validates the existence of EB tickets through extensive experiments across different DNN architectures (VGG16, PreResNet101) and datasets (CIFAR-10, CIFAR-100). The findings demonstrate consistent emergence of EB tickets, capable of achieving high accuracy when retrained, even in aggressive training scenarios characterized by large learning rates and low-precision computations. Notably, EB tickets often outperform the 'ground-truth' winning tickets derived from fully trained models.
Furthermore, the experiments reveal significant computational savings: the proposed EB Train framework achieves a 5.8x to 10.7x reduction in energy consumption while retaining comparable or superior accuracy relative to state-of-the-art methods. Such an improvement highlights the potential for EB tickets to transform the efficiency of DNN training, particularly in resource-constrained environments.
Implications and Future Prospects
The implications of these findings are multifaceted. From a practical perspective, the EB Train framework introduces a scalable and resource-efficient method for training large-scale networks, thus addressing the growing demand for deploying DNN-powered solutions across various applications. Theoretically, the notion of EB tickets enriches understanding of early-stage network dynamics and presents opportunities to further explore the optimization landscapes of neural networks.
Future developments might focus on refining the mechanisms for EB ticket detection, potentially integrating more sophisticated learning rate schedules or precision configurations. Further validation on larger datasets and architectures, such as those in industrial settings, could propel EB training methodologies towards widespread adoption. The adaptability of EB tickets to newer architectures and training paradigms will likely catalyze ongoing advancements in efficient AI training methods.
In conclusion, the research presents a compelling case for rethinking the traditional, resource-intensive approaches to DNN training. By harnessing the potential of EB tickets, the authors offer a pathway toward more sustainable and efficient AI development, catering to both current computational constraints and future technological advancements.