When Layers Play the Lottery, all Tickets Win at Initialization (2301.10835v2)
Abstract: Pruning is a standard technique for reducing the computational cost of deep networks. Many advances in pruning leverage concepts from the Lottery Ticket Hypothesis (LTH). LTH reveals that inside a trained dense network there exist sparse subnetworks (tickets) able to achieve accuracy similar to that of the dense network (i.e., to win the lottery, becoming winning tickets). Pruning at initialization focuses on finding winning tickets without training a dense network. Studies on these concepts share the trend of obtaining subnetworks through weight or filter pruning. In this work, we investigate LTH and pruning at initialization through the lens of layer pruning. First, we confirm the existence of winning tickets when the pruning process removes layers. Building on this observation, we propose to discover these winning tickets at initialization, eliminating the need for the heavy computational resources required to train the initial (over-parameterized) dense network. Extensive experiments show that our winning tickets notably speed up the training phase and reduce carbon emissions by up to 51%, an important step towards democratized and green Artificial Intelligence. Beyond computational benefits, our winning tickets exhibit robustness against adversarial and out-of-distribution examples. Finally, we show that our subnetworks easily win the lottery at initialization, while tickets from filter removal (the standard structured LTH) rarely become winning tickets.
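The abstract does not spell out the criterion used to decide which layers to remove, but the overall recipe it describes (score the layers of an untrained network, drop the least important ones, then train the resulting shallower subnetwork from scratch) can be sketched as below. This is a minimal PyTorch sketch, not the paper's implementation; in particular, the block-importance score (L1 norm of each residual block's weights at initialization) is a placeholder assumption.

```python
# Minimal sketch: layer (residual-block) pruning at initialization.
# The importance score below is a hypothetical placeholder; the paper's
# actual layer-ranking criterion is not described in the abstract.
import torch
import torch.nn as nn
from torchvision.models import resnet18


def block_importance(block: nn.Module) -> float:
    """Hypothetical score: total L1 norm of the block's parameters."""
    return sum(p.abs().sum().item() for p in block.parameters())


def prune_layers_at_init(model: nn.Module, prune_ratio: float = 0.5) -> nn.Module:
    """Replace the lowest-scoring residual blocks with identity mappings,
    yielding a shallower 'ticket' that is then trained from scratch."""
    # Collect prunable blocks. The first block of each stage changes the
    # spatial resolution / channel count, so it is kept.
    candidates = []
    for stage_name in ["layer1", "layer2", "layer3", "layer4"]:
        stage = getattr(model, stage_name)
        for idx in range(1, len(stage)):
            candidates.append((stage_name, idx, block_importance(stage[idx])))

    # Remove the blocks with the smallest importance scores.
    candidates.sort(key=lambda t: t[2])
    n_prune = int(len(candidates) * prune_ratio)
    for stage_name, idx, _ in candidates[:n_prune]:
        getattr(model, stage_name)[idx] = nn.Identity()
    return model


if __name__ == "__main__":
    model = resnet18(num_classes=10)       # randomly initialized, never trained
    ticket = prune_layers_at_init(model, prune_ratio=0.5)
    x = torch.randn(2, 3, 32, 32)
    print(ticket(x).shape)                 # the shallower subnetwork still runs
```

Because whole residual blocks are replaced by identities, the resulting subnetwork keeps a dense, hardware-friendly structure and trains faster than the original depth, which is the source of the training-time and carbon-emission savings claimed above.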
Authors: Artur Jordao, George Correa de Araujo, Helena de Almeida Maia, Helio Pedrini