Adaptive Sharpness-Aware Pruning for Robust Sparse Networks (2306.14306v2)
Abstract: Robustness and compactness are two essential attributes of deep learning models deployed in the real world. The goals of robustness and compactness may seem to be at odds, since robustness requires generalization across domains, while the process of compression exploits specificity in one domain. We introduce Adaptive Sharpness-Aware Pruning (AdaSAP), which unifies these goals through the lens of network sharpness. The AdaSAP method produces sparse networks that are robust to input variations that are unseen at training time. We achieve this by strategically incorporating weight perturbations to optimize the loss landscape. This allows the model to be both primed for pruning and regularized for improved robustness. AdaSAP improves the robust accuracy of pruned models on image classification by up to +6% on ImageNet-C and +4% on ImageNet-V2, and on object detection by +4% on a corrupted Pascal VOC dataset, over a wide range of compression ratios, pruning criteria, and network architectures, outperforming recent pruning methods by large margins.
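The abstract describes AdaSAP as a two-part recipe: perturb the weights during training so that optimization favors flat regions of the loss landscape, then prune the resulting network. The sketch below is a minimal, hypothetical illustration of that idea using a generic SAM-style perturbation step in PyTorch. It assumes a standard `model`, `loss_fn`, and base optimizer, and it does not implement the paper's adaptive, per-neuron perturbation weighting, which is the method's actual contribution.

```python
# Hypothetical sketch (not the authors' released code): one SAM-style
# sharpness-aware update, as used in training prior to pruning.
import torch

def sharpness_aware_step(model, loss_fn, x, y, base_opt, rho=0.05):
    """Two-step update: perturb weights toward higher loss, then descend."""
    # 1) Gradient at the current weights.
    loss_fn(model(x), y).backward()

    # 2) Perturb each parameter by rho along the normalized gradient direction.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2) + 1e-12
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # 3) Gradient at the perturbed weights defines the actual update.
    loss_fn(model(x), y).backward()

    # 4) Undo the perturbation and take the base optimizer step.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_opt.step()
    base_opt.zero_grad()

# After sharpness-aware training, structured pruning can be applied with any
# criterion, e.g. L2-magnitude pruning of conv output channels:
#   torch.nn.utils.prune.ln_structured(conv, name="weight", amount=0.5, n=2, dim=0)
```

This sketch only conveys the general sharpness-aware mechanism; the adaptive perturbation sizing and the specific pruning criteria evaluated in the paper are beyond what the abstract specifies.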