Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch (2403.14729v1)

Published 21 Mar 2024 in cs.CV and cs.LG

Abstract: Current techniques for deep neural network (DNN) pruning often involve intricate multi-step processes that require domain-specific expertise, making their widespread adoption challenging. To address this limitation, Only-Train-Once (OTO) and OTOv2 were proposed to eliminate the need for additional fine-tuning steps by directly training and compressing a general DNN from scratch. Nevertheless, the static design of the optimizers in OTO can lead to convergence at sub-optimal local minima. In this paper, we propose Auto-Train-Once (ATO), an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs. During the model training phase, our approach not only trains the target model but also leverages a controller network as an architecture generator to guide the learning of the target model weights. Furthermore, we develop a novel stochastic gradient algorithm that enhances the coordination between model training and controller network training, thereby improving pruning performance. We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures (including ResNet18, ResNet34, ResNet50, ResNet56, and MobileNetV2) on standard benchmark datasets (CIFAR-10, CIFAR-100, and ImageNet).
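
To make the high-level mechanism concrete, the snippet below sketches what controller-guided pruning from scratch can look like: a small controller produces per-channel keep masks, the target network is trained under those masks, and the two are updated in a coordinated loop with a sparsity penalty. This is a minimal, assumption-laden illustration only; the class names (ControllerNet, MaskedConvNet), the straight-through masking, the synthetic data, and the update schedule are illustrative choices, not the paper's actual ATO algorithm.

```python
# Minimal PyTorch sketch of controller-guided channel pruning trained from scratch.
# Illustrative only: the controller, masking scheme, and update schedule below are
# simplified assumptions, not the ATO algorithm from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ControllerNet(nn.Module):
    """Architecture generator: maps a learnable embedding to per-channel keep masks."""
    def __init__(self, num_channels, hidden=64):
        super().__init__()
        self.embed = nn.Parameter(torch.zeros(1, hidden))
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, num_channels))

    def forward(self, tau=1.0):
        soft = torch.sigmoid(self.head(self.embed).squeeze(0) / tau)  # keep probabilities
        hard = (soft > 0.5).float()
        return hard + soft - soft.detach()  # straight-through: hard forward, soft gradient

class MaskedConvNet(nn.Module):
    """Toy target model whose hidden channels are gated by the controller's mask."""
    def __init__(self, num_channels=32, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, num_channels, 3, padding=1)
        self.conv2 = nn.Conv2d(num_channels, num_channels, 3, padding=1)
        self.fc = nn.Linear(num_channels, num_classes)

    def forward(self, x, mask):
        m = mask.view(1, -1, 1, 1)
        x = F.relu(self.conv1(x)) * m          # zeroed channels behave as pruned
        x = F.relu(self.conv2(x)) * m
        return self.fc(x.mean(dim=(2, 3)))     # global average pool + classifier

model = MaskedConvNet(num_channels=32)
controller = ControllerNet(num_channels=32)
opt_model = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
opt_ctrl = torch.optim.Adam(controller.parameters(), lr=1e-3)
target_keep_ratio, lam = 0.5, 1.0

# Synthetic stand-in for a real dataloader (e.g. CIFAR-10 batches).
loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(20)]

for step, (images, labels) in enumerate(loader):
    mask = controller()
    task_loss = F.cross_entropy(model(images, mask), labels)
    # Push the fraction of kept channels toward the target pruning budget.
    budget_loss = lam * (mask.mean() - target_keep_ratio).pow(2)

    opt_model.zero_grad()
    opt_ctrl.zero_grad()
    (task_loss + budget_loss).backward()       # one backward pass reaches both networks
    opt_model.step()
    if step % 5 == 0:                          # coordinate controller updates less often
        opt_ctrl.step()

print("kept channels:", int(controller().sum().item()), "/ 32")
```

The sketch only conveys the overall training pattern (joint weight and mask learning under a budget); in ATO the controller generates architectures over structured parameter groups and the proposed stochastic gradient algorithm coordinates the two updates with a convergence guarantee.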

References (70)
  1. Fast OSCAR and OWL regression via safe screening rules. In International Conference on Machine Learning, pages 653–663. PMLR, 2020.
  2. An accelerated doubly stochastic gradient method with faster explicit model identification. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 57–66, 2022a.
  3. Doubly sparse asynchronous learning for stochastic composite optimization. In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI), pages 1916–1922, 2022b.
  4. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.
  5. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
  6. Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 535–541, 2006.
  7. Only train once: A one-shot neural network training and pruning framework. Advances in Neural Information Processing Systems, 34:19637–19651, 2021.
  8. OTOv2: Automatic, generic, user-friendly. arXiv preprint arXiv:2303.06862, 2023.
  9. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.
  10. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255. IEEE, 2009.
  11. DepGraph: Towards any structural pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16091–16101, 2023.
  12. The state of sparsity in deep neural networks. arXiv preprint arXiv:1902.09574, 2019.
  13. Discrete model compression with resource constraint for deep neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1899–1908, 2020.
  14. Network pruning via performance maximization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9270–9280, 2021.
  15. Structural alignment for network pruning through partial regularization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17402–17412, 2023.
  16. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Mathematical Programming, 155(1-2):267–305, 2016.
  17. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015.
  18. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  19. Soft filter pruning for accelerating deep convolutional neural networks. In International Joint Conference on Artificial Intelligence (IJCAI), pages 2234–2240, 2018a.
  20. AMC: AutoML for model compression and acceleration on mobile devices. In Proceedings of the European Conference on Computer Vision (ECCV), pages 784–800, 2018b.
  21. Filter pruning via geometric median for deep convolutional neural networks acceleration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4340–4349, 2019.
  22. Learning filter pruning criteria for deep convolutional neural networks acceleration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2009–2018, 2020.
  23. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
  24. Efficient mirror descent ascent methods for nonsmooth minimax problems. Advances in Neural Information Processing Systems, 34:10431–10443, 2021.
  25. Data-driven sparse structure selection for deep neural networks. In Proceedings of the European conference on computer vision (ECCV), pages 304–320, 2018.
  26. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37, pages 448–456. JMLR.org, 2015.
  27. Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144, 2016.
  28. Operation-aware soft channel pruning using differentiable masks. In International Conference on Machine Learning, pages 5122–5131. PMLR, 2020.
  29. Plug-in, trainable gate for streamlining arbitrary neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
  30. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  31. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
  32. Deep learning. Nature, 521(7553):436–444, 2015.
  33. Pruning filters for efficient ConvNets. ICLR, 2017.
  34. Group sparsity: The hinge between filter pruning and decomposition for network compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8018–8027, 2020a.
  35. DHP: Differentiable meta pruning via hypernetworks. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VIII 16, pages 608–624. Springer, 2020b.
  36. Towards compact CNNs via collaborative compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6438–6447, 2021.
  37. Revisiting random channel pruning for neural network compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 191–201, 2022.
  38. Differentiable transportation pruning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16957–16967, 2023.
  39. HRank: Filter pruning using high-rank feature map. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020a.
  40. Channel pruning via automatic structure search. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 673–679, 2020b.
  41. Learning efficient convolutional networks through network slimming. In ICCV, 2017.
  42. MetaPruning: Meta learning for automatic neural network channel pruning. In Proceedings of the IEEE International Conference on Computer Vision, pages 3296–3305, 2019.
  43. Statistical evaluation of data requirement for ramp metering performance assessment. Transportation Research Part A: Policy and Practice, 141:248–261, 2020.
  44. Eliminating the impacts of traffic volume variation on before and after studies: a causal inference approach. Journal of Intelligent Transportation Systems, pages 1–15, 2023.
  45. Bayesian optimization through gaussian cox process models for spatio-temporal data. arXiv preprint arXiv:2401.14544, 2024a.
  46. Projection-optimal monotonic value function factorization in multi-agent reinforcement learning. In Proceedings of the 2024 International Conference on Autonomous Agents and Multiagent Systems, 2024b.
  47. Importance estimation for neural network pruning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11264–11272, 2019.
  48. Collaborative channel pruning for deep networks. In International Conference on Machine Learning, pages 5113–5122, 2019.
  49. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
  50. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
  51. Zuowei Shen. Deep network approximation characterized by number of neurons. Communications in Computational Physics, 28(5):1768–1811, 2020.
  52. ScienceExamCER: A high-density fine-grained science-domain corpus for common entity recognition. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4529–4546, Marseille, France, 2020. European Language Resources Association.
  53. SCOP: Scientific control for reliable neural network pruning. Advances in Neural Information Processing Systems, 33, 2020.
  54. Structured pruning of large language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6151–6162, Online, 2020. Association for Computational Linguistics.
  55. Learning structured sparsity in deep neural networks. In Advances in neural information processing systems, pages 2074–2082, 2016.
  56. Faster adaptive federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 10379–10387, 2023a.
  57. Leveraging foundation models to improve lightweight clients in federated learning. arXiv preprint arXiv:2311.08479, 2023b.
  58. Solving a class of non-convex minimax optimization in federated learning. Advances in Neural Information Processing Systems, 36, 2024.
  59. A generate-and-rank framework with semantic type regularization for biomedical concept normalization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8452–8464, Online, 2020. Association for Computational Linguistics.
  60. Good subnetworks provably exist: Pruning via greedy forward selection. In International Conference on Machine Learning, pages 10820–10830. PMLR, 2020.
  61. Gate decorator: Global filter pruning method for accelerating deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 2130–2141, 2019.
  62. Topology-aware network pruning using multi-stage graph embedding and reinforcement learning. In International Conference on Machine Learning, pages 25656–25667. PMLR, 2022.
  63. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(1):49–67, 2006.
  64. Improving toponym resolution with better candidate generation, transformer-based reranking, and two-stage resolution. In Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023), pages 48–60, Toronto, Canada, 2023. Association for Computational Linguistics.
  65. Joint models for answer verification in question answering systems. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3252–3262, Online, 2021. Association for Computational Linguistics.
  66. WDRASS: A web-scale dataset for document retrieval and answer sentence selection. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 4707–4711, New York, NY, USA, 2022a. Association for Computing Machinery.
  67. In situ answer sentence selection at web-scale. arXiv preprint arXiv:2201.05984, 2022b.
  68. Double retrieval and ranking for accurate question answering. In Findings of the Association for Computational Linguistics: EACL 2023, pages 1751–1762, Dubrovnik, Croatia, 2023. Association for Computational Linguistics.
  69. Every parameter matters: Ensuring the convergence of federated learning with dynamic heterogeneous models reduction. Advances in Neural Information Processing Systems, 36, 2024.
  70. Discrimination-aware channel pruning for deep neural networks. In Advances in Neural Information Processing Systems, pages 875–886, 2018.
Authors (8)
  1. Xidong Wu (13 papers)
  2. Shangqian Gao (24 papers)
  3. Zeyu Zhang (143 papers)
  4. Zhenzhen Li (26 papers)
  5. Runxue Bao (18 papers)
  6. Yanfu Zhang (15 papers)
  7. Xiaoqian Wang (34 papers)
  8. Heng Huang (189 papers)
Citations (6)