MGAS: Multi-Granularity Architecture Search for Trade-Off Between Model Effectiveness and Efficiency (2310.15074v3)

Published 23 Oct 2023 in cs.LG and cs.AI

Abstract: Neural architecture search (NAS) has gained significant traction in automating the design of neural networks. To reduce the time cost, differentiable architecture search (DAS) transforms the traditional paradigm of discrete candidate sampling and evaluation into differentiable super-net optimization and discretization. However, existing DAS methods fail to balance the trade-off between model performance and model size. They either conduct only coarse-grained operation-level search, which results in redundant model parameters, or restrictively explore fine-grained filter-level and weight-level units with pre-defined remaining ratios, which suffers from excessive pruning. Additionally, these methods compromise search quality to save memory during the search process. To tackle these issues, we introduce multi-granularity architecture search (MGAS), a unified framework that aims to discover both effective and efficient neural networks by comprehensively yet memory-efficiently exploring the multi-granularity search space. Specifically, we improve existing DAS methods in two respects. First, we balance the numbers of model units at different granularity levels with adaptive pruning: we learn discretization functions specific to each granularity level to adaptively determine the unit remaining ratio according to the evolving architecture. Second, we reduce memory consumption without degrading search quality using multi-stage search: we break the super-net optimization and discretization into multiple sub-net stages and perform progressive re-evaluation, which allows units from previous stages to be re-pruned or regrown during subsequent stages, compensating for potential bias. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate that MGAS outperforms other state-of-the-art methods in achieving a better trade-off between model performance and model size.
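To make the search-space ideas in the abstract concrete, below is a minimal PyTorch sketch of a DARTS-style mixed edge that combines coarse-grained operation-level architecture parameters with fine-grained filter-level gates and a learned pruning threshold. This is not the authors' implementation: the class names (GatedConv, MixedEdge), the sigmoid gating, and the soft-threshold rule are hypothetical stand-ins for MGAS's granularity-specific discretization functions, and the multi-stage search with progressive re-evaluation is omitted.

```python
# Illustrative sketch only: operation-level softmax weights (coarse granularity)
# plus per-filter gates with a learned threshold (fine granularity). All names
# and the gating/threshold rule are assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedConv(nn.Module):
    """3x3 convolution whose output filters carry learnable gates."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        # One gate logit per output filter (fine-grained, filter-level unit).
        self.gate_logits = nn.Parameter(torch.zeros(channels))

    def forward(self, x, threshold):
        gates = torch.sigmoid(self.gate_logits)
        # Soft pruning during search: filters whose gate falls below the
        # learned threshold are attenuated toward zero.
        mask = torch.sigmoid(10.0 * (gates - threshold))
        return self.conv(x) * (gates * mask).view(1, -1, 1, 1)


class MixedEdge(nn.Module):
    """DARTS-style mixed edge over a small candidate operation set."""

    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            GatedConv(channels),            # searchable conv with filter gates
            nn.AvgPool2d(3, 1, padding=1),  # parameter-free candidate
            nn.Identity(),                  # skip-connection candidate
        ])
        # Coarse-grained, operation-level architecture parameters.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))
        # Learned scalar that adaptively sets the filter remaining ratio,
        # standing in for a granularity-specific discretization function.
        self.threshold = nn.Parameter(torch.tensor(0.3))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        outs = []
        for w, op in zip(weights, self.ops):
            y = op(x, self.threshold) if isinstance(op, GatedConv) else op(x)
            outs.append(w * y)
        return sum(outs)


if __name__ == "__main__":
    edge = MixedEdge(channels=16)
    x = torch.randn(2, 16, 32, 32)
    print(edge(x).shape)  # torch.Size([2, 16, 32, 32])
```

In a search of this flavor, the operation weights (alpha), the filter gates, and the threshold would be optimized jointly with the network weights; discretization would then keep only operations with high alpha and filters whose gates exceed the learned threshold, so the remaining ratio is determined by the evolving architecture rather than fixed in advance.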

Authors (5)
  1. Xiaoyun Liu (3 papers)
  2. Divya Saxena (13 papers)
  3. Jiannong Cao (73 papers)
  4. Yuqing Zhao (5 papers)
  5. Penghui Ruan (6 papers)