Robustifying DARTS by Eliminating Information Bypass Leakage via Explicit Sparse Regularization (2306.06858v1)
Abstract: Differentiable architecture search (DARTS) is a promising end-to-end NAS method that directly optimizes the architecture parameters through standard gradient descent. However, DARTS is brittle: it is prone to catastrophic failures caused by the skip connection in the search space. Recent studies also cast doubt on DARTS's basic underlying hypotheses, arguing that the method is inherently prone to a performance discrepancy between the continuously relaxed supernet used in the training phase and the discretized finalnet used in the evaluation phase. We show that both the robustness problem and this skepticism can be explained by information bypass leakage during supernet training. This observation highlights the vital role of the sparsity of the architecture parameters during training, which has not been well developed in prior work. We therefore propose a novel sparse-regularized approximation and an efficient mixed-sparsity training scheme that robustify DARTS by eliminating the information bypass leakage. Extensive experiments on multiple search spaces demonstrate the effectiveness of our method.
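To make the role of architecture-parameter sparsity concrete, below is a minimal PyTorch-style sketch of a DARTS mixed edge together with a generic entropy-based sparsity penalty on the softmax weights. The `MixedEdge` class, the `sparsity_penalty` method, and the entropy form of the regularizer are illustrative assumptions for exposition; they are not the paper's exact sparse-regularized approximation or mixed-sparsity training scheme.

```python
# Sketch: a DARTS-style mixed edge with a generic sparsity-inducing penalty
# on the architecture weights (illustrative; not the paper's exact method).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedEdge(nn.Module):
    """One supernet edge: a softmax-weighted sum over candidate operations."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)
        # One architecture parameter (alpha) per candidate operation.
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(ops)))

    def forward(self, x):
        # Continuous relaxation: every candidate (including skip) contributes,
        # which is the source of information bypass leakage in the supernet.
        w = F.softmax(self.alpha, dim=-1)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def sparsity_penalty(self):
        # Entropy of the mixture weights: low entropy means near one-hot
        # weights, so adding this term to the loss pushes each edge toward a
        # single operation and suppresses the parallel bypass paths.
        w = F.softmax(self.alpha, dim=-1)
        return -(w * (w + 1e-12).log()).sum()
```

A hypothetical training step would then minimize `task_loss + lam * sum(edge.sparsity_penalty() for edge in edges)`, where `lam` trades off task accuracy against how close the supernet stays to the discretized finalnet.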