Dynamical Isometry based Rigorous Fair Neural Architecture Search (2307.02263v2)
Abstract: Recently, weight-sharing techniques have significantly sped up the training and evaluation procedures of neural architecture search (NAS). However, most existing weight-sharing strategies are based solely on experience or observation, which leaves the search results lacking interpretability and rationality. In addition, because fairness is neglected, current methods are prone to misjudging modules during evaluation. To address these problems, we propose a novel neural architecture search algorithm based on dynamical isometry. We use the fixed-point analysis method from mean field theory to analyze the dynamical behavior of random neural networks in the steady state, and show how dynamical isometry guarantees the fairness of weight-sharing-based NAS. Meanwhile, we prove that our module selection strategy is rigorously fair by estimating the generalization error of all modules under a well-conditioned Jacobian. Extensive experiments show that, at the same model size, the architecture found by the proposed method achieves state-of-the-art top-1 validation accuracy on ImageNet classification. In addition, we demonstrate that our method achieves better and more stable training performance without loss of generality.
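The diagnostic underlying the abstract's claim can be illustrated numerically: dynamical isometry means the singular values of a network's input-output Jacobian concentrate near 1, so every candidate module receives comparably conditioned signal and gradients. The following is a minimal sketch, not the authors' code: it uses a deep linear network (where orthogonal initialization yields exact isometry, as in Saxe et al. 2013) and compares it with i.i.d. Gaussian initialization; the depth and width are illustrative assumptions.

```python
# Sketch: compare Jacobian conditioning under orthogonal vs. Gaussian init.
# For a deep *linear* network the input-output Jacobian is just the product
# of the weight matrices, so orthogonal init gives all singular values = 1,
# while Gaussian init produces a widely spread (ill-conditioned) spectrum.
import numpy as np

def orthogonal(n, rng):
    """Random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # fix column signs

def jacobian_singular_values(depth=100, width=128, init="orthogonal", seed=0):
    """Singular values of the input-output Jacobian of a deep linear network."""
    rng = np.random.default_rng(seed)
    J = np.eye(width)
    for _ in range(depth):
        if init == "orthogonal":
            W = orthogonal(width, rng)
        else:  # i.i.d. Gaussian with the usual 1/sqrt(width) variance scaling
            W = rng.standard_normal((width, width)) / np.sqrt(width)
        J = W @ J  # linear network: the Jacobian is the product of the weights
    return np.linalg.svd(J, compute_uv=False)

for init in ("orthogonal", "gaussian"):
    s = jacobian_singular_values(init=init)
    print(f"{init:>10}: singular values in [{s.min():.3e}, {s.max():.3e}]")
```

Running the sketch shows the orthogonal product staying perfectly conditioned while the Gaussian product's smallest singular values collapse toward zero, which is the failure mode a fair, isometry-based evaluation is meant to avoid.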