AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search

Published 28 Mar 2024 in cs.CV and cs.LG | arXiv:2403.19232v1

Abstract: Training-free network architecture search (NAS) aims to discover high-performing networks using zero-cost proxies that capture network characteristics related to final performance. However, network rankings estimated by previous training-free NAS methods have shown weak correlations with actual performance. To address this issue, we propose AZ-NAS, a novel approach that leverages an ensemble of zero-cost proxies to substantially strengthen the correlation between the predicted ranking of networks and their ground-truth performance. To this end, we introduce four novel zero-cost proxies that are complementary to one another, analyzing distinct architectural traits from the perspectives of expressivity, progressivity, trainability, and complexity. All proxy scores can be obtained simultaneously within a single forward and backward pass, making the overall NAS process highly efficient. To integrate the rankings predicted by our proxies effectively, we introduce a non-linear ranking aggregation method that highlights networks ranked consistently high across all the proxies. Experimental results demonstrate the efficacy and efficiency of AZ-NAS, which outperforms state-of-the-art methods on standard benchmarks while maintaining a reasonable runtime cost.
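
The abstract describes the non-linear ranking aggregation only at a high level. The sketch below shows one plausible way such an aggregation could be realized, assuming a sum of log-normalized per-proxy ranks; the function name `aggregate_rankings`, the toy scores, and the exact formula are illustrative assumptions rather than the paper's verified implementation.

```python
import numpy as np

def aggregate_rankings(proxy_scores: dict) -> np.ndarray:
    """Combine per-proxy scores into one aggregate score per candidate.

    `proxy_scores` maps a proxy name to an array of raw scores (one per
    candidate architecture, higher assumed better). Each proxy's scores
    are converted to normalized ranks in (0, 1], and the logarithms of
    those ranks are summed. Because log(rank/N) is strongly negative for
    low ranks, a candidate must rank well under every proxy to keep a
    high aggregate score, matching the stated goal of favoring networks
    ranked consistently high across all proxies.
    """
    num_candidates = len(next(iter(proxy_scores.values())))
    aggregate = np.zeros(num_candidates)
    for scores in proxy_scores.values():
        # Rank candidates under this proxy: 1 = lowest score, N = highest.
        ranks = np.argsort(np.argsort(scores)) + 1
        aggregate += np.log(ranks / num_candidates)
    return aggregate


# Toy usage: three hypothetical candidates scored by three of the proxies.
scores = {
    "expressivity": np.array([0.2, 0.9, 0.5]),
    "trainability": np.array([0.4, 0.8, 0.7]),
    "complexity":   np.array([0.1, 0.6, 0.9]),
}
best = int(np.argmax(aggregate_rankings(scores)))
print(f"selected candidate index: {best}")
```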
