DiffusionNAG: Predictor-guided Neural Architecture Generation with Diffusion Models (2305.16943v4)

Published 26 May 2023 in cs.LG

Abstract: Existing NAS methods suffer from an excessive amount of time for repetitive sampling and training of many task-irrelevant architectures. To tackle such limitations of existing NAS methods, we propose a paradigm shift from NAS to a novel conditional Neural Architecture Generation (NAG) framework based on diffusion models, dubbed DiffusionNAG. Specifically, we consider the neural architectures as directed graphs and propose a graph diffusion model for generating them. Moreover, with the guidance of parameterized predictors, DiffusionNAG can flexibly generate task-optimal architectures with the desired properties for diverse tasks, by sampling from a region that is more likely to satisfy the properties. This conditional NAG scheme is significantly more efficient than previous NAS schemes which sample the architectures and filter them using the property predictors. We validate the effectiveness of DiffusionNAG through extensive experiments in two predictor-based NAS scenarios: Transferable NAS and Bayesian Optimization (BO)-based NAS. DiffusionNAG achieves superior performance with speedups of up to 35 times when compared to the baselines on Transferable NAS benchmarks. Furthermore, when integrated into a BO-based algorithm, DiffusionNAG outperforms existing BO-based NAS approaches, particularly in the large MobileNetV3 search space on the ImageNet 1K dataset. Code is available at https://github.com/CownowAn/DiffusionNAG.

Authors (5)
  1. Sohyun An (5 papers)
  2. Hayeon Lee (14 papers)
  3. Jaehyeong Jo (14 papers)
  4. Seanie Lee (28 papers)
  5. Sung Ju Hwang (178 papers)
Citations (7)

Summary

Overview of DiffusionNAG: Predictor-Guided Neural Architecture Generation with Diffusion Models

The paper "DiffusionNAG: Predictor-guided Neural Architecture Generation with Diffusion Models" presents an innovative framework for Neural Architecture Generation (NAG) utilizing diffusion models, termed DiffusionNAG. The primary objective of DiffusionNAG is to address the limitations of existing Neural Architecture Search (NAS) methodologies, which often suffer from high computational costs due to repetitive sampling and full training of numerous task-irrelevant architectures. DiffusionNAG introduces a paradigm shift from traditional NAS to a more efficient NAG process guided by predictors.

Key Contributions

  1. Conditional Neural Architecture Generation (NAG): The authors propose a conditional NAG framework that employs diffusion models for generating neural architectures as directed graphs. By incorporating parameterized predictors, the model flexibly generates task-optimal architectures with desired properties, significantly improving efficiency over traditional NAS methods.
  2. Predictor-Guided Diffusion Model: The approach builds on diffusion generative models, which gradually inject noise into data and learn to reverse this process, a technique proven effective across many domains. Parameterized predictors are integrated to guide generation toward architectures that satisfy specific objectives such as accuracy or robustness (a toy sampling sketch follows this list).
  3. Score Network for Valid Architecture Generation: The paper introduces a novel score network for neural architectures to ensure valid generation by capturing the computational flow and positional information in directed acyclic graphs.
  4. Applications and Performance: DiffusionNAG demonstrates superior performance in two predictor-based NAS scenarios: Transferable NAS and Bayesian Optimization (BO)-based NAS. It achieves speedups of up to 35 times over baselines on Transferable NAS benchmarks, and when integrated into a BO-based algorithm it outperforms existing BO-based NAS approaches, particularly in the large MobileNetV3 search space on ImageNet 1K.
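
To make items 2 and 3 concrete, below is a minimal, self-contained sketch of predictor-guided reverse-diffusion sampling over a relaxed (continuous) architecture encoding. It is not the authors' implementation: `ToyScoreNet`, `ToyPredictor`, the toy search-space dimensions, the noise schedule, and the use of the raw predictor gradient as the guidance signal are all illustrative assumptions made here for brevity.

```python
# Minimal sketch of predictor-guided reverse-diffusion sampling over a relaxed
# architecture encoding (a nodes-by-operations matrix). All names, shapes, and
# schedules here are illustrative stand-ins, not the paper's implementation.
import math
import torch

NUM_NODES, NUM_OPS = 8, 5          # toy search-space dimensions
SIGMA_MIN, SIGMA_MAX = 0.01, 5.0   # VE-SDE noise scales
STEPS = 200


class ToyScoreNet(torch.nn.Module):
    """Stand-in for a DAG-aware score network s_theta(A_t, t) ~ grad log p_t(A_t)."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(NUM_NODES * NUM_OPS + 1, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, NUM_NODES * NUM_OPS),
        )

    def forward(self, a, t):
        x = torch.cat([a.flatten(1), t[:, None]], dim=1)
        return self.net(x).view_as(a)


class ToyPredictor(torch.nn.Module):
    """Stand-in property predictor f_phi(A_t, t) -> scalar score (e.g. accuracy)."""

    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(NUM_NODES * NUM_OPS + 1, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 1),
        )

    def forward(self, a, t):
        x = torch.cat([a.flatten(1), t[:, None]], dim=1)
        return self.net(x).squeeze(-1)


def sigma(t):
    """VE-SDE noise level sigma(t) = sigma_min * (sigma_max / sigma_min)^t."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def sample(score_net, predictor, guidance=1.0, batch=4):
    """Euler-Maruyama integration of the predictor-guided reverse-time VE-SDE."""
    log_ratio = math.log(SIGMA_MAX / SIGMA_MIN)
    a = SIGMA_MAX * torch.randn(batch, NUM_NODES, NUM_OPS)   # sample from the prior
    ts = torch.linspace(1.0, 1e-3, STEPS)
    for t, t_next in zip(ts[:-1], ts[1:]):
        tb = t.expand(batch)
        with torch.no_grad():
            score = score_net(a, tb)                         # approximates grad log p_t(a)
        a_grad = a.detach().requires_grad_(True)
        prop = predictor(a_grad, tb).sum()                   # predicted property on noisy input
        guide = torch.autograd.grad(prop, a_grad)[0]         # simple gradient-based guidance
        g2 = sigma(t) ** 2 * 2.0 * log_ratio                 # g(t)^2 for the VE-SDE
        dt = t_next - t                                      # negative: time runs backward
        a = (a.detach()
             - g2 * (score + guidance * guide) * dt
             + math.sqrt(g2) * math.sqrt(-dt) * torch.randn_like(a))
    return a.argmax(dim=-1)                                  # discretize: one op id per node


if __name__ == "__main__":
    score_net, predictor = ToyScoreNet(), ToyPredictor()     # untrained stand-ins
    ops = sample(score_net, predictor, guidance=2.0)
    print(ops.shape)                                         # torch.Size([4, 8])
```

In DiffusionNAG itself the score network is DAG-aware (it encodes the computational flow and node positions of the architecture graph) and the predictor is trained on noisy architectures; here both are plain MLP stand-ins so that the control flow of guided sampling stays visible.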

Methodology

The diffusion process models the forward perturbation of the architecture distribution toward a known prior, which is then reversed to sample architectures. The framework uses a Variance Exploding (VE) SDE as the forward process together with a novel score network that captures the directed nature of neural architectures; the formulation is sketched below.
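
To make this explicit, the following is the standard conditional VE-SDE formulation (after Song et al., 2021), written for an architecture encoding $A_t$; the paper's exact noise schedule and guidance parameterization may differ from this generic form.

```latex
% Forward VE-SDE perturbing a clean architecture encoding A_0 toward a Gaussian prior:
dA_t \;=\; g(t)\, dW_t ,
\qquad
g(t) \;=\; \sigma_{\min}\!\left(\tfrac{\sigma_{\max}}{\sigma_{\min}}\right)^{t}
           \sqrt{2\log\tfrac{\sigma_{\max}}{\sigma_{\min}}} ,
\qquad t \in [0, 1].

% Predictor-guided reverse-time SDE used to sample architectures with a desired property y
% (time runs from t = 1 back to t = 0; \bar{W}_t is a reverse-time Wiener process):
dA_t \;=\; -\,g(t)^2 \left[ \nabla_{A_t}\log p_t(A_t)
           \;+\; \nabla_{A_t}\log p_t(y \mid A_t) \right] dt
       \;+\; g(t)\, d\bar{W}_t .
```

The first gradient is approximated by the score network and the second by the parameterized property predictor, which is what allows the same trained generator to be steered by different predictors.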

Implications

  • Efficiency in Search Space Exploration: DiffusionNAG's capability to generate architectures aligned closely with specified distributions leads to reduced computational overhead, making it highly efficient.
  • Versatility Across Tasks: The plug-and-play nature of the predictors allows the generative model to be adapted to various NAS tasks without retraining, covering diverse scenarios such as latency- or robustness-constrained NAS (see the sketch after this list).
  • Potential for Enhancements: The framework sets a foundation for future enhancement of diffusion-based models in NAS, suggesting further explorations into adaptive and context-aware architecture generation.
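
To illustrate the plug-and-play point concretely, the snippet below continues the hypothetical toy sketch from the Key Contributions section (reusing its `ToyScoreNet`, `ToyPredictor`, and `sample` stand-ins): the trained score network is left untouched and only the guiding predictor, i.e. the guidance term of the reverse process, is swapped.

```python
# Continues the toy sketch above: ToyScoreNet, ToyPredictor, and sample() are the
# illustrative stand-ins defined there, not the paper's actual modules.
score_net = ToyScoreNet()                 # trained once, then frozen and reused

accuracy_predictor = ToyPredictor()       # stand-in predictor of task accuracy
latency_predictor = ToyPredictor()        # stand-in predictor of device latency

# Guide sampling toward high predicted accuracy ...
accurate_archs = sample(score_net, accuracy_predictor, guidance=2.0)
# ... or toward low predicted latency by flipping the sign of the guidance weight.
fast_archs = sample(score_net, latency_predictor, guidance=-2.0)
```

Because the generative model itself is untouched, adapting to a new constraint only requires training (or reusing) a predictor for that property.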

Future Directions

The paper opens multiple avenues for exploration, including:

  • Better Exploration of Large Search Spaces: Given its efficiency, DiffusionNAG can be extended and refined for other expansive search spaces across different domains.
  • Integration with More Complex Predictors: Future work might explore integration with more sophisticated predictors that consider additional task constraints.
  • Enhanced Score Network Architecture: Refinements to the score network could further improve the fidelity of the architecture generation process.

In conclusion, DiffusionNAG offers a robust and efficient alternative to traditional NAS, indicating a significant step forward in the automation of neural architecture design. The proposed framework effectively balances performance and efficiency, promising contributions to several practical applications in neural architecture development.