FlatNAS: optimizing Flatness in Neural Architecture Search for Out-of-Distribution Robustness (2402.19102v1)

Published 29 Feb 2024 in cs.LG, cs.AI, and cs.CV

Abstract: Neural Architecture Search (NAS) paves the way for the automatic definition of Neural Network (NN) architectures, attracting increasing research attention and offering solutions in various scenarios. This study introduces a novel NAS solution, called Flat Neural Architecture Search (FlatNAS), which explores the interplay between a novel figure of merit based on robustness to weight perturbations and single-NN optimization with Sharpness-Aware Minimization (SAM). FlatNAS is the first work in the literature to systematically explore flat regions in the loss landscape of NNs within a NAS procedure, while jointly optimizing their performance on in-distribution data and their out-of-distribution (OOD) robustness, and constraining the number of parameters in their architecture. Unlike current studies, which primarily concentrate on OOD algorithms, FlatNAS evaluates the impact of NN architectures on OOD robustness, a crucial aspect in real-world applications of machine and deep learning. FlatNAS achieves a good trade-off between performance, OOD generalization, and the number of parameters, using only in-distribution data during the NAS exploration. The OOD robustness of the NAS-designed models is evaluated by focusing on robustness to input data corruptions, using popular benchmark datasets from the literature.
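The core mechanism named in the abstract is training candidate architectures with Sharpness-Aware Minimization (SAM), which biases optimization toward flat regions of the loss landscape. The following is a minimal sketch of a single SAM update in PyTorch to make that idea concrete; the function name `sam_step`, the `rho` value, and the overall structure are illustrative assumptions for readers, not the authors' implementation.

```python
# Minimal sketch of a Sharpness-Aware Minimization (SAM) update in PyTorch.
# Assumption: this only illustrates the generic two-step SAM procedure
# (ascent to a worst-case weight perturbation, then descent from the original
# weights); it is not the FlatNAS code.
import torch


def sam_step(model, loss_fn, inputs, targets, base_optimizer, rho=0.05):
    # First pass: gradients of the loss at the current weights.
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Scale of the ascent step: rho times the normalized gradient direction.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    scale = rho / (grad_norm + 1e-12)

    # Perturb weights toward the locally worst-case direction.
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = p.grad * scale
            p.add_(e)
            perturbations.append((p, e))
    model.zero_grad()

    # Second pass: gradients of the loss at the perturbed weights.
    loss_fn(model(inputs), targets).backward()

    # Restore the original weights and update them with the base optimizer,
    # using the gradients computed at the perturbed point.
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()
```

In a NAS setting like the one described, the same perturbation idea can also serve as a flatness figure of merit, for example by measuring how much validation accuracy degrades when a trained candidate's weights are perturbed within a small radius.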
