Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge (2401.12350v1)

Published 22 Jan 2024 in cs.CV and cs.LG

Abstract: Neural Architecture Search (NAS) has become the de facto approach for designing accurate and efficient networks for edge devices. Since models are typically quantized for edge deployment, recent work has investigated quantization-aware NAS (QA-NAS) to search for highly accurate and efficient quantized models. However, existing QA-NAS approaches, particularly few-bit mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently, QA-NAS has mostly been limited to low-scale tasks and tiny networks. In this work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS. We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
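The key idea behind the block-wise formulation is that each network block can be evaluated independently (e.g., against a teacher via distillation), so the search decomposes into picking one candidate per block under a global resource budget. The sketch below illustrates this decomposition in miniature; the candidate tables, bit-width labels, loss values, and `search` helper are all hypothetical placeholders, not the paper's actual method or numbers.

```python
from itertools import product

# Hypothetical per-block candidate table: for each block, a list of
# (bit_config, per_block_loss, size_mb) options. In a block-wise
# formulation, blocks are evaluated independently, so total quality is
# approximated as the sum of per-block losses.
BLOCK_CANDIDATES = [
    [("int8", 0.10, 4.0), ("w4a8", 0.14, 2.5), ("w2a8", 0.25, 1.5)],
    [("int8", 0.08, 6.0), ("w4a8", 0.09, 3.5), ("w2a8", 0.30, 2.0)],
    [("int8", 0.05, 3.0), ("w4a8", 0.07, 2.0), ("w2a8", 0.12, 1.2)],
]

def search(candidates, size_budget_mb):
    """Pick one candidate per block, minimizing the summed per-block
    loss subject to a total model-size budget (exhaustive for clarity;
    a real search space would need DP or evolutionary search)."""
    best = None
    for combo in product(*candidates):
        size = sum(c[2] for c in combo)
        if size > size_budget_mb:
            continue  # violates the deployment budget
        loss = sum(c[1] for c in combo)
        if best is None or loss < best[0]:
            best = (loss, size, [c[0] for c in combo])
    return best

loss, size, config = search(BLOCK_CANDIDATES, size_budget_mb=9.0)
print(config, round(loss, 2), size)
```

Because the per-block losses are additive, the combinatorial search never retrains the full network, which is what lets the approach scale to large tasks where end-to-end QA-NAS is infeasible.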

