Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge (2401.12350v1)
Abstract: Neural Architecture Search (NAS) has become the de facto approach for designing accurate and efficient networks for edge devices. Since models are typically quantized for edge deployment, recent work has investigated quantization-aware NAS (QA-NAS) to search for highly accurate and efficient quantized models. However, existing QA-NAS approaches, particularly few-bit mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently, QA-NAS has mostly been limited to small-scale tasks and tiny networks. In this work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS. We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
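To make the block-wise idea concrete, the sketch below shows one way the composition step of a block-wise QA-NAS could look: each block is scored independently (e.g., via block-wise knowledge distillation), and the search then picks one (candidate, bit-width) pair per block under a model-size budget. This is a minimal, hypothetical Python sketch; the toy tables, the names `BLOCKS`, `SIZE_BUDGET_KB`, and `search`, and the exhaustive solver are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of block-wise quantization-aware NAS composition (illustrative).
# Assumption: block-wise NAS has already produced, for every block and every
# candidate sub-network, an accuracy proxy plus a size table per weight bit-width.
# The composition step picks one (candidate, bit-width) pair per block so the
# summed proxy score is maximised under a total model-size budget.

from itertools import product

# Toy per-block search space: (accuracy proxy, {bit-width: size in KB}).
BLOCKS = [
    [  # block 0 candidates
        (0.71, {8: 120, 4: 60}),
        (0.74, {8: 180, 4: 90}),
    ],
    [  # block 1 candidates
        (0.65, {8: 200, 4: 100}),
        (0.69, {8: 260, 4: 130}),
    ],
]

SIZE_BUDGET_KB = 300  # deployment constraint (illustrative value)


def search(blocks, budget_kb):
    """Compose per-block choices (candidate index, bit-width) under a size budget."""
    best, best_score = None, float("-inf")
    # Every block contributes each (candidate, bit-width) pair to the product.
    per_block_choices = [
        [(ci, w) for ci, (_, sizes) in enumerate(cands) for w in sizes]
        for cands in blocks
    ]
    for choice in product(*per_block_choices):
        score = sum(blocks[b][ci][0] for b, (ci, _) in enumerate(choice))
        size = sum(blocks[b][ci][1][w] for b, (ci, w) in enumerate(choice))
        if size <= budget_kb and score > best_score:
            best, best_score = choice, score
    return best, best_score


if __name__ == "__main__":
    choice, score = search(BLOCKS, SIZE_BUDGET_KB)
    print("per-block (candidate, bit-width):", choice, "proxy score:", round(score, 2))
```

In practice the exhaustive product would be replaced by a knapsack-style or evolutionary solver, but the decomposition is the point: because blocks are scored independently, the FB-MP bit-width decision grows with the number of blocks rather than with the whole end-to-end search space, which is what allows QA-NAS to scale to tasks like Cityscapes segmentation.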
- BatchQuant: Quantized-for-all architecture search with robust quantizer. Advances in Neural Information Processing Systems 34 (2021), 1074–1085.
- Once-for-all: Train one network and specialize it for efficient deployment. arXiv preprint arXiv:1908.09791 (2019).
- Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017).
- FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 12239–12248.
- Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors. Nature Machine Intelligence 3, 8 (2021), 675–686. https://doi.org/10.1038/s42256-021-00356-5
- The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3213–3223.
- ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. 248–255.
- Neural architecture search: A survey. The Journal of Machine Learning Research 20, 1 (2019), 1997–2017.
- FNA++: Fast network adaptation via parameter remapping and architecture search. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 9 (2020), 2990–3004.
- Knowledge distillation: A survey. International Journal of Computer Vision 129 (2021), 1789–1819.
- Single path one-shot neural architecture search with uniform sampling. In Computer Vision–ECCV 2020. 544–560.
- Raghuraman Krishnamoorthi. 2018. Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv preprint arXiv:1806.08342 (2018).
- Fahad Lateef and Yassine Ruichek. 2019. Survey on semantic segmentation using deep learning techniques. Neurocomputing 338 (2019), 321–348.
- Block-wisely supervised neural architecture search with knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1989–1998.
- DARTS: Differentiable architecture search. arXiv preprint arXiv:1806.09055 (2018).
- Deep learning for generic object detection: A survey. International Journal of Computer Vision 128, 2 (2020), 261–318.
- Distilling optimal neural networks: Rapid search in diverse spaces. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 12229–12238.
- MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4510–4520.
- Radar-based Object Classification in ADAS with Hardware-Aware NAS and Input Region Scaling. In 2023 IEEE Radar Conference (RadarConf23). IEEE, 1–6.
- Raghubir Singh and Sukhpal Singh Gill. 2023. Edge AI: a survey. Internet of Things and Cyber-Physical Systems (2023).
- Neural architecture search for energy-efficient always-on audio machine learning. Neural Computing and Applications 35, 16 (2023), 12133–12144.
- Quantization-Aware Neural Architecture Search with Hyperparameter Optimization for Industrial Predictive Maintenance Applications. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1–2.
- BOMP-NAS: Bayesian Optimization Mixed Precision NAS. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1–2.
- APQ: Joint search for network architecture, pruning and quantization policy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2078–2087.
- SqueezeDet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 129–137.
- Weight-sharing neural architecture search: A battle to shrink the optimization gap. ACM Computing Surveys (CSUR) 54, 9 (2021), 1–37.