SimQ-NAS: Simultaneous Quantization Policy and Neural Architecture Search (2312.13301v1)

Published 19 Dec 2023 in cs.LG and cs.AI

Abstract: Recent one-shot Neural Architecture Search algorithms rely on training a hardware-agnostic super-network tailored to a specific task and then extracting efficient sub-networks for different hardware platforms. Popular approaches separate the training of super-networks from the search for sub-networks, often employing predictors to alleviate the computational overhead associated with search. Additionally, certain methods also incorporate the quantization policy within the search space. However, while the quantization policy search for convolutional neural networks is well studied, the extension of these methods to transformers and especially foundation models remains under-explored. In this paper, we demonstrate that by using multi-objective search algorithms paired with lightly trained predictors, we can efficiently search for both the sub-network architecture and the corresponding quantization policy and outperform their respective baselines across different performance objectives such as accuracy, model size, and latency. Specifically, we demonstrate that our approach performs well across both uni-modal (ViT and BERT) and multi-modal (BEiT-3) transformer-based architectures as well as convolutional architectures (ResNet). For certain networks, we demonstrate an improvement of up to $4.80\times$ and $3.44\times$ for latency and model size respectively, without degradation in accuracy compared to the fully quantized INT8 baselines.
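The abstract describes the method at a high level: a multi-objective search over a joint space of sub-network architectures and quantization policies, with lightly trained predictors standing in for expensive measurements. As a rough, self-contained illustration of such a loop (not the authors' implementation), the Python sketch below evolves a population against two predicted objectives and retains the Pareto front each generation. The specific search space, mutation scheme, and analytic predictor proxies are all assumptions made for the example.

```python
import random

# Illustrative joint search space: a few architectural knobs plus a
# per-layer bitwidth (the quantization policy). These specific choices
# are assumptions for this sketch, not the paper's actual space.
DEPTHS = [2, 3, 4]
WIDTHS = [192, 256, 384]
BITWIDTHS = [4, 8]
NUM_LAYERS = 4

def sample_candidate():
    # One genome encodes both the sub-network and its quantization policy.
    return {
        "depth": random.choice(DEPTHS),
        "width": random.choice(WIDTHS),
        "bits": [random.choice(BITWIDTHS) for _ in range(NUM_LAYERS)],
    }

def mutate(parent):
    child = {"depth": parent["depth"], "width": parent["width"],
             "bits": list(parent["bits"])}
    knob = random.choice(["depth", "width", "bits"])
    if knob == "bits":
        child["bits"][random.randrange(NUM_LAYERS)] = random.choice(BITWIDTHS)
    else:
        child[knob] = random.choice(DEPTHS if knob == "depth" else WIDTHS)
    return child

# Toy analytic proxies standing in for the "lightly trained" predictors,
# which in the paper would be regressors fit on a small set of measured
# sub-networks. Higher bitwidths raise both accuracy and latency here.
def predicted_accuracy(c):
    return (0.70 + 0.01 * c["depth"] + 1e-4 * c["width"]
            + 0.005 * sum(c["bits"]) / NUM_LAYERS)

def predicted_latency(c):
    return c["depth"] * c["width"] * sum(c["bits"]) / NUM_LAYERS / 1000.0

def dominates(a, b):
    # a dominates b: no worse on both objectives, strictly better on one.
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def pareto_front(pop):
    scored = [(predicted_accuracy(c), predicted_latency(c), c) for c in pop]
    return [c for acc, lat, c in scored
            if not any(dominates((a, l), (acc, lat)) for a, l, _ in scored)]

random.seed(0)
population = [sample_candidate() for _ in range(32)]
for _ in range(20):
    # Keep the non-dominated set; refill the population with its mutants.
    front = pareto_front(population)
    population = front + [mutate(random.choice(front))
                          for _ in range(32 - len(front))]

for c in pareto_front(population)[:5]:
    print(c, f"acc~{predicted_accuracy(c):.4f}", f"lat~{predicted_latency(c):.3f}")
```

Because only cheap predictor calls sit inside the loop, this kind of search stays inexpensive relative to training or measuring each candidate; the paper's contribution is making such a loop work over architecture and quantization policy jointly, for transformer and convolutional backbones alike.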

Authors (5)
  1. Sharath Nittur Sridhar
  2. Maciej Szankin
  3. Fang Chen
  4. Sairam Sundaresan
  5. Anthony Sarah
