Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices (2306.11426v1)

Published 20 Jun 2023 in cs.LG and cs.CL

Abstract: Deep learning (DL) is characterised by its dynamic nature, with new deep neural network (DNN) architectures and approaches emerging every few years, driving the field's advancement. At the same time, the ever-increasing use of mobile devices (MDs) has resulted in a surge of DNN-based mobile applications. Although traditional architectures, like CNNs and RNNs, have been successfully integrated into MDs, this is not the case for Transformers, a relatively new model family that has achieved new levels of accuracy across AI tasks, but poses significant computational challenges. In this work, we aim to make steps towards bridging this gap by examining the current state of Transformers' on-device execution. To this end, we construct a benchmark of representative models and thoroughly evaluate their performance across MDs with different computational capabilities. Our experimental results show that Transformers are not accelerator-friendly and indicate the need for software and hardware optimisations to achieve efficient deployment.
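The paper's benchmark measures on-device latency of Transformer models across mobile devices and accelerators. As a rough illustration of the latency-measurement side of such a benchmark (not the authors' actual harness, which targets mobile hardware), the following Python sketch times repeated forward passes of a compact Transformer on a host CPU using PyTorch and Hugging Face Transformers. The model choice, warm-up count, and number of timed runs are illustrative assumptions.

```python
import time
import torch
from transformers import AutoTokenizer, AutoModel

# Illustrative choice: a compact Transformer similar in spirit to the
# distilled models examined in the paper (not the paper's exact setup).
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

inputs = tokenizer("A short example sentence.", return_tensors="pt")

with torch.no_grad():
    # Warm-up runs to exclude one-off initialisation costs.
    for _ in range(5):
        model(**inputs)

    # Timed runs; report mean per-inference latency in milliseconds.
    n_runs = 50
    start = time.perf_counter()
    for _ in range(n_runs):
        model(**inputs)
    mean_latency = (time.perf_counter() - start) / n_runs

print(f"Mean latency: {mean_latency * 1000:.1f} ms")
```

A mobile deployment would typically export the model to an on-device runtime (e.g. TensorFlow Lite or ONNX) and repeat the same warm-up/timed-run pattern on the target device and accelerator.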
