
A New Frontier of AI: On-Device AI Training and Personalization (2206.04688v3)

Published 9 Jun 2022 in cs.LG

Abstract: Modern consumer electronic devices have started executing deep learning-based intelligence services on devices, not cloud servers, to keep personal data on devices and to reduce network and cloud costs. We see this trend as an opportunity to personalize intelligence services by updating neural networks with user data without exposing the data outside devices: on-device training. However, the limited resources of such devices incur significant difficulties. We propose a lightweight on-device training framework, NNTrainer, which provides highly memory-efficient neural network training techniques and proactive swapping based on fine-grained execution-order analysis of neural networks. Moreover, its optimizations do not sacrifice accuracy and are transparent to training algorithms; thus, prior algorithmic studies may be implemented on top of NNTrainer. Our evaluations show that NNTrainer can reduce memory consumption to as little as 1/20 (a 95% saving) and effectively personalizes intelligence services on devices. NNTrainer is cross-platform, practical open-source software that is being deployed to millions of mobile devices.
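
To make the abstract's idea of "proactive swapping based on fine-grained execution-order analysis" concrete, the sketch below shows one way such a planner could work: because a training iteration's layer execution order is fixed and known in advance, each tensor's next use is known, so tensors whose next use is farthest away can be swapped out to storage and brought back just before they are needed. This is a minimal illustrative sketch, not NNTrainer's actual C++ implementation; all names (`plan_swaps`, `tensor_uses`, `memory_budget`) are hypothetical.

```python
# Hypothetical sketch of proactive tensor swapping driven by a known,
# fixed per-iteration execution order. Illustrative only; it does not
# mirror NNTrainer's real scheduler or APIs.
from collections import defaultdict

def plan_swaps(execution_order, tensor_uses, tensor_sizes, memory_budget):
    """Plan swap-in/swap-out points for one training iteration.

    execution_order : list of step ids (forward and backward ops in order)
    tensor_uses     : dict tensor -> sorted list of steps that read it
    tensor_sizes    : dict tensor -> size in bytes
    memory_budget   : bytes of device memory available for these tensors
    """
    swap_in, swap_out = defaultdict(list), defaultdict(list)
    resident, used = set(), 0

    for step in execution_order:
        # Bring in every tensor this step needs (a real scheduler would
        # issue the prefetch a few steps earlier to hide I/O latency).
        for t, uses in tensor_uses.items():
            if step in uses and t not in resident:
                swap_in[step].append(t)
                resident.add(t)
                used += tensor_sizes[t]

        # Evict tensors whose next use is farthest in the future until the
        # working set fits, never evicting a tensor the current step needs.
        while used > memory_budget:
            evictable = [t for t in resident if step not in tensor_uses[t]]
            if not evictable:
                break  # this step's own working set exceeds the budget
            victim = max(
                evictable,
                key=lambda t: next((u for u in tensor_uses[t] if u > step),
                                   float("inf")),
            )
            swap_out[step].append(victim)
            resident.remove(victim)
            used -= tensor_sizes[victim]

    return swap_in, swap_out
```

Because the plan is computed once per model from the static execution order, the swapping decisions add no per-iteration analysis cost and, unlike recomputation-based approaches, do not change what the training algorithm computes, which is consistent with the abstract's claim that the optimizations are accuracy-preserving and transparent to training algorithms.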
