A New Frontier of AI: On-Device AI Training and Personalization (2206.04688v3)
Abstract: Modern consumer electronic devices have begun executing deep learning-based intelligence services on devices rather than on cloud servers, both to keep personal data on the device and to reduce network and cloud costs. We see this trend as an opportunity to personalize intelligence services by updating neural networks with user data without exposing that data outside the device: on-device training. However, the limited resources of such devices make on-device training significantly difficult. We propose a lightweight on-device training framework, NNTrainer, which provides highly memory-efficient neural network training techniques and proactive swapping based on fine-grained analysis of the network's execution order. Moreover, its optimizations do not sacrifice accuracy and are transparent to training algorithms, so prior algorithmic studies can be implemented on top of NNTrainer. Our evaluations show that NNTrainer reduces memory consumption down to 1/20 (a 95% saving) and effectively personalizes intelligence services on devices. NNTrainer is practical, cross-platform, open-source software that is being deployed to millions of mobile devices.
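The abstract does not spell out the swapping mechanism, so the sketch below only illustrates the general idea of proactive swapping driven by a fine-grained execution order: given the ordered list of layer executions and the tensors each step touches, tensors not referenced within a lookahead window are swapped out to storage while tensors needed by upcoming steps are prefetched. This is a minimal sketch under those assumptions; all names (`Step`, `SwapPlanner`, `Prepare`) are hypothetical and are not NNTrainer's actual API.

```cpp
// Minimal sketch of proactive, execution-order-driven tensor swapping.
// All names here are hypothetical illustrations, not NNTrainer's API.
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// One step of the fine-grained execution order: which tensors a layer
// touches during a forward or backward pass.
struct Step {
  std::string layer;
  std::vector<std::string> tensors;
};

class SwapPlanner {
 public:
  SwapPlanner(std::vector<Step> order, std::size_t lookahead)
      : order_(std::move(order)), lookahead_(lookahead) {}

  // Before executing step `i`, evict tensors not referenced within the
  // lookahead window and prefetch tensors the upcoming steps will need.
  void Prepare(std::size_t i) {
    std::set<std::string> needed_soon;
    for (std::size_t j = i; j < order_.size() && j < i + lookahead_; ++j)
      for (const auto& t : order_[j].tensors) needed_soon.insert(t);

    for (auto it = resident_.begin(); it != resident_.end();) {
      if (!needed_soon.count(*it)) {
        std::printf("swap out: %s\n", it->c_str());
        it = resident_.erase(it);  // would write the tensor to storage
      } else {
        ++it;
      }
    }
    for (const auto& t : needed_soon) {
      if (resident_.insert(t).second)
        std::printf("swap in : %s\n", t.c_str());  // would read from storage
    }
  }

 private:
  std::vector<Step> order_;
  std::size_t lookahead_;
  std::set<std::string> resident_;  // tensors currently held in memory
};

int main() {
  // A toy three-layer forward pass; a real analysis would cover the whole
  // forward/backward schedule, including gradients and optimizer state.
  std::vector<Step> order = {
      {"conv1", {"conv1.w", "act0"}},
      {"conv2", {"conv2.w", "act1"}},
      {"fc",    {"fc.w",    "act2"}},
  };
  SwapPlanner planner(order, /*lookahead=*/1);
  for (std::size_t i = 0; i < order.size(); ++i) planner.Prepare(i);
}
```

Because the schedule is known ahead of time, a planner like this can keep only the tensors for the next few steps resident, which is what makes the memory savings possible without touching the training algorithm itself.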