Online Resource Allocation for Edge Intelligence with Colocated Model Retraining and Inference (2405.16029v1)
Abstract: With edge intelligence, AI models are increasingly pushed to the edge to serve ubiquitous users. However, due to model, data, and task drift, an AI model deployed at the edge suffers degraded accuracy during the inference serving phase. Model retraining handles such drift by periodically retraining the model on newly arrived data. When model retraining and model inference serving for the same model are colocated on resource-limited edge servers, a fundamental challenge arises: balancing the resource allocation between retraining and inference so as to maximize long-term inference accuracy. This problem is particularly difficult because the underlying mathematical formulation is time-coupled, non-convex, and NP-hard. To address these challenges, we introduce ORRIC, a lightweight and explainable online approximation algorithm that optimizes resource allocation to adaptively balance the accuracy of model training and inference. The competitive ratio of ORRIC outperforms that of the traditional Inference-Only paradigm, especially when data drift persists for a sufficiently long time, which highlights the advantages and applicable scenarios of colocating model retraining and inference. Notably, ORRIC can be translated into several heuristic algorithms for different resource environments. Experiments conducted in real scenarios validate the effectiveness of ORRIC.
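To make the colocation trade-off concrete, the sketch below shows a toy per-slot allocator that splits a fixed edge resource budget between retraining and inference. This is not the paper's ORRIC algorithm; the accuracy and drift models (`inference_accuracy`, the linear quality update, and all constants) are hypothetical placeholders chosen only to illustrate why some resource should go to retraining while drift persists.

```python
# Illustrative sketch (NOT the paper's ORRIC algorithm): split a unit
# resource budget between retraining and inference in each time slot.
# All accuracy/drift models below are hypothetical placeholders.

def inference_accuracy(model_quality: float, infer_frac: float) -> float:
    """Toy model: accuracy scales with model quality, and saturates once
    inference gets at least half of the resource budget."""
    return model_quality * min(1.0, infer_frac / 0.5)

def allocate_slot(model_quality: float, drift_rate: float, grid: int = 10) -> float:
    """Pick the retraining fraction (on a coarse grid) maximizing a one-slot
    estimate of current accuracy plus next-slot model quality."""
    best_frac, best_score = 0.0, float("-inf")
    for k in range(grid + 1):
        train_frac = k / grid
        infer_frac = 1.0 - train_frac
        # Retraining counteracts drift: more training resource, better
        # model quality in the next slot (capped at 1.0).
        next_quality = min(1.0, model_quality - drift_rate + 0.3 * train_frac)
        score = inference_accuracy(model_quality, infer_frac) + next_quality
        if score > best_score:
            best_frac, best_score = train_frac, score
    return best_frac

# Under persistent drift, the allocator keeps reserving resources for
# retraining; with no drift, it gives everything to inference.
quality = 0.9
for t in range(3):
    frac = allocate_slot(quality, drift_rate=0.2)
    quality = min(1.0, quality - 0.2 + 0.3 * frac)
    print(f"slot {t}: retrain fraction {frac:.1f}, quality {quality:.2f}")
```

Even this toy version exhibits the abstract's key point: when drift is absent the Inference-Only split (retraining fraction 0) is optimal, but under persistent drift a nonzero retraining share yields higher long-term accuracy.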