TrimCaching: Parameter-sharing Edge Caching for AI Model Downloading (2404.14204v2)
Abstract: Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks and large language models (LLMs), can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. For this case, we develop a polynomial-time algorithm with a $\left(1-\epsilon\right)/2$-approximation guarantee. We then address the general case with a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching schemes that do not exploit shared parameters in AI models.
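To make the storage-sharing idea concrete, the sketch below shows a minimal greedy placement loop in the spirit of the abstract, for a single edge server: models are cached one at a time, and because shared parameter blocks are stored only once, a model's marginal storage cost is only the size of its not-yet-cached blocks. All names here (`Model`, `greedy_place`, the `request_rate` weights, and the example block sizes) are illustrative assumptions rather than the paper's actual formulation or approximation algorithm.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    blocks: frozenset      # IDs of the parameter blocks the model is built from
    request_rate: float    # popularity weight (e.g., drawn from a Zipf profile)

def greedy_place(models, block_sizes, capacity_mb):
    """Greedy single-server placement: repeatedly cache the model with the best
    popularity-per-extra-megabyte ratio, where parameter blocks already cached
    by a previously placed model add zero extra storage."""
    cached_blocks, placed, used = set(), [], 0.0
    remaining = list(models)
    while remaining:
        best, best_ratio, best_new = None, -1.0, set()
        for m in remaining:
            new_blocks = m.blocks - cached_blocks
            extra = sum(block_sizes[b] for b in new_blocks)
            if used + extra > capacity_mb:
                continue
            ratio = m.request_rate / max(extra, 1e-9)  # marginal gain per MB
            if ratio > best_ratio:
                best, best_ratio, best_new = m, ratio, new_blocks
        if best is None:
            break  # nothing else fits within the storage budget
        placed.append(best.name)
        cached_blocks |= best_new
        used += sum(block_sizes[b] for b in best_new)
        remaining.remove(best)
    return placed, used

# Example: two fine-tuned variants share a large backbone block "B0".
block_sizes = {"B0": 400.0, "head_a": 30.0, "head_b": 25.0, "C0": 350.0}
models = [
    Model("vision-a", frozenset({"B0", "head_a"}), request_rate=0.5),
    Model("vision-b", frozenset({"B0", "head_b"}), request_rate=0.3),
    Model("lm-c",     frozenset({"C0"}),           request_rate=0.2),
]
print(greedy_place(models, block_sizes, capacity_mb=500.0))
```

In this toy example, the two vision models share the 400 MB backbone block `B0`, so both fit within the 500 MB budget (455 MB used in total); a conventional cache that stores each model independently could hold only one of them, which illustrates the cache-hit gain that parameter sharing targets.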
Authors: Guanqiao Qu, Zheng Lin, Qian Chen, Jian Li, Fangming Liu, Xianhao Chen, Kaibin Huang