Cached Model-as-a-Resource: Provisioning Large Language Model Agents for Edge Intelligence in Space-air-ground Integrated Networks (2403.05826v2)

Published 9 Mar 2024 in cs.NI and eess.SP

Abstract: Edge intelligence in space-air-ground integrated networks (SAGINs) can enable worldwide network coverage beyond geographical limitations for users to access ubiquitous and low-latency intelligence services. Facing global coverage and complex environments in SAGINs, edge intelligence can provision approximate LLM agents for users via edge servers at ground base stations (BSs) or cloud data centers relayed by satellites. As LLMs with billions of parameters are pre-trained on vast datasets, LLM agents have few-shot learning capabilities, e.g., chain-of-thought (CoT) prompting for complex tasks, which raises a new trade-off between resource consumption and performance in SAGINs. In this paper, we propose a joint caching and inference framework for edge intelligence to provision sustainable and ubiquitous LLM agents in SAGINs. We introduce "cached model-as-a-resource" for offering LLMs with limited context windows and propose a novel optimization framework, i.e., joint model caching and inference, to utilize cached model resources for provisioning LLM agent services along with communication, computing, and storage resources. We design "age of thought" (AoT) considering the CoT prompting of LLMs, and propose a least-AoT cached model replacement algorithm for optimizing the provisioning cost. We propose a deep Q-network-based modified second-bid (DQMSB) auction to incentivize network operators, which can enhance allocation efficiency by 23% while guaranteeing strategy-proofness and freedom from adverse selection.
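The least-AoT replacement policy described in the abstract can be sketched as a cache eviction rule: when a new model must be admitted to a storage-constrained edge server, the cached model with the smallest AoT score is evicted first. The sketch below is illustrative only, not the paper's algorithm: the `CachedModel` and `LeastAoTCache` names, the GB-based capacity model, and the way AoT scores are assigned are all assumptions; the paper derives AoT from the CoT prompting process, which is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class CachedModel:
    """A cached LLM with an illustrative age-of-thought (AoT) score."""
    name: str
    size_gb: float
    aot: float  # higher = more valuable accumulated CoT context (assumed)


class LeastAoTCache:
    """Toy sketch of least-AoT cached model replacement at an edge server."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self.models: dict[str, CachedModel] = {}

    def used_gb(self) -> float:
        return sum(m.size_gb for m in self.models.values())

    def admit(self, model: CachedModel) -> None:
        # Evict the cached model with the least AoT until the new one fits.
        while self.models and self.used_gb() + model.size_gb > self.capacity_gb:
            victim = min(self.models.values(), key=lambda m: m.aot)
            del self.models[victim.name]
        if model.size_gb <= self.capacity_gb:
            self.models[model.name] = model
```

For example, on a 10 GB server holding a 6 GB model with AoT 2.0 and a 3 GB model with AoT 5.0, admitting a 4 GB model evicts the 6 GB model (the least-AoT entry) to make room.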

Authors (7)
  1. Minrui Xu (57 papers)
  2. Dusit Niyato (671 papers)
  3. Hongliang Zhang (108 papers)
  4. Jiawen Kang (204 papers)
  5. Zehui Xiong (177 papers)
  6. Shiwen Mao (96 papers)
  7. Zhu Han (431 papers)
Citations (9)