Cross-Modal Generative Semantic Communications for Mobile AIGC: Joint Semantic Encoding and Prompt Engineering (2404.13898v1)
Abstract: By employing massive Mobile AI-Generated Content (AIGC) Service Providers (MASPs) equipped with powerful models, high-quality AIGC services can be made accessible to resource-constrained end users. However, this paradigm, referred to as mobile AIGC, also introduces a significant challenge: users must download large AIGC outputs from the MASPs, leading to substantial bandwidth consumption and potential transmission failures. In this paper, we apply cross-modal Generative Semantic Communications (G-SemCom) to mobile AIGC to overcome wireless bandwidth constraints. Specifically, we utilize a series of cross-modal attention maps to indicate the correlation between the user prompt and each part of the AIGC output. In this way, the MASP can analyze the prompt context and efficiently filter the most semantically important content. Only this semantic information is transmitted, from which users can recover the entire AIGC output with high quality while saving mobile bandwidth. Since the transmitted information not only preserves the semantics but also prompts the recovery, we formulate a joint semantic encoding and prompt engineering problem to optimize the bandwidth allocation among users. In particular, we present a human-perceptual metric named Joint Perceptual Similarity and Quality (JPSQ), which fuses two learning-based measures of semantic similarity and aesthetic quality, respectively. Furthermore, we develop the Attention-aware Deep Diffusion (ADD) algorithm, which learns attention maps and leverages the diffusion process to enhance its ability to explore the environment. Extensive experiments demonstrate that our proposal reduces the bandwidth consumption of mobile users by 49.4% on average, with almost no perceptual difference in AIGC output quality. Moreover, the ADD algorithm outperforms baseline DRL methods, achieving a 1.74x higher overall reward.
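To make the two core ideas in the abstract concrete, below is a minimal, self-contained Python sketch: fusing per-token cross-attention maps into a binary semantic mask (deciding which part of the output the MASP actually transmits), and a JPSQ-style score that combines a semantic-similarity term with an aesthetic-quality term. The averaging step, quantile threshold, weighting coefficient `alpha`, and the stand-in similarity/quality values are illustrative assumptions, not the paper's exact definitions; in the paper, the attention maps come from the diffusion model's cross-attention layers (DAAM-style), and the two terms are learned metrics in the spirit of DreamSim and NIMA.

```python
# Minimal sketch under stated assumptions: per-token cross-attention maps are
# given as H x W numpy arrays; the similarity/quality inputs are stand-ins for
# learned metrics (e.g., DreamSim-like similarity, NIMA-like quality).
import numpy as np

def semantic_mask(attn_maps, keep_ratio=0.5):
    """Fuse per-token attention maps into one saliency map and keep the
    top `keep_ratio` fraction of pixels as the semantic region to transmit."""
    saliency = np.mean(np.stack(attn_maps), axis=0)   # average over prompt tokens
    thresh = np.quantile(saliency, 1.0 - keep_ratio)  # per-image adaptive threshold
    return saliency >= thresh                         # boolean mask of kept pixels

def jpsq(similarity, quality, alpha=0.5):
    """Toy JPSQ-style fusion: weighted combination of a semantic-similarity
    score and an aesthetic-quality score, both assumed rescaled to [0, 1].
    The linear form and alpha are assumptions for illustration."""
    return alpha * similarity + (1.0 - alpha) * quality

# Example: two synthetic 64x64 token attention maps -> mask covering ~50% of pixels.
rng = np.random.default_rng(0)
maps = [rng.random((64, 64)) for _ in range(2)]
mask = semantic_mask(maps, keep_ratio=0.5)
print(f"transmitted fraction: {mask.mean():.2f}")  # ~0.50 -> roughly half the bandwidth
print(f"JPSQ (toy): {jpsq(similarity=0.9, quality=0.7):.2f}")
```

At the receiver, the masked-out region would then be restored, e.g., by a diffusion-based inpainting model conditioned on the transmitted semantic region and the prompt, which is what allows the transmitted content to act both as the semantic encoding and as a prompt for recovery.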
Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shiwen Mao, Ping Zhang, Xuemin Shen