Large Generative Model Assisted 3D Semantic Communication (2403.05783v1)
Abstract: Semantic Communication (SC) is a novel paradigm for data transmission in 6G. However, performing SC in 3D scenarios poses several challenges: 1) 3D semantic extraction; 2) latent semantic redundancy; and 3) uncertain channel estimation. To address these issues, we propose a Generative AI Model assisted 3D SC (GAM-3DSC) system. First, we introduce a 3D Semantic Extractor (3DSE), which employs generative AI models, including the Segment Anything Model (SAM) and Neural Radiance Field (NeRF), to extract key semantics from a 3D scenario based on user requirements. The extracted 3D semantics are represented as multi-perspective images of the goal-oriented 3D object. Second, we present an Adaptive Semantic Compression Model (ASCM) for encoding these multi-perspective images, in which a semantic encoder with two output heads performs semantic encoding and masks redundant semantics in the latent semantic space, respectively. Third, we design a conditional Generative adversarial network and Diffusion model aided Channel Estimation (GDCE) scheme to estimate and refine the Channel State Information (CSI) of physical channels. Finally, simulation results demonstrate the advantages of the proposed GAM-3DSC system in effectively transmitting the goal-oriented 3D scenario.
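The two-head idea behind ASCM can be illustrated with a toy sketch. This is not the authors' model: the abstract only states that one head performs semantic encoding and the other masks redundant latent semantics. Everything below is an assumption made for illustration, including the linear layers standing in for a learned encoder, the sigmoid keep-probability, the 0.5 threshold, and all function names.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    # Plain matrix-vector product plus bias; one output per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def two_head_encode(features, enc_w, enc_b, mask_w, mask_b, keep_threshold=0.5):
    # Head 1 (hypothetical): produces the latent semantic vector.
    latent = linear(enc_w, enc_b, features)
    # Head 2 (hypothetical): scores each latent entry; low scores are
    # treated as redundant and dropped before transmission.
    keep_prob = [sigmoid(s) for s in linear(mask_w, mask_b, features)]
    mask = [1 if p >= keep_threshold else 0 for p in keep_prob]
    payload = [z for z, m in zip(latent, mask) if m]
    return payload, mask


def unmask(payload, mask):
    # Receiver side: put kept entries back in place, zero-fill the rest.
    kept = iter(payload)
    return [next(kept) if m else 0.0 for m in mask]


# Toy usage with random (untrained) weights: 4 input features, 6 latent entries.
random.seed(0)
in_dim, latent_dim = 4, 6
features = [random.uniform(-1, 1) for _ in range(in_dim)]
enc_w = [[random.uniform(-1, 1) for _ in range(in_dim)] for _ in range(latent_dim)]
mask_w = [[random.uniform(-1, 1) for _ in range(in_dim)] for _ in range(latent_dim)]
enc_b = [0.0] * latent_dim
mask_b = [0.0] * latent_dim

payload, mask = two_head_encode(features, enc_w, enc_b, mask_w, mask_b)
restored = unmask(payload, mask)
```

In this sketch only `payload` and the binary `mask` would need to be sent, so the transmitted size shrinks with the number of masked entries. In a trained system both heads would be learned jointly rather than drawn at random.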
Authors:
- Feibo Jiang
- Yubo Peng
- Li Dong
- Kezhi Wang
- Kun Yang
- Cunhua Pan
- Xiaohu You