
Rethinking Multi-User Semantic Communications with Deep Generative Models (2405.09866v1)

Published 16 May 2024 in eess.SP and cs.LG

Abstract: In recent years, novel communication strategies have emerged to address the challenges posed by the growing number of connected devices and the higher quality of transmitted information. Among them, semantic communication has obtained promising results, especially when combined with state-of-the-art deep generative models, such as large language or diffusion models, which can regenerate content from extremely compressed semantic information. However, most of these approaches focus on single-user scenarios and process the received content at the receiver on top of conventional communication systems. In this paper, we go beyond these methods by developing a novel generative semantic communication framework tailored for multi-user scenarios. The system assigns the channel to users knowing that any lost information can be filled in by a diffusion model at the receivers. Under this perspective, OFDMA systems should not aim to transmit the largest part of the information, but only the bits the generative model needs to semantically regenerate the missing ones. A thorough experimental evaluation shows the capabilities of the novel diffusion model and the effectiveness of the proposed framework, leading towards a GenAI-based next generation of communications.
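The core idea in the abstract, transmitting only a budgeted subset of a signal's content and letting a receiver-side generative model regenerate the rest, can be illustrated with a minimal toy sketch. Everything here is hypothetical and greatly simplified: the "semantic selection" is just keeping the largest-magnitude samples, and linear interpolation stands in for the paper's actual diffusion model; the function names `transmit` and `regenerate` are invented for illustration only.

```python
# Toy sketch (NOT the paper's method): the transmitter keeps only the
# `budget` most important samples of a signal, and the receiver fills in
# the dropped ones with a prior. A real system would use a diffusion
# model; simple linear interpolation stands in for it here.
import math

def transmit(signal, budget):
    """Keep only the `budget` largest-magnitude samples, as {index: value}."""
    ranked = sorted(range(len(signal)), key=lambda i: -abs(signal[i]))
    return {i: signal[i] for i in ranked[:budget]}

def regenerate(received, length):
    """Receiver-side fill-in: linearly interpolate the missing samples
    from the received ones (a crude stand-in for generative regeneration)."""
    known = sorted(received)
    out = [0.0] * length
    for i in range(length):
        if i in received:
            out[i] = received[i]
            continue
        lo = max((k for k in known if k < i), default=None)
        hi = min((k for k in known if k > i), default=None)
        if lo is None:          # before the first received sample
            out[i] = received[hi]
        elif hi is None:        # after the last received sample
            out[i] = received[lo]
        else:                   # interpolate between neighbours
            t = (i - lo) / (hi - lo)
            out[i] = (1 - t) * received[lo] + t * received[hi]
    return out

# One period of a sine as a stand-in "source signal".
signal = [math.sin(2 * math.pi * i / 32) for i in range(32)]
rx = transmit(signal, budget=8)          # only 8 of 32 samples sent
rec = regenerate(rx, len(signal))
mse = sum((a - b) ** 2 for a, b in zip(signal, rec)) / len(signal)
```

Even this crude prior reconstructs the signal far better than leaving the untransmitted samples empty, which is the intuition behind allocating OFDMA resources sparsely when receivers can regenerate what is missing.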

