
Latency-Aware Generative Semantic Communications with Pre-Trained Diffusion Models (2403.17256v2)

Published 25 Mar 2024 in cs.IT, eess.SP, math.IT, cs.CV, and cs.MM

Abstract: Generative foundation AI models have recently shown great success in synthesizing natural signals with high perceptual quality using only textual prompts and conditioning signals to guide the generation process. This enables semantic communications at extremely low data rates in future wireless networks. In this paper, we develop a latency-aware semantic communications framework with pre-trained generative models. The transmitter performs multi-modal semantic decomposition on the input signal and transmits each semantic stream with the appropriate coding and communication scheme based on the intent. For the prompt, we adopt a re-transmission-based scheme to ensure reliable transmission, and for the other semantic modalities we use an adaptive modulation/coding scheme to achieve robustness to the changing wireless channel. Furthermore, we design a semantic- and latency-aware scheme that allocates transmission power to the different semantic modalities according to their importance, subject to semantic quality constraints. At the receiver, a pre-trained generative model synthesizes a high-fidelity signal from the received multi-stream semantics. Simulation results demonstrate ultra-low-rate, low-latency, and channel-adaptive semantic communications.
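The abstract only sketches the framework at a high level, so the following Python snippet illustrates one ingredient, importance-based power allocation under per-stream quality constraints, with a deliberately simplified stand-in: each hypothetical semantic stream (prompt, edge map, color palette) first receives the minimum power needed to satisfy an SNR floor representing its semantic quality constraint, and the remaining budget is split in proportion to importance weights. All names, weights, and channel values below are illustrative assumptions, not the paper's actual allocation scheme.

import numpy as np

def allocate_power(importance, channel_gain, noise_power, p_total, snr_min):
    """Toy semantic-aware power allocation (illustrative, not the paper's method).

    Each stream first gets the minimum power needed to meet its SNR floor
    (a stand-in for the semantic quality constraint); any leftover budget
    is divided in proportion to the semantic importance weights.
    """
    importance = np.asarray(importance, dtype=float)
    channel_gain = np.asarray(channel_gain, dtype=float)

    # Minimum power per stream to satisfy the per-stream SNR floor.
    p_min = snr_min * noise_power / channel_gain
    if p_min.sum() > p_total:
        raise ValueError("Power budget cannot satisfy the quality constraints.")

    # Distribute the remaining budget in proportion to importance.
    leftover = p_total - p_min.sum()
    return p_min + leftover * importance / importance.sum()

# Hypothetical split into prompt, edge-map, and color-palette streams.
importance = [0.6, 0.3, 0.1]      # semantic importance weights (assumed)
channel_gain = [0.8, 0.5, 0.9]    # per-stream channel power gains (assumed)
p = allocate_power(importance, channel_gain, noise_power=0.1,
                   p_total=4.0, snr_min=2.0)
snr = p * np.array(channel_gain) / 0.1
print("power:", np.round(p, 3), "SNR:", np.round(snr, 2))

Under these toy numbers the prompt stream receives the largest share of the budget, reflecting the stated intent that the most semantically important stream be delivered most reliably while every stream still meets its quality floor.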

