Diffusion-based Generative Multicasting with Intent-aware Semantic Decomposition (2411.02334v1)
Abstract: Generative diffusion models (GDMs) have recently shown great success in synthesizing multimedia signals with high perceptual quality, enabling highly efficient semantic communications in future wireless networks. In this paper, we develop an intent-aware generative semantic multicasting framework utilizing pre-trained diffusion models. In the proposed framework, the transmitter decomposes the source signal into multiple semantic classes based on the multi-user intent, i.e., each user is assumed to be interested in the details of only a subset of the semantic classes. The transmitter then sends each user only its intended classes, and multicasts a highly compressed semantic map to all users over shared wireless resources, allowing them to locally synthesize the remaining, non-intended classes using pre-trained diffusion models. The signal retrieved at each user is thereby partially reconstructed and partially synthesized from the received semantic map. This improves the utilization of the wireless resources while better preserving the privacy of the non-intended classes. We design a communication/computation-aware scheme that adapts the per-class communication parameters, such as the transmission power and compression rate, to minimize the total latency of retrieving signals at multiple receivers, tailored to the prevailing channel conditions as well as the users' reconstruction/synthesis distortion and perception requirements. Simulation results demonstrate significantly reduced per-user latency compared with non-generative and intent-unaware multicasting benchmarks, while maintaining high perceptual quality of the signals retrieved at the users.
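To make the core mechanism concrete, the following is a minimal Python sketch of the intent-aware split described in the abstract: per-user payloads carry only the intended semantic classes, a compressed semantic map is multicast to everyone, and each receiver composites decoded regions with regions synthesized by a pre-trained diffusion model conditioned on that map. All function arguments here (`encode`, `decode`, `compress_map`, `decompress_map`, `synthesize`) are hypothetical placeholders for the paper's codecs and semantic-image-synthesis model, not an implementation of its actual method.

```python
import numpy as np

def build_payloads(image, seg_map, user_intents, encode, compress_map):
    """Transmitter side: decompose the source by semantic class and intent.

    image:        H x W x 3 source signal
    seg_map:      H x W integer semantic-class map (e.g., from a segmenter)
    user_intents: dict mapping user id -> set of intended class ids
    encode:       hypothetical per-class codec returning a bitstream
    compress_map: hypothetical semantic-map compressor (multicast to all)
    """
    # The highly compressed semantic map is multicast once over shared resources.
    shared_map_bits = compress_map(seg_map)

    payloads = {}
    for user, intended in user_intents.items():
        # Only pixels of the user's intended classes are encoded and unicast;
        # the non-intended classes are never transmitted to this user.
        mask = np.isin(seg_map, list(intended))
        payloads[user] = encode(image * mask[..., None], classes=intended)
    return shared_map_bits, payloads

def retrieve(user_bits, shared_map_bits, intended,
             decode, decompress_map, synthesize):
    """Receiver side: partially reconstruct, partially synthesize.

    synthesize: hypothetical pre-trained diffusion model that generates an
                image conditioned on the received semantic map.
    """
    seg_map = decompress_map(shared_map_bits)
    recon = decode(user_bits)        # high-fidelity intended regions
    synth = synthesize(seg_map)      # locally synthesized non-intended regions
    mask = np.isin(seg_map, list(intended))[..., None]
    return np.where(mask, recon, synth)  # composite retrieved signal
```

The per-class adaptation of transmission power and compression rate that minimizes total retrieval latency is a separate optimization layered on top of this split and is not shown in the sketch.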