
Language-Oriented Semantic Latent Representation for Image Transmission (2405.09976v1)

Published 16 May 2024 in cs.CV and eess.SP

Abstract: In the new paradigm of semantic communication (SC), the focus is on delivering meanings behind bits by extracting semantic information from raw data. Recent advances in data-to-text models facilitate language-oriented SC, particularly for text-transformed image communication via image-to-text (I2T) encoding and text-to-image (T2I) decoding. However, although semantically aligned, the text is too coarse to precisely capture sophisticated visual features such as spatial locations, color, and texture, incurring a significant perceptual difference between intended and reconstructed images. To address this limitation, we propose a novel language-oriented SC framework that communicates both text and a compressed image embedding and combines them using a latent diffusion model to reconstruct the intended image. Experimental results validate the potential of our approach, which transmits only 2.09% of the original image size while achieving higher perceptual similarity in noisy communication channels than a baseline SC method that communicates only through text. The code is available at https://github.com/ispamm/Img2Img-SC/.
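
To give a feel for the 2.09% figure quoted in the abstract, the following is a minimal arithmetic sketch, not code from the paper or its repository. The function name `transmitted_bytes` and the 512x512 RGB example image are assumptions chosen for illustration only; the abstract does not state the image size or how the payload is split between text and embedding.

```python
def transmitted_bytes(original_bytes: int, ratio: float = 0.0209) -> float:
    """Return the payload size when only `ratio` of the original size is sent.

    `ratio` defaults to the 2.09% figure quoted in the abstract.
    """
    if original_bytes < 0:
        raise ValueError("original_bytes must be non-negative")
    return original_bytes * ratio


# Hypothetical input: an uncompressed 512x512 RGB image, 8 bits per channel.
# These dimensions are an assumption, not a value taken from the paper.
original = 512 * 512 * 3
payload = transmitted_bytes(original)

print(f"original: {original} bytes")
print(f"payload:  {payload:.0f} bytes (text plus compressed embedding)")
```

Under these assumed dimensions, the combined text and embedding payload would be on the order of 16 kB against roughly 786 kB for the raw image.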
