Language-Oriented Semantic Latent Representation for Image Transmission (2405.09976v1)
Abstract: In the new paradigm of semantic communication (SC), the focus is on delivering meanings behind bits by extracting semantic information from raw data. Recent advances in data-to-text models facilitate language-oriented SC, particularly for text-transformed image communication via image-to-text (I2T) encoding and text-to-image (T2I) decoding. However, although semantically aligned, the text is too coarse to precisely capture sophisticated visual features such as spatial locations, color, and texture, incurring a significant perceptual difference between intended and reconstructed images. To address this limitation, in this paper, we propose a novel language-oriented SC framework that communicates both text and a compressed image embedding and combines them using a latent diffusion model to reconstruct the intended image. Experimental results validate the potential of our approach, which transmits only 2.09% of the original image size while achieving higher perceptual similarities in noisy communication channels compared to a baseline SC method that communicates only through text. The code is available at https://github.com/ispamm/Img2Img-SC/.