Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks (2404.11280v2)

Published 17 Apr 2024 in cs.NI and cs.AI

Abstract: To reduce network traffic and support resource-limited environments, methods for transmitting images with minimal data are required. Several machine-learning-based image compression methods have been proposed that reduce the data size of images while preserving their features. However, in certain situations, reconstructing only the semantic information of an image at the receiver may be sufficient. To realize this concept, semantic-information-based communication, called semantic communication, has been proposed, including an image transmission method that sends only the semantic information of an image, which the receiver then reconstructs using an image-generation model. This existing method relies on a single type of semantic information, however, and reconstructing images similar to the original from that information alone is challenging. This study proposes a multi-modal image transmission method that leverages several types of semantic information for efficient semantic communication. The proposed method extracts multi-modal semantic information from an original image and transmits only that information to the receiver. The receiver then generates multiple candidate images using an image-generation model and selects an output image based on semantic similarity. Because the receiver must make this selection using only the received features, evaluating similarity with conventional metrics is difficult. This study therefore explores new metrics for evaluating the similarity between semantic features of images and proposes two scoring procedures that assess semantic similarity based on multiple semantic features. The results indicate that the proposed procedures can compare semantic similarities, such as position and composition, between the semantic features of the original and generated images.
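The pipeline the abstract describes reduces to four steps: extract several kinds of semantic features at the transmitter, send only those features, generate multiple candidate images at the receiver, and select the candidate whose re-extracted features best match the received ones. The sketch below illustrates that control flow; the feature choice (a caption embedding plus a segmentation map), the stub models, and the weighted-sum score are illustrative assumptions, not the paper's exact models or its two proposed scoring procedures.

```python
# A minimal sketch of the multi-modal pipeline the abstract describes.
# All model calls are placeholder stubs; the feature choice (caption
# embedding + segmentation map) and the weighted-sum score are
# illustrative assumptions, not the paper's exact procedures.
import numpy as np

rng = np.random.default_rng(0)

def extract_caption_embedding(image: np.ndarray) -> np.ndarray:
    """Stub for a caption/text encoder; returns a random 512-d vector
    in place of a real embedding (the image is ignored here)."""
    return rng.normal(size=512)

def extract_segmentation(image: np.ndarray) -> np.ndarray:
    """Stub for a semantic segmentation model; returns a random label
    map in place of a real per-pixel class prediction."""
    return rng.integers(0, 4, size=(32, 32))

def generate_candidates(caption_emb: np.ndarray, n: int) -> list:
    """Stub for an image-generation model sampled n times from the
    same transmitted semantic features."""
    return [rng.random((64, 64, 3)) for _ in range(n)]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_iou(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Mean intersection-over-union over the labels present in either map."""
    ious = []
    for c in np.union1d(seg_a, seg_b):
        inter = np.logical_and(seg_a == c, seg_b == c).sum()
        union = np.logical_or(seg_a == c, seg_b == c).sum()
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))

def semantic_score(tx_feats: dict, candidate: np.ndarray,
                   w_text: float = 0.5, w_seg: float = 0.5) -> float:
    """Score a generated image against the *received* features only:
    re-extract the same feature types and combine their similarities."""
    cap_sim = cosine(tx_feats["caption"], extract_caption_embedding(candidate))
    seg_sim = mean_iou(tx_feats["segmentation"], extract_segmentation(candidate))
    return w_text * cap_sim + w_seg * seg_sim

# Transmitter: send only the extracted semantic features, not pixels.
original = rng.random((64, 64, 3))
tx_feats = {
    "caption": extract_caption_embedding(original),
    "segmentation": extract_segmentation(original),
}

# Receiver: sample several candidates, keep the semantically closest.
candidates = generate_candidates(tx_feats["caption"], n=8)
best = max(candidates, key=lambda img: semantic_score(tx_feats, img))
```

In a working system the stubs would be replaced by actual captioning, segmentation, and text-to-image models, and the single weighted-sum score would give way to the paper's two proposed scoring procedures; the sketch only fixes the overall flow: features out, candidates in, argmax over semantic similarity.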

Authors (6)
  1. Eri Hosonuma
  2. Taku Yamazaki
  3. Takumi Miyoshi
  4. Akihito Taya
  5. Yuuki Nishiyama
  6. Kaoru Sezaki