How to Evaluate Semantic Communications for Images with ViTScore Metric? (2309.04891v2)
Abstract: Semantic communications (SC) are expected to be a new paradigm catalyzing next-generation communication, shifting the main concern from accurate bit transmission to effective semantic information exchange. However, widely used image metrics are not applicable to evaluating image semantic similarity in SC. Classical metrics for measuring the similarity between two images, such as PSNR and MS-SSIM, usually operate at the pixel or structural level, and straightforwardly adopting deep-learning-based metrics tailored for the computer vision (CV) community, such as LPIPS, is infeasible for SC. To tackle this, inspired by BERTScore from the NLP community, we propose a novel metric for evaluating image semantic similarity, named Vision Transformer Score (ViTScore). We prove theoretically that ViTScore has three important properties, namely symmetry, boundedness, and normalization, which make ViTScore convenient and intuitive for image measurement. To evaluate the performance of ViTScore, we compare it with three typical metrics (PSNR, MS-SSIM, and LPIPS) across four classes of experiments: (i) correlation with BERTScore via an image-captioning downstream CV task, (ii) evaluation in classical image communications, (iii) evaluation in image semantic communication systems, and (iv) evaluation in image semantic communication systems under semantic attack. Experimental results demonstrate that ViTScore is robust and efficient in evaluating the semantic similarity of images. In particular, ViTScore outperforms the other three metrics in capturing semantic changes caused by semantic attacks, such as image inversion with Generative Adversarial Networks (GANs). This indicates that ViTScore is an effective performance metric when deployed in SC scenarios.
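The abstract's core idea, transplanting BERTScore's token-matching scheme from text to ViT patch embeddings, can be sketched as follows. This is a minimal illustration, not the paper's exact ViTScore definition: the function name `vitscore_like` is made up here, and the random arrays stand in for real ViT patch embeddings, which in practice would come from a pretrained vision transformer.

```python
import numpy as np

def vitscore_like(emb_a, emb_b):
    """BERTScore-style similarity between two sets of patch embeddings.

    emb_a, emb_b: (num_patches, dim) arrays, standing in for the ViT
    patch embeddings of two images. Each patch of one image is greedily
    matched to its most similar patch in the other image, in both
    directions, and the two directions are combined into an F1 score.
    """
    # L2-normalize rows so inner products are cosine similarities.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T                       # pairwise cosine similarity matrix
    precision = sim.max(axis=1).mean()  # best match in B for each patch of A
    recall = sim.max(axis=0).mean()     # best match in A for each patch of B
    return 2 * precision * recall / (precision + recall)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64))  # stand-in embeddings, image A
y = rng.standard_normal((16, 64))  # stand-in embeddings, image B

s_xy = vitscore_like(x, y)
s_yx = vitscore_like(y, x)
```

This construction makes two of the claimed properties easy to see: swapping the inputs swaps precision and recall, so the F1 score is symmetric, and comparing an image with itself puts ones on the diagonal of the cosine matrix, so the score of an image against itself is exactly 1.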
- Tingting Zhu
- Bo Peng
- Jifan Liang
- Tingchen Han
- Hai Wan
- Jingqiao Fu
- Junjie Chen