Semantic MIMO Systems for Speech-to-Text Transmission (2405.08096v2)
Abstract: Semantic communications have been used to execute numerous intelligent tasks by transmitting task-related semantic information instead of bits. In this article, we propose a semantic-aware speech-to-text transmission system, named SAC-ST, for single-user multiple-input multiple-output (MIMO) and multi-user MIMO communication scenarios. Specifically, we first design a semantic communication system that serves the speech-to-text task at the receiver; it compresses the semantic information and generates low-dimensional semantic features by leveraging the transformer module. In addition, we propose a novel semantic-aware network that facilitates transmission with high semantic fidelity by identifying the critical semantic information and guaranteeing its accurate recovery. Furthermore, we extend SAC-ST with a neural network-enabled channel estimation network to reduce the dependence on accurate channel state information and to validate the feasibility of SAC-ST in practical communication environments. Simulation results show that the proposed SAC-ST outperforms the communication framework without the semantic-aware network for speech-to-text transmission over MIMO channels in terms of the speech-to-text metrics, especially in the low signal-to-noise ratio (SNR) regime. Moreover, SAC-ST with the developed channel estimation network achieves performance comparable to SAC-ST with perfect channel state information.
- Zhenzi Weng
- Zhijin Qin
- Huiqiang Xie
- Xiaoming Tao
- Khaled B. Letaief