
Generative error correction for code-switching speech recognition using large language models (2310.13013v1)

Published 17 Oct 2023 in cs.CL, cs.AI, cs.SD, and eess.AS

Abstract: Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence. Despite recent advances in automatic speech recognition (ASR), CS-ASR remains a challenging task owing to the grammatical complexity of the phenomenon and the scarcity of task-specific training corpora. In this work, we propose to leverage LLMs together with lists of hypotheses generated by ASR systems to address the CS problem. Specifically, we first employ multiple well-trained ASR models to generate N-best hypotheses, with the aim of increasing the diversity and informativeness of the hypothesis set. Next, we utilize the LLMs to learn the hypotheses-to-transcription (H2T) mapping by adding a trainable low-rank adapter. This generative error correction (GER) method directly predicts the accurate transcription from its expert linguistic knowledge and the N-best hypotheses, marking a paradigm shift from traditional LLM rescoring or error-correction techniques. Experimental evidence demonstrates that GER significantly enhances CS-ASR accuracy in terms of reduced mixed error rate (MER). Furthermore, LLMs show remarkable data efficiency for H2T learning, offering a potential solution to the data-scarcity problem of CS-ASR in low-resource languages.
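The abstract evaluates CS-ASR with the mixed error rate (MER), which scores code-switched transcripts by treating Mandarin characters and English words as equal-weight tokens before a standard edit-distance comparison. The following is a minimal sketch of that metric, not the paper's implementation; the function names and the character-vs-word tokenization rule shown here are illustrative assumptions.

```python
import re

def mixed_tokenize(text):
    """Split a code-switched string into Mandarin characters and English words.

    Assumption: CJK Unified Ideographs become single-character tokens,
    while runs of Latin letters stay whole; other symbols are dropped.
    """
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text)

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # deleting all reference tokens
    for j in range(n + 1):
        d[0][j] = j          # inserting all hypothesis tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def mixed_error_rate(reference, hypothesis):
    """MER = token-level edit distance / number of reference tokens."""
    ref = mixed_tokenize(reference)
    hyp = mixed_tokenize(hypothesis)
    return edit_distance(ref, hyp) / len(ref)
```

For example, comparing the reference "我想喝coffee" against the hypothesis "我要喝coffee" yields four reference tokens with one substitution, i.e. an MER of 0.25.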

Authors (6)
  1. Chen Chen
  2. Yuchen Hu
  3. Chao-Han Huck Yang
  4. Hexin Liu
  5. Sabato Marco Siniscalchi
  6. Eng Siong Chng
Citations (7)