Attention-Guided Adaptation for Code-Switching Speech Recognition (2312.08856v2)

Published 14 Dec 2023 in eess.AS and cs.SD

Abstract: The prevalence of powerful multilingual models, such as Whisper, has significantly advanced research on speech recognition. However, these models often struggle to handle code-switching, which is essential in multilingual speech recognition. Recent studies have attempted to address this setting by separating the modules for different languages so as to ensure distinct latent representations per language. Other methods consider a switching mechanism based on language identification. In this study, a new attention-guided adaptation is proposed to conduct parameter-efficient learning for bilingual ASR. This method selects the attention heads in a model that most closely express language identities and then guides those heads to attend correctly to their corresponding languages. Experiments on a Mandarin-English code-switching speech corpus show that the proposed approach achieves a 14.2% mixed error rate, surpassing state-of-the-art methods, while training only 5.6% additional parameters over Whisper.
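The abstract only describes the method at a high level. As a rough illustration of what "guiding selected attention heads toward their corresponding languages" could look like in practice, the minimal PyTorch sketch below adds an auxiliary loss that pushes chosen cross-attention heads toward a language-aligned target distribution. The head-selection criterion, the KL-based objective, and all names (attention_guidance_loss, lang_mask, guided_heads, lambda_guide) are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: the exact selection criterion and guidance
# objective are not given in the abstract, so the KL-based loss below
# and all variable names are assumptions.
import torch
import torch.nn.functional as F

def attention_guidance_loss(attn, lang_mask, guided_heads):
    """Encourage selected attention heads to attend to frames matching
    the language of the token being decoded.

    attn:         (batch, heads, tgt_len, src_len) cross-attention weights
    lang_mask:    (batch, tgt_len, src_len), 1.0 where the source frame's
                  language matches the target token's language, else 0.0
    guided_heads: indices of heads selected as "language-expressing" heads
    """
    # Normalize the binary mask into a target attention distribution per token.
    target = lang_mask / lang_mask.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    loss = 0.0
    for h in guided_heads:
        # KL divergence between the head's attention and the language-aligned target.
        loss = loss + F.kl_div(attn[:, h].clamp_min(1e-8).log(), target,
                               reduction="batchmean")
    return loss / max(len(guided_heads), 1)

# Hypothetical use in a parameter-efficient fine-tuning step (only adapter
# parameters trained), combined with the usual ASR cross-entropy loss:
# total_loss = asr_ce_loss + lambda_guide * attention_guidance_loss(
#     cross_attn, lang_mask, guided_heads=[3, 7])
```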

Authors (4)
  1. Bobbi Aditya (1 paper)
  2. Mahdin Rohmatillah (2 papers)
  3. Liang-Hsuan Tai (2 papers)
  4. Jen-Tzung Chien (6 papers)
Citations (7)
