Optimizing Two-Pass Cross-Lingual Transfer Learning: Phoneme Recognition and Phoneme to Grapheme Translation (2312.03312v1)
Abstract: This research optimizes two-pass cross-lingual transfer learning for low-resource languages by improving both stages of the pipeline: phoneme recognition and phoneme-to-grapheme translation. For phoneme recognition, we improve phoneme vocabulary coverage by merging phonemes that share articulatory characteristics, which increases recognition accuracy. For phoneme-to-grapheme translation, we introduce a global phoneme noise generator that injects realistic ASR noise during training to reduce error propagation. Experiments on the CommonVoice 12.0 dataset show significant reductions in Word Error Rate (WER) for low-resource languages, highlighting the effectiveness of our approach. This research contributes to the advancement of two-pass ASR systems for low-resource languages, offering the potential for improved cross-lingual transfer learning.
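The two ideas in the abstract, merging phonemes by shared articulatory characteristics and corrupting phoneme sequences with ASR-like noise, can be illustrated with a small sketch. This is not the paper's method: the five binary features, the Hamming-distance merge rule (`merge_map`, `max_distance`), and the substitution/deletion/insertion noise model (`add_phoneme_noise`, `p_sub`, `p_del`, `p_ins`) are all hypothetical choices made for illustration. A real system would draw feature vectors from a resource such as PanPhon rather than the hand-picked table below.

```python
import random

# Toy binary articulatory features, hand-picked for illustration only (NOT
# taken from PanPhon or from the paper): (voice, labial, coronal, anterior, continuant).
FEATURES = {
    "p": (0, 1, 0, 1, 0),
    "b": (1, 1, 0, 1, 0),
    "t": (0, 0, 1, 1, 0),
    "d": (1, 0, 1, 1, 0),
    "s": (0, 0, 1, 1, 1),
    "ʈ": (0, 0, 1, 0, 0),  # retroflex plosive
    "ɸ": (0, 1, 0, 1, 1),  # bilabial fricative
    "ʔ": (0, 0, 0, 0, 0),  # glottal stop
}


def feature_distance(a, b):
    """Count the articulatory features on which phonemes a and b differ."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))


def merge_map(target_phonemes, known_phonemes, max_distance=1):
    """Hypothetical merge rule: map each target phoneme onto the closest phoneme
    already in the recognizer's vocabulary. Phonemes with no neighbour within
    max_distance features are kept as-is. Ties are broken alphabetically."""
    mapping = {}
    for ph in target_phonemes:
        if ph in known_phonemes:
            mapping[ph] = ph
            continue
        best = min(known_phonemes, key=lambda k: (feature_distance(ph, k), k))
        mapping[ph] = best if feature_distance(ph, best) <= max_distance else ph
    return mapping


def add_phoneme_noise(phonemes, inventory, p_sub=0.1, p_del=0.05, p_ins=0.05, rng=None):
    """Hypothetical noise model: corrupt a phoneme sequence with deletions,
    substitutions and insertions. Substitutions prefer phonemes one feature
    away, mimicking plausible recognition confusions (an assumption)."""
    rng = rng or random.Random()
    noisy = []
    for ph in phonemes:
        r = rng.random()
        if r < p_del:
            pass  # deletion: drop the phoneme
        elif r < p_del + p_sub:
            others = sorted(q for q in inventory if q != ph)
            close = [q for q in others if feature_distance(ph, q) <= 1]
            pool = close or others
            noisy.append(rng.choice(pool) if pool else ph)  # substitution
        else:
            noisy.append(ph)  # keep the phoneme unchanged
        if rng.random() < p_ins:
            noisy.append(rng.choice(sorted(inventory)))  # spurious insertion
    return noisy
```

For example, `merge_map(["p", "ʈ", "ɸ", "ʔ"], {"p", "b", "t", "d", "s"})` maps ʈ to t and ɸ to p, each one feature away, while ʔ stays unmerged because its nearest known phoneme differs in two features. Training a phoneme-to-grapheme model on `add_phoneme_noise` outputs, rather than on clean phoneme transcripts, exposes it to the kinds of errors it will see from the first pass.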