Low-resource speech recognition and dialect identification of Irish in a multi-task framework
Abstract: This paper explores the use of hybrid CTC/Attention encoder-decoder models trained with Intermediate CTC (InterCTC) for low-resource Irish (Gaelic) automatic speech recognition (ASR) and dialect identification (DID). Results are compared with the current best-performing models for ASR (TDNN-HMM) and DID (ECAPA-TDNN). An optimal InterCTC setting is first established using a Conformer encoder. This setting is then used to train a model with an E-branchformer encoder, and the performance of the two architectures is compared. A multi-task fine-tuning approach is adopted for language model (LM) shallow fusion. The experiments yield a 10.8% relative improvement in DID accuracy over a baseline ECAPA-TDNN, and a word error rate (WER) approaching that of the TDNN-HMM model. This multi-task approach emerges as a promising strategy for low-resource Irish ASR and DID.
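The abstract combines three ingredients: a hybrid CTC/attention training objective, InterCTC regularisation on intermediate encoder layers, and LM shallow fusion at decoding time. The minimal sketch below writes out how these scores are typically combined, following the interpolation commonly used in ESPnet-style recipes. It is an illustration under stated assumptions, not the authors' implementation: the weights (0.3, 0.5) are placeholders rather than the paper's settings, and both the dialect tag strings (`<ul>`, `<co>`, `<mu>`) and the idea of emitting the dialect as the first decoder token are hypothetical, since the abstract does not say how the dialect label is attached.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical dialect tags. Ulster, Connacht and Munster are the three major
# Irish dialect regions, but these tag strings are illustrative assumptions.
DIALECT_TAGS = {"<ul>": "Ulster", "<co>": "Connacht", "<mu>": "Munster"}


def hybrid_interctc_loss(
    att_loss: float,
    ctc_loss: float,
    inter_ctc_losses: Sequence[float],
    ctc_weight: float = 0.3,       # placeholder value, not from the paper
    interctc_weight: float = 0.5,  # placeholder value, not from the paper
) -> float:
    """Combine attention, final-layer CTC and intermediate CTC losses.

    The intermediate CTC losses (one per tapped encoder layer) are averaged
    and interpolated with the final CTC loss; that CTC term is then
    interpolated with the attention-decoder loss.
    """
    if inter_ctc_losses:
        inter = sum(inter_ctc_losses) / len(inter_ctc_losses)
        ctc_term = (1.0 - interctc_weight) * ctc_loss + interctc_weight * inter
    else:
        # No intermediate layers tapped: plain hybrid CTC/attention loss.
        ctc_term = ctc_loss
    return ctc_weight * ctc_term + (1.0 - ctc_weight) * att_loss


def fused_score(
    log_p_att: float,
    log_p_ctc: float,
    log_p_lm: float,
    ctc_weight: float = 0.3,  # placeholder value
    lm_weight: float = 0.5,   # placeholder value
) -> float:
    """Beam-search score of a hypothesis with LM shallow fusion.

    Shallow fusion adds the external LM log-probability, scaled by lm_weight,
    to the interpolated CTC/attention score at decoding time only.
    """
    asr = (1.0 - ctc_weight) * log_p_att + ctc_weight * log_p_ctc
    return asr + lm_weight * log_p_lm


def split_dialect(tokens: Sequence[str]) -> Tuple[Optional[str], List[str]]:
    """Read a dialect label off the first decoded token, if it is a dialect tag.

    Assumes (hypothetically) that the multi-task target is the transcript
    prefixed with a single dialect tag.
    """
    if tokens and tokens[0] in DIALECT_TAGS:
        return DIALECT_TAGS[tokens[0]], list(tokens[1:])
    return None, list(tokens)
```

For example, `split_dialect(["<ul>", "dia", "duit"])` returns `("Ulster", ["dia", "duit"])`, so a single decoder pass yields both the transcript and a DID decision. Setting `interctc_weight` to 0 or passing no intermediate losses recovers the plain hybrid CTC/attention objective, which is a convenient baseline when searching for the InterCTC setting.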