Exploring data augmentation in bias mitigation against non-native-accented speech (2312.15499v1)

Published 24 Dec 2023 in eess.AS

Abstract: Automatic speech recognition (ASR) should serve every speaker, not only the majority "standard" speakers of a language. In order to build inclusive ASR, mitigating the bias against speaker groups who speak in a "non-standard" or "diverse" way is crucial. We aim to mitigate the bias against non-native-accented Flemish in a Flemish ASR system. Since this is a low-resource problem, we investigate the optimal type of data augmentation, i.e., speed/pitch perturbation, cross-lingual voice conversion-based methods, and SpecAugment, applied to both native Flemish and non-native-accented Flemish, for bias mitigation. The results showed that specific types of data augmentation applied to both native and non-native-accented speech improve non-native-accented ASR, while applying data augmentation to the non-native-accented speech is more conducive to bias reduction. Combining both gave the largest bias reduction for human-machine interaction (HMI) as well as read-type speech.
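
The abstract names SpecAugment as one of the augmentation types compared. As a rough illustration only, and not the authors' implementation or settings, the sketch below applies one frequency mask and one time mask to a toy spectrogram stored as a list of frames. The function name `spec_augment`, its parameters, and the mask widths are assumptions made for this example; time warping, which is part of the original SpecAugment recipe, is left out.

```python
import random


def spec_augment(spec, max_freq_mask=2, max_time_mask=3, rng=None):
    """Return a masked copy of `spec`.

    `spec` is a list of frames, each frame a list of frequency-bin values.
    One random band of frequency bins and one random span of frames are
    set to zero. All names and default widths here are illustrative.
    """
    rng = rng or random.Random()
    n_frames = len(spec)
    n_bins = len(spec[0])
    out = [list(frame) for frame in spec]  # copy so the input is untouched

    # Frequency mask: zero bins [f0, f0 + f) in every frame.
    f = rng.randint(0, min(max_freq_mask, n_bins))
    f0 = rng.randint(0, n_bins - f)
    for frame in out:
        for b in range(f0, f0 + f):
            frame[b] = 0.0

    # Time mask: zero every bin of frames [t0, t0 + t).
    t = rng.randint(0, min(max_time_mask, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for i in range(t0, t0 + t):
        out[i] = [0.0] * n_bins

    return out


# Toy input: 10 frames with 8 frequency bins each, all ones.
toy_spec = [[1.0] * 8 for _ in range(10)]
augmented = spec_augment(toy_spec, rng=random.Random(0))
```

In a setup like the one the abstract describes, such a transform would be applied to the native speech, the non-native-accented speech, or both during training; which subset is augmented is the variable the paper studies.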
