Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models (2306.02105v6)
Abstract: Accents play a pivotal role in shaping human communication, enhancing our ability to convey and comprehend messages with clarity and cultural nuance. While there has been significant progress in Automatic Speech Recognition (ASR), African-accented English ASR remains understudied due to a lack of training datasets, which are expensive to create and demand substantial human labor. Combining several active learning paradigms and the core-set approach, we propose a new multi-round adaptation process that uses epistemic uncertainty to automate the annotation process, significantly reducing the associated costs and human labor. This novel method streamlines data annotation and strategically selects the data samples that contribute most to model uncertainty, enhancing training efficiency. We define a new U-WER metric to track model adaptation to hard accents. We evaluate our approach across several domains, datasets, and high-performing speech models. Our results show that our approach yields an average relative WER improvement of 27% while requiring, on average, 45% less data than established baselines. Our approach also improves out-of-distribution generalization for very low-resource accents, demonstrating its viability for building generalizable ASR models for African-accented speech. We open-source the code here: https://github.com/bonaventuredossou/active_learning_african_asr.
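The uncertainty-driven selection loop can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the released code: it assumes epistemic uncertainty is approximated with Monte Carlo dropout (several stochastic forward passes with dropout left on at inference) and scored as the mean pairwise WER between the resulting transcripts. The core-set component is not shown. `Transcriber`, `toy_transcribe`, `DIFFICULTY`, `fine_tune`, `run_rounds`, and the other names are hypothetical; the toy transcriber simply corrupts words at random to mimic a model that is less stable on harder clips.

```python
import itertools
import random
from typing import Callable, List, Sequence

# Hypothetical interface: (sample_id, rng) -> transcript; rng stands in for dropout noise.
Transcriber = Callable[[str, random.Random], str]


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (or match at no cost)
            )
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate of hyp measured against ref."""
    ref_words = ref.split()
    return word_edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def mc_dropout_disagreement(
    transcribe: Transcriber, sample_id: str, num_passes: int, rng: random.Random
) -> float:
    """Mean pairwise WER among num_passes stochastic transcripts of one sample.

    Needs no reference transcript, so it can score the unlabeled pool.
    """
    if num_passes < 2:
        raise ValueError("need at least two stochastic passes")
    transcripts = [transcribe(sample_id, rng) for _ in range(num_passes)]
    pairs = list(itertools.combinations(transcripts, 2))
    return sum(wer(a, b) for a, b in pairs) / len(pairs)


def select_for_annotation(
    pool: List[str], transcribe: Transcriber, budget: int, num_passes: int = 5, seed: int = 0
) -> List[str]:
    """Return the `budget` pool samples with the highest uncertainty score."""
    rng = random.Random(seed)
    scores = {s: mc_dropout_disagreement(transcribe, s, num_passes, rng) for s in pool}
    return sorted(pool, key=scores.__getitem__, reverse=True)[:budget]


def run_rounds(
    pool: List[str],
    transcribe: Transcriber,
    fine_tune: Callable[[Transcriber, List[str]], Transcriber],
    rounds: int,
    budget_per_round: int,
    num_passes: int = 5,
) -> List[str]:
    """Hypothetical multi-round loop: score the pool, send the top samples for
    annotation, fine-tune on everything labeled so far, repeat with the new model."""
    remaining = list(pool)
    labeled: List[str] = []
    for r in range(rounds):
        if not remaining:
            break
        batch = select_for_annotation(remaining, transcribe, budget_per_round, num_passes, seed=r)
        labeled.extend(batch)
        remaining = [s for s in remaining if s not in batch]
        transcribe = fine_tune(transcribe, labeled)
    return labeled


# Hypothetical toy "model": each clip corrupts each word with a clip-specific probability.
REFERENCE = "the patient was given two tablets of paracetamol"
DIFFICULTY = {"clip_easy": 0.0, "clip_medium": 0.2, "clip_hard": 0.5}


def toy_transcribe(sample_id: str, rng: random.Random) -> str:
    p = DIFFICULTY[sample_id]
    return " ".join("<err>" if rng.random() < p else w for w in REFERENCE.split())


if __name__ == "__main__":
    print(select_for_annotation(list(DIFFICULTY), toy_transcribe, budget=2, num_passes=10))
```

Scoring by disagreement between stochastic passes, rather than by WER against a reference, is what lets such a loop rank clips before anyone has transcribed them. In a real setup, `transcribe` would wrap an ASR model with dropout active, and `fine_tune` would retrain it on the newly annotated clips.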
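The headline figures are relative quantities. The arithmetic can be sketched as follows, assuming the average is an unweighted mean over evaluation settings (the abstract does not state how results are aggregated). The function names are hypothetical, and the WER values and sample counts are illustrative, not results from the paper.

```python
from typing import Sequence


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def average_relative_reduction(baselines: Sequence[float], news: Sequence[float]) -> float:
    """Unweighted mean of per-setting relative reductions (an assumption)."""
    if len(baselines) != len(news) or not baselines:
        raise ValueError("need two non-empty WER lists of equal length")
    return sum(relative_wer_reduction(b, n) for b, n in zip(baselines, news)) / len(baselines)


def relative_data_saving(samples_used: int, baseline_samples: int) -> float:
    """Fraction of the baseline's training data that was not needed."""
    return 1 - samples_used / baseline_samples


# Illustrative numbers only (not from the paper).
print(relative_wer_reduction(0.40, 0.30))
print(average_relative_reduction([0.40, 0.20], [0.30, 0.10]))
print(relative_data_saving(550, 1000))
```

For example, cutting WER from 0.40 to 0.30 is a 25% relative improvement but only a 10-point absolute one, so a relative gain on its own does not reveal the absolute WER reached.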