
Automatic Speech Recognition Advancements for Indigenous Languages of the Americas (2404.08368v3)

Published 12 Apr 2024 in cs.CL

Abstract: Indigenous languages are a fundamental legacy in the development of human communication, embodying the unique identity and culture of local communities in the Americas. The Second AmericasNLP (Americas Natural Language Processing) Competition Track 1 of NeurIPS (Neural Information Processing Systems) 2022 proposed the task of training automatic speech recognition (ASR) systems for five Indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa'ikhana. In this paper, we describe the fine-tuning of a state-of-the-art ASR model for each target language, using approximately 36.65 h of transcribed speech data from diverse sources enriched with data augmentation methods. We systematically investigate, using a Bayesian search, the impact of the different hyperparameters on the Wav2vec2.0 XLS-R (Cross-Lingual Speech Representations) variants of 300 M and 1 B parameters. Our findings indicate that data and detailed hyperparameter tuning significantly affect ASR accuracy, but language complexity determines the final result. The Quechua model achieved the lowest character error rate (CER) (12.14), while the Kotiria model, despite having the most extensive dataset during the fine-tuning phase, showed the highest CER (36.59). Conversely, with the smallest dataset, the Guarani model achieved a CER of 15.59, while Bribri and Wa'ikhana obtained, respectively, CERs of 34.70 and 35.23. Additionally, Sobol' sensitivity analysis highlighted the crucial roles of freeze fine-tuning updates and dropout rates. We release our best models for each language, marking the first open ASR models for Wa'ikhana and Kotiria. This work opens avenues for future research to advance ASR techniques in preserving minority Indigenous languages.
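
For readers unfamiliar with the metric, the character error rate quoted in the abstract is conventionally the character-level edit distance between a reference transcript and the model output, divided by the reference length. The sketch below illustrates that standard definition only; it is not the authors' evaluation code. The function names are hypothetical, and scaling the result to a percentage is an assumption made so the output resembles figures such as 12.14.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character of ref
                curr[j - 1] + 1,     # insert a character of hyp
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate as a percentage of the reference length.

    Assumption: values are reported as percentages; no text normalization
    (casing, punctuation) is applied here.
    """
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(cer("abcd", "abxd"))
```

Because the denominator is the reference length, a hypothesis with many insertions can score above 100 under this definition.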

Authors (3)
  1. Monica Romero (4 papers)
  2. Sandra Gomez (1 paper)
  3. Ivan G. Torre (2 papers)
