Where Are You From? Let Me Guess! Subdialect Recognition of Speeches in Sorani Kurdish (2404.00124v1)
Abstract: Classifying Sorani Kurdish subdialects is challenging because of the lack of publicly available datasets and of reliable resources, such as social media or websites, from which data could be collected. To address this issue, we conducted field visits to various cities and villages, connecting with native speakers from different age groups, genders, academic backgrounds, and professions. We recorded their voices while engaging in conversations covering diverse topics such as lifestyle, background history, hobbies, interests, vacations, and life lessons. The target area of the research was the Kurdistan Region of Iraq. As a result, we accumulated 29 hours, 16 minutes, and 40 seconds of audio recordings from 107 interviews, constituting an unbalanced dataset encompassing six subdialects. Subsequently, we adapted three deep learning models: ANN, CNN, and RNN-LSTM. We explored various configurations, including different track durations, dataset splits, and techniques for handling the imbalanced dataset, such as oversampling and undersampling. We conducted 225 experiments and evaluated the outcomes. The results indicated that the RNN-LSTM outperformed the other methods, achieving an accuracy of 96%. CNN achieved an accuracy of 93%, and ANN 75%. All three models performed better on balanced datasets, particularly when we followed the oversampling approach. Future studies can extend this work to other Kurdish dialects.
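The oversampling and undersampling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors implemented these steps, so the code below is only one plausible reading: plain random resampling over audio segments. The function name `rebalance`, the segment IDs, and the class labels "A", "B", and "C" are hypothetical placeholders, not names from the paper.

```python
import random
from collections import Counter, defaultdict


def rebalance(samples, labels, mode="over", seed=0):
    """Randomly resample so every class ends up with the same count.

    mode="over":  grow each class to the size of the largest class
                  by drawing extra items with replacement.
    mode="under": shrink each class to the size of the smallest class
                  by drawing items without replacement.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    sizes = [len(xs) for xs in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if mode == "over":
            # Keep every original item, then add random duplicates.
            picked = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        else:
            # Keep a random subset of the original items.
            picked = rng.sample(xs, target)
        out_x.extend(picked)
        out_y.extend([y] * target)
    return out_x, out_y


# Hypothetical data: ten segment IDs across three placeholder classes.
segments = [f"seg{i}" for i in range(10)]
labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 1

over_x, over_y = rebalance(segments, labels, mode="over")
under_x, under_y = rebalance(segments, labels, mode="under")
print(Counter(over_y))
print(Counter(under_y))
```

Oversampling keeps all recorded material at the cost of duplicated segments, while undersampling discards data from the larger classes. In general practice, resampling is applied only to the training split, after the train/test split, so that duplicated segments do not leak into evaluation. Whether the paper follows this order is not stated in the abstract.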