
DarijaBanking: A New Resource for Overcoming Language Barriers in Banking Intent Detection for Moroccan Arabic Speakers (2405.16482v1)

Published 26 May 2024 in cs.CL

Abstract: Navigating language diversity is a central challenge in developing robust natural language processing systems, especially in specialized domains like banking. The Moroccan dialect (Darija) serves as a common language that blends cultural complexities, historical influences, and regional differences. Darija poses a particular set of challenges for LLMs: because it differs from Modern Standard Arabic and carries strong influences from French, Spanish, and Tamazight, it requires a dedicated approach for effective communication. To tackle these challenges, this paper introduces DarijaBanking, a novel Darija dataset aimed at enhancing intent classification in the banking domain, addressing the critical need for automatic banking systems (e.g., chatbots) that communicate in the native language of Moroccan clients. DarijaBanking comprises over 1,800 parallel high-quality queries in Darija, Modern Standard Arabic (MSA), English, and French, organized into 24 intent classes. We experimented with various intent classification methods, including full fine-tuning of monolingual and multilingual models, zero-shot learning, retrieval-based approaches, and LLM prompting. One of the main contributions of this work is BERTouch, our BERT-based language model for intent classification in Darija. BERTouch achieved F1-scores of 0.98 for Darija and 0.96 for MSA on DarijaBanking, outperforming state-of-the-art alternatives, including GPT-4, showcasing its effectiveness in the targeted application.
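
The strongest approach reported in the abstract is full fine-tuning of a BERT-style encoder on the 24 DarijaBanking intent classes. The sketch below shows what such a fine-tuning setup might look like with the Hugging Face transformers library; the checkpoint (UBC-NLP/MARBERT, used purely as a placeholder Arabic encoder), the CSV file names, and the column names are illustrative assumptions, not the paper's exact BERTouch recipe.

```python
# Minimal sketch of BERT-based intent classification fine-tuning on
# DarijaBanking-style data. Checkpoint, file names, and column names are
# assumptions for illustration; this is not the paper's exact BERTouch setup.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

NUM_INTENTS = 24  # DarijaBanking defines 24 intent classes

# Hypothetical CSV splits with a "text" column (Darija query) and an integer
# "label" column (intent id in [0, 23]).
dataset = load_dataset("csv",
                       data_files={"train": "train.csv", "test": "test.csv"})

# Any Arabic-capable encoder could stand in here; MARBERT is used only as a
# placeholder checkpoint.
checkpoint = "UBC-NLP/MARBERT"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=NUM_INTENTS)

def tokenize(batch):
    # Banking queries are short, so a small max_length keeps training cheap.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

encoded = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="intent-classifier-sketch",
                           per_device_train_batch_size=16,
                           num_train_epochs=3),
    train_dataset=encoded["train"],
    eval_dataset=encoded["test"],
)
trainer.train()
print(trainer.evaluate())
```

A retrieval-based alternative, also mentioned in the abstract, would instead embed each incoming query with a sentence encoder and assign it the intent of the nearest labeled training example, trading some accuracy for the ability to update the intent inventory without retraining.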

