
Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models (2306.16322v1)

Published 28 Jun 2023 in cs.CL

Abstract: LLMs have demonstrated impressive performance on various downstream tasks without requiring fine-tuning; this includes ChatGPT, a chat-based model built on top of LLMs such as GPT-3.5 and GPT-4. Although languages other than English make up a smaller proportion of the training data, these models also exhibit remarkable capabilities in them. In this study, we assess the performance of GPT-3.5 and GPT-4 on seven distinct Arabic NLP tasks: sentiment analysis, translation, transliteration, paraphrasing, part-of-speech tagging, summarization, and diacritization. Our findings reveal that GPT-4 outperforms GPT-3.5 on five of the seven tasks. Furthermore, we conduct an extensive analysis of the sentiment analysis task, providing insights into how LLMs achieve exceptional results on a challenging dialectal dataset. Additionally, we introduce a new Python interface, https://github.com/ARBML/Taqyim, that makes evaluating these tasks straightforward.
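The evaluation pipeline described in the abstract is released as a Python package. As a rough, self-contained illustration of the kind of zero-shot evaluation it automates, the sketch below scores a chat model on a toy Arabic sentiment set by calling the OpenAI client directly. This is not the Taqyim API: the example sentences, prompt wording, and label set are assumptions made for illustration only.

```python
# Minimal sketch of zero-shot Arabic sentiment evaluation in the spirit of the
# paper (NOT the Taqyim library itself). The sentences, prompt wording, and
# label set are illustrative assumptions; the real pipeline at
# https://github.com/ARBML/Taqyim wraps dataset loading, prompting, and metrics.

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Tiny hand-made evaluation set (text, gold label); a real run would load a
# dialectal sentiment benchmark instead.
EVAL_SET = [
    ("الخدمة كانت ممتازة والموظفون متعاونون", "positive"),
    ("المنتج وصل متأخر وجودته سيئة جدا", "negative"),
]

LABELS = {"positive", "negative"}


def classify(text: str, model: str = "gpt-4") -> str:
    """Ask the chat model for a one-word sentiment label."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "Classify the sentiment of the Arabic text. "
                           "Answer with exactly one word: positive or negative.",
            },
            {"role": "user", "content": text},
        ],
    )
    prediction = response.choices[0].message.content.strip().lower()
    # Fall back to a default label if the model answers outside the label set.
    return prediction if prediction in LABELS else "negative"


def accuracy(model: str = "gpt-4") -> float:
    """Fraction of examples where the predicted label matches the gold label."""
    correct = sum(classify(text, model) == gold for text, gold in EVAL_SET)
    return correct / len(EVAL_SET)


if __name__ == "__main__":
    print(f"GPT-4 zero-shot sentiment accuracy: {accuracy('gpt-4'):.2f}")
```

In practice, a library such as Taqyim would replace the hand-made list with a benchmark dataset, batch the requests, and compute task-appropriate metrics (e.g., BLEU for translation or error rate for diacritization) rather than simple accuracy.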

Authors (6)
  1. Zaid Alyafeai
  2. Maged S. Alshaibani
  3. Badr AlKhamissi
  4. Hamzah Luqman
  5. Ebrahim Alareqi
  6. Ali Fadel
Citations (12)