Benchmarking Pre-trained Large Language Models' Potential Across Urdu NLP tasks (2405.15453v1)

Published 24 May 2024 in cs.CL and cs.AI

Abstract: LLMs pre-trained on multilingual data have revolutionized natural language processing research by shifting from language- and task-specific model pipelines to a single model adapted to a variety of tasks. However, the majority of existing multilingual NLP benchmarks for LLMs provide evaluation data in only a few languages with little linguistic diversity. In addition, these benchmarks lack quality assessment against the respective state-of-the-art models. This study presents an in-depth examination of prominent LLMs (GPT-3.5-turbo, Llama2-7B-Chat, Bloomz 7B1 and Bloomz 3B) across 14 tasks using 15 Urdu datasets in a zero-shot setting, and compares and analyses their performance against state-of-the-art (SOTA) models. Our experiments show that SOTA models surpass all the encoder-decoder pre-trained LLMs in all Urdu NLP tasks with zero-shot learning. Our results further show that LLMs with fewer parameters but more language-specific data in the base model perform better than computationally larger models with little language-specific data.

Authors (5)
  1. Munief Hassan Tahir
  2. Sana Shams
  3. Layba Fiaz
  4. Farah Adeeba
  5. Sarmad Hussain