
Türkçe Dil Modellerinin Performans Karşılaştırması (Performance Comparison of Turkish Language Models)

Published 25 Apr 2024 in cs.CL and cs.AI | arXiv:2404.17010v1

Abstract: The capabilities that LLMs have demonstrated across almost every kind of task have attracted the attention not only of researchers but of society at large, and have turned these models into products. Commercially successful LLMs are available; however, users may prefer open-source LLMs for reasons of cost, data privacy, or regulation. Despite the growing number of such models, there is no comprehensive comparison of their performance for Turkish. This study aims to fill that gap in the literature. Seven selected LLMs are compared on their in-context learning and question-answering abilities. Turkish datasets for in-context learning and question answering were prepared, and both automatic and human evaluations were conducted. The results show that, for question answering, continuing pretraining before fine-tuning on instruction datasets is more effective at adapting multilingual models to Turkish, and that in-context learning performance correlates only weakly with question-answering performance.
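The few-shot in-context evaluation the abstract describes can be illustrated with a minimal sketch: k labeled Turkish demonstrations are prepended to each test question, the model completes the prompt, and exact-match accuracy is computed against the gold answers. The prompt template, the toy stand-in model, and the example questions below are illustrative assumptions, not the paper's actual setup or data.

```python
def build_few_shot_prompt(demos, query):
    """Concatenate k (question, answer) demonstrations before the query."""
    blocks = [f"Soru: {q}\nCevap: {a}" for q, a in demos]
    blocks.append(f"Soru: {query}\nCevap:")
    return "\n\n".join(blocks)

def evaluate_in_context(model_fn, demos, test_set):
    """Exact-match accuracy of model completions over (question, gold) pairs."""
    correct = 0
    for question, gold in test_set:
        prompt = build_few_shot_prompt(demos, question)
        prediction = model_fn(prompt).strip()
        correct += prediction == gold
    return correct / len(test_set)

# Toy stand-in for one of the compared LLMs: always answers "Ankara".
# A real run would call an open-source model's generate endpoint here.
toy_model = lambda prompt: "Ankara"

demos = [("Türkiye'nin başkenti neresidir?", "Ankara")]
tests = [
    ("Türkiye'nin başkenti hangi şehirdir?", "Ankara"),
    ("Fransa'nın başkenti neresidir?", "Paris"),
]

print(evaluate_in_context(toy_model, demos, tests))  # → 0.5
```

The same harness, pointed at each of the seven models in turn, yields the per-model in-context scores that the study then compares against question-answering performance.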

