
Türkçe Dil Modellerinin Performans Karşılaştırması Performance Comparison of Turkish Language Models (2404.17010v1)

Published 25 Apr 2024 in cs.CL and cs.AI

Abstract: The advances LLMs have achieved across almost all kinds of tasks have attracted the attention not only of researchers but of society at large, and have turned these models into products. Commercially successful LLMs exist, yet users may prefer open-source LLMs for reasons of cost, data privacy, or regulation. Despite the growing number of such models, there is no comprehensive comparison of their performance on Turkish. This study aims to fill that gap. Seven selected LLMs are compared on their in-context learning and question-answering abilities. Turkish datasets for in-context learning and question answering were prepared, and both automatic and human evaluations were conducted. The results show that, for question answering, continued pretraining before fine-tuning on instruction datasets is more effective at adapting multilingual models to Turkish, and that in-context learning performance is only weakly related to question-answering performance.
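As a concrete illustration of the in-context learning evaluation the abstract describes, below is a minimal sketch of a few-shot Turkish question-answering loop using the Hugging Face transformers library. The model ID (one of the open Turkish-adapted models the paper covers), the Turkish demonstrations, and the prompt format are illustrative assumptions, not the paper's exact protocol or data.

```python
# Minimal sketch of a few-shot in-context evaluation loop for a Turkish causal LM.
# Model ID, demonstrations, and prompt format are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Trendyol/Trendyol-LLM-7b-chat-v0.1"  # assumed example of an open Turkish-adapted model

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Few-shot demonstrations prepended to every test question (hypothetical examples):
# "Question: What is the capital of Turkey? Answer: Ankara", etc.
few_shot = (
    "Soru: Türkiye'nin başkenti neresidir?\nCevap: Ankara\n\n"
    "Soru: Su kaç derecede kaynar?\nCevap: 100 derecede\n\n"
)

def answer(question: str, max_new_tokens: int = 64) -> str:
    """Generate an answer to a Turkish question using the few-shot prompt."""
    prompt = few_shot + f"Soru: {question}\nCevap:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, i.e. the model's answer.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    ).strip()

print(answer("İstanbul hangi kıtalarda yer alır?"))  # "On which continents does Istanbul lie?"
```

Greedy decoding (do_sample=False) keeps outputs deterministic so that repeated runs are comparable across models; the paper's automatic and human evaluations would then score the returned answers.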

Authors (9)
  1. Eren Dogan (10 papers)
  2. M. Egemen Uzun (4 papers)
  3. Atahan Uz (4 papers)
  4. H. Emre Seyrek (2 papers)
  5. Ahmed Zeer (4 papers)
  6. Ezgi Sevi (1 paper)
  7. H. Toprak Kesgin (6 papers)
  8. M. Kaan Yuce (4 papers)
  9. M. Fatih Amasyali (7 papers)