Türkçe Dil Modellerinin Performans Karşılaştırması (Performance Comparison of Turkish Language Models) (2404.17010v1)
Abstract: The capabilities that large language models (LLMs) have demonstrated across almost all kinds of tasks have attracted the attention not only of researchers but also of society at large, and have enabled these models to become products. Commercially successful LLMs are available; however, users may prefer open-source LLMs for reasons of cost, data privacy, or regulation. Despite the growing number of such models, there is no comprehensive comparison of their performance for Turkish. This study aims to fill this gap in the literature. Seven selected LLMs are compared on their in-context learning and question-answering abilities. Turkish datasets for in-context learning and question answering were prepared, and both automatic and human evaluations were conducted. The results show that, for question answering, continuing pretraining before fine-tuning with instruction datasets is more successful at adapting multilingual models to Turkish, and that in-context learning performance is not strongly correlated with question-answering performance.
Authors:
- Eren Dogan
- M. Egemen Uzun
- Atahan Uz
- H. Emre Seyrek
- Ahmed Zeer
- Ezgi Sevi
- H. Toprak Kesgin
- M. Kaan Yuce
- M. Fatih Amasyali