
Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark (2405.20574v2)

Published 31 May 2024 in cs.CL and cs.AI

Abstract: This paper introduces the Open Ko-LLM Leaderboard and the Ko-H5 Benchmark as vital tools for evaluating LLMs in Korean. Incorporating private test sets while mirroring the English Open LLM Leaderboard, we establish a robust evaluation framework that has been well integrated into the Korean LLM community. We perform a data leakage analysis that shows the benefit of private test sets, along with a correlation study within the Ko-H5 benchmark and temporal analyses of the Ko-H5 score. Moreover, we present empirical support for the need to expand beyond set benchmarks. We hope the Open Ko-LLM Leaderboard sets a precedent for expanding LLM evaluation to foster more linguistic diversity.

Authors (8)
  1. Chanjun Park (49 papers)
  2. Hyeonwoo Kim (13 papers)
  3. Dahyun Kim (21 papers)
  4. Seonghwan Cho (2 papers)
  5. Sanghoon Kim (19 papers)
  6. Sukyung Lee (8 papers)
  7. Yungi Kim (13 papers)
  8. Hwalsuk Lee (10 papers)
Citations (8)