CLEVA: Chinese Language Models EVAluation Platform (2308.04813v2)

Published 9 Aug 2023 in cs.CL

Abstract: With the continuous emergence of Chinese LLMs, how to evaluate a model's capabilities has become an increasingly significant issue. The absence of a comprehensive Chinese benchmark that thoroughly assesses a model's performance, the unstandardized and incomparable prompting procedure, and the prevalent risk of contamination pose major challenges in the current evaluation of Chinese LLMs. We present CLEVA, a user-friendly platform crafted to holistically evaluate Chinese LLMs. Our platform employs a standardized workflow to assess LLMs' performance across various dimensions, regularly updating a competitive leaderboard. To alleviate contamination, CLEVA curates a significant proportion of new data and develops a sampling strategy that guarantees a unique subset for each leaderboard round. Empowered by an easy-to-use interface that requires just a few mouse clicks and a model API, users can conduct a thorough evaluation with minimal coding. Large-scale experiments featuring 23 Chinese LLMs have validated CLEVA's efficacy.
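The contamination-mitigation idea mentioned in the abstract, drawing a previously unused subset of evaluation data for each leaderboard round, can be illustrated with a short sketch. This is not CLEVA's actual sampling code; the function name, data layout, and parameters below are hypothetical stand-ins for the strategy the paper describes.

```python
import random

def sample_round_subset(pool, round_id, subset_size, used_ids, seed="cleva"):
    """Draw a subset of evaluation items for one leaderboard round,
    excluding items used in earlier rounds so each round sees fresh data."""
    # Deterministic per-round RNG so a round's subset is reproducible.
    rng = random.Random(f"{seed}-{round_id}")
    candidates = [item for item in pool if item["id"] not in used_ids]
    if len(candidates) < subset_size:
        raise ValueError("Not enough unused items left in the pool")
    subset = rng.sample(candidates, subset_size)
    used_ids.update(item["id"] for item in subset)
    return subset

# Example: three rounds, each drawing a disjoint subset from the same pool.
pool = [{"id": i, "prompt": f"instance {i}"} for i in range(1000)]
used = set()
for rnd in range(3):
    batch = sample_round_subset(pool, rnd, subset_size=100, used_ids=used)
    print(f"round {rnd}: {len(batch)} items, total used {len(used)}")
```

Under this assumption, no evaluation instance is reused across rounds, which limits how much a model could gain from having seen a prior round's test data during training.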

Authors (11)
  1. Yanyang Li (22 papers)
  2. Jianqiao Zhao (6 papers)
  3. Duo Zheng (13 papers)
  4. Zi-Yuan Hu (6 papers)
  5. Zhi Chen (235 papers)
  6. Xiaohui Su (1 paper)
  7. Yongfeng Huang (110 papers)
  8. Shijia Huang (11 papers)
  9. Dahua Lin (336 papers)
  10. Michael R. Lyu (176 papers)
  11. Liwei Wang (239 papers)
Citations (8)