OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models (2402.13524v1)

Published 21 Feb 2024 in cs.CL

Abstract: Modern LLMs should generally benefit individuals from various cultural backgrounds around the world. However, most recent advanced generative evaluation benchmarks tailored for LLMs focus mainly on English. To this end, we introduce OMGEval, the first Open-source Multilingual Generative test set that can assess the capability of LLMs in different languages. For each language, OMGEval provides 804 open-ended questions covering a wide range of important LLM capabilities, such as general knowledge and logical reasoning. Each question is rigorously verified by human annotators. Notably, to sufficiently reflect the compatibility of LLMs with different cultural backgrounds, we perform localization for each non-English language. Specifically, the current version of OMGEval covers 5 languages (i.e., Zh, Ru, Fr, Es, Ar). Following AlpacaEval, we employ GPT-4 as the adjudicator to automatically score different model outputs, an approach shown to correlate closely with human evaluation. We evaluate several representative multilingual LLMs on the proposed OMGEval, which we believe will provide a valuable reference for the community to further understand and improve the multilingual capabilities of LLMs. OMGEval is available at https://github.com/blcuicall/OMGEval.
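
The judging setup described in the abstract (GPT-4 as an automatic adjudicator, following AlpacaEval) reduces to a pairwise comparison per question. Below is a minimal sketch in Python, assuming the `openai` client library; the prompt wording, the verdict parsing, and the `candidate`/`baseline` field names are illustrative assumptions, not OMGEval's actual judge prompt (which lives in the repository linked above).

```python
# Minimal sketch of AlpacaEval-style GPT-4 judging. Prompt wording and
# parsing here are illustrative assumptions; OMGEval's real judge prompt
# is in https://github.com/blcuicall/OMGEval.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are evaluating two answers to the same question.

Question: {question}

Answer A: {answer_a}

Answer B: {answer_b}

Which answer is better? Reply with exactly "A" or "B"."""


def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Ask GPT-4 to pick the better of two answers; returns 'A' or 'B'."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                question=question, answer_a=answer_a, answer_b=answer_b
            ),
        }],
    )
    verdict = response.choices[0].message.content.strip()
    return "A" if verdict.startswith("A") else "B"


def win_rate(examples: list[dict]) -> float:
    """Fraction of questions where the candidate model beats the baseline.

    Each example is {"question": ..., "candidate": ..., "baseline": ...}.
    """
    wins = sum(
        judge(ex["question"], ex["candidate"], ex["baseline"]) == "A"
        for ex in examples
    )
    return wins / len(examples)
```

A production harness would additionally randomize the A/B order per question, since LLM judges are known to exhibit position bias toward one slot.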

Authors (10)
  1. Yang Liu (2253 papers)
  2. Meng Xu (52 papers)
  3. Shuo Wang (382 papers)
  4. Liner Yang (22 papers)
  5. Haoyu Wang (309 papers)
  6. Zhenghao Liu (77 papers)
  7. Cunliang Kong (16 papers)
  8. Yun Chen (134 papers)
  9. Maosong Sun (337 papers)
  10. Erhong Yang (16 papers)