
KoLA: Carefully Benchmarking World Knowledge of Large Language Models (2306.09296v3)

Published 15 Jun 2023 in cs.CL

Abstract: The unprecedented performance of LLMs necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough, unbiased, and applicable evaluations. Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For ability modeling, we mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks. (2) For data, to ensure fair comparisons, we use both Wikipedia, a corpus on which LLMs are commonly pre-trained, and continuously collected emerging corpora, aiming to evaluate the capacity to handle unseen data and evolving knowledge. (3) For evaluation criteria, we adopt a contrastive system, including overall standard scores for better numerical comparability across tasks and models and a unique self-contrast metric for automatically evaluating knowledge-creating ability. We evaluate 28 open-source and commercial LLMs and obtain some intriguing findings. The KoLA dataset and open-participation leaderboard are publicly released at https://kola.xlore.cn and will be continuously updated to provide references for developing LLMs and knowledge-related systems.

Authors (35)
  1. Jifan Yu (49 papers)
  2. Xiaozhi Wang (51 papers)
  3. Shangqing Tu (18 papers)
  4. Shulin Cao (23 papers)
  5. Daniel Zhang-Li (10 papers)
  6. Xin Lv (38 papers)
  7. Hao Peng (291 papers)
  8. Zijun Yao (50 papers)
  9. Xiaohan Zhang (78 papers)
  10. Hanming Li (3 papers)
  11. Chunyang Li (19 papers)
  12. Zheyuan Zhang (61 papers)
  13. Yushi Bai (31 papers)
  14. Yantao Liu (13 papers)
  15. Amy Xin (7 papers)
  16. Nianyi Lin (6 papers)
  17. Kaifeng Yun (2 papers)
  18. Linlu Gong (4 papers)
  19. Jianhui Chen (23 papers)
  20. Zhili Wu (3 papers)
Citations (55)