
FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models (2404.18359v1)

Published 29 Apr 2024 in cs.CL and cs.AI

Abstract: In the burgeoning field of LLMs, the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture. This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs. FoundaBench encompasses a diverse array of 3354 multiple-choice questions across common sense and K-12 educational subjects, meticulously curated to reflect the breadth and depth of everyday and academic knowledge. We present an extensive evaluation of 12 state-of-the-art LLMs using FoundaBench, employing both traditional assessment methods and our CircularEval protocol to mitigate potential biases in model responses. Our results highlight the superior performance of models pre-trained on Chinese corpora, and reveal a significant disparity between models' reasoning and memory recall capabilities. The insights gleaned from FoundaBench evaluations set a new standard for understanding the fundamental knowledge of LLMs, providing a robust framework for future advancements in the field.
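The paper does not spell out the CircularEval protocol here, but the common idea behind such protocols is to pose each multiple-choice question once per circular shift of its answer options, crediting the model only if it selects the correct content under every shift, which cancels positional bias toward a particular letter. A minimal sketch of that idea (the helper names and the `model_answer_fn` interface are assumptions, not the paper's implementation):

```python
from string import ascii_uppercase

def circular_rotations(options):
    """Yield every circular shift of the answer options.

    For N options the question is posed N times, each time with the
    options rotated by one position.
    """
    n = len(options)
    for k in range(n):
        yield options[k:] + options[:k]

def circular_eval(options, correct_text, model_answer_fn):
    """Credit the model only if it picks the correct option under
    every rotation.

    model_answer_fn(labels) -> chosen letter, where labels maps
    'A', 'B', ... to the rotated option texts (a hypothetical
    stand-in for an actual model call).
    """
    for rotated in circular_rotations(options):
        labels = dict(zip(ascii_uppercase, rotated))
        letter = model_answer_fn(labels)
        if labels.get(letter) != correct_text:
            return False
    return True
```

Under this scheme a model that always answers "A" regardless of content fails as soon as the correct option rotates away from position A, whereas a model that tracks the option text passes all rotations.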

Authors (10)
  1. Wei Li (1121 papers)
  2. Ren Ma (5 papers)
  3. Jiang Wu (58 papers)
  4. Chenya Gu (3 papers)
  5. Jiahui Peng (7 papers)
  6. Jinyang Len (1 paper)
  7. Songyang Zhang (116 papers)
  8. Hang Yan (86 papers)
  9. Dahua Lin (336 papers)
  10. Conghui He (114 papers)