GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation (2402.15745v2)

Published 24 Feb 2024 in cs.CL, cs.AI, and cs.CV

Abstract: Large Vision-Language Models (LVLMs) have demonstrated great abilities in image perception and language understanding. However, existing multimodal benchmarks focus on primary perception abilities and commonsense knowledge, which are insufficient to reflect the comprehensive capabilities of LVLMs. We propose GAOKAO-MM, a multimodal benchmark based on the Chinese College Entrance Examination (GAOKAO), comprising 8 subjects and 12 types of images, such as diagrams, function graphs, maps and photos. GAOKAO-MM derives from native Chinese context and sets human-level requirements for the model's abilities, including perception, understanding, knowledge and reasoning. We evaluate 10 LVLMs and find that all of them score below 50% accuracy, with GPT-4-Vision (48.1%), Qwen-VL-Plus (41.2%) and Gemini-Pro-Vision (35.1%) ranking in the top three positions. The results of our multi-dimension analysis indicate that LVLMs still have a moderate distance to go towards AGI and provide insights facilitating the development of multilingual LVLMs.

Authors (2)
  1. Yi Zong (4 papers)
  2. Xipeng Qiu (257 papers)
Citations (1)