MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models (2504.05782v1)

Published 8 Apr 2025 in cs.CV and cs.AI

Abstract: Multimodal reasoning, which integrates language and visual cues into problem solving and decision making, is a fundamental aspect of human intelligence and a crucial step toward artificial general intelligence. However, the evaluation of multimodal reasoning capabilities in Multimodal LLMs (MLLMs) remains inadequate. Most existing reasoning benchmarks are constrained by limited data size, narrow domain coverage, and unstructured knowledge distribution. To close these gaps, we introduce MDK12-Bench, a multi-disciplinary benchmark assessing the reasoning capabilities of MLLMs via real-world K-12 examinations. Spanning six disciplines (math, physics, chemistry, biology, geography, and information science), our benchmark comprises 140K reasoning instances across diverse difficulty levels from primary school to 12th grade. It features 6,827 instance-level knowledge point annotations based on a well-organized knowledge structure, detailed answer explanations, difficulty labels, and cross-year partitions, providing a robust platform for comprehensive evaluation. Additionally, we present a novel dynamic evaluation framework to mitigate data contamination issues by bootstrapping question forms, question types, and image styles during evaluation. Extensive experiments on MDK12-Bench reveal significant limitations of current MLLMs in multimodal reasoning. The findings on our benchmark provide insights into the development of next-generation models. Our data and code are available at https://github.com/LanceZPF/MDK12.

Authors (20)
  1. Pengfei Zhou (40 papers)
  2. Fanrui Zhang (10 papers)
  3. Xiaopeng Peng (6 papers)
  4. Zhaopan Xu (15 papers)
  5. Jiaxin Ai (11 papers)
  6. Yansheng Qiu (5 papers)
  7. Chuanhao Li (32 papers)
  8. Zhen Li (334 papers)
  9. Ming Li (787 papers)
  10. Yukang Feng (8 papers)
  11. Jianwen Sun (18 papers)
  12. Haoquan Zhang (3 papers)
  13. Zizhen Li (6 papers)
  14. Xiaofeng Mao (35 papers)
  15. Wangbo Zhao (25 papers)
  16. Kai Wang (624 papers)
  17. Xiaojun Chang (148 papers)
  18. Wenqi Shao (89 papers)
  19. Yang You (173 papers)
  20. Kaipeng Zhang (73 papers)