MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark (2408.07543v4)

Published 14 Aug 2024 in cs.CV and cs.CL

Abstract: With the development of Multimodal LLMs (MLLMs), the evaluation of multimodal models on mathematical problems has become a valuable research field. Multimodal visual-textual mathematical reasoning serves as a critical indicator of the comprehension and complex multi-step quantitative reasoning abilities of MLLMs. However, previous multimodal math benchmarks have not sufficiently integrated visual and textual information. To address this gap, we propose MathScape, a new benchmark that emphasizes the understanding and application of combined visual and textual information. MathScape is designed to evaluate photo-based math problem scenarios, assessing the theoretical understanding and application ability of MLLMs through a categorical hierarchical approach. We conduct a multi-dimensional evaluation of 11 advanced MLLMs, revealing that our benchmark is challenging even for the most sophisticated models. By analyzing the evaluation results, we identify the limitations of MLLMs, offering valuable insights for enhancing model performance.

Authors (15)
  1. Minxuan Zhou (6 papers)
  2. Hao Liang (137 papers)
  3. Tianpeng Li (14 papers)
  4. Zhiyu Wu (26 papers)
  5. MingAn Lin (12 papers)
  6. Linzhuang Sun (18 papers)
  7. Yaqi Zhou (3 papers)
  8. Yan Zhang (954 papers)
  9. Xiaoqin Huang (3 papers)
  10. Yicong Chen (6 papers)
  11. Yujing Qiao (5 papers)
  12. Weipeng Chen (56 papers)
  13. Bin Cui (165 papers)
  14. Wentao Zhang (261 papers)
  15. Zenan Zhou (24 papers)
Citations (3)