JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation (2410.17250v1)

Published 22 Oct 2024 in cs.CL, cs.AI, and cs.CV

Abstract: Accelerating research on Large Multimodal Models (LMMs) in non-English languages is crucial for enhancing user experiences across broader populations. In this paper, we introduce JMMMU (Japanese MMMU), the first large-scale Japanese benchmark designed to evaluate LMMs on expert-level tasks based on the Japanese cultural context. To facilitate comprehensive culture-aware evaluation, JMMMU features two complementary subsets: (i) culture-agnostic (CA) subset, where the culture-independent subjects (e.g., Math) are selected and translated into Japanese, enabling one-to-one comparison with its English counterpart MMMU; and (ii) culture-specific (CS) subset, comprising newly crafted subjects that reflect Japanese cultural context. Using the CA subset, we observe performance drop in many LMMs when evaluated in Japanese, which is purely attributable to language variation. Using the CS subset, we reveal their inadequate Japanese cultural understanding. Further, by combining both subsets, we identify that some LMMs perform well on the CA subset but not on the CS subset, exposing a shallow understanding of the Japanese language that lacks depth in cultural understanding. We hope this work will not only help advance LMM performance in Japanese but also serve as a guideline to create high-standard, culturally diverse benchmarks for multilingual LMM development. The project page is https://mmmu-japanese-benchmark.github.io/JMMMU/.

JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

This essay provides an expert overview of the paper titled "JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation". The paper introduces JMMMU, a pioneering benchmark designed to evaluate Large Multimodal Models (LMMs) within the context of Japanese culture.

Objective and Methodology

The primary objective of JMMMU is to accelerate research on LMMs in non-English languages, ultimately enhancing user experiences across diverse populations. The benchmark features two distinct subsets to facilitate a comprehensive culture-aware evaluation:

  1. Culture-Agnostic (CA) Subset: This subset covers culture-independent subjects (e.g., Math) translated from the English MMMU into Japanese, enabling a one-to-one comparison with their English counterparts and isolating performance variation attributable purely to language.
  2. Culture-Specific (CS) Subset: This subset comprises newly crafted subjects rooted in the Japanese cultural context, such as Japanese Art and Japanese History, and assesses models' cultural understanding (a loading sketch follows this list).
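
For concreteness, the sketch below shows how the two subsets might be loaded and scored. It assumes the benchmark is distributed through the Hugging Face datasets library with one config per subject; the Hub identifier, subject names, and field names are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: load the CA and CS subsets of JMMMU and score
# multiple-choice predictions. The Hub path, subject configs, and
# field names below are assumptions for illustration.
from datasets import load_dataset

DATASET_PATH = "JMMMU/JMMMU"  # assumed Hugging Face Hub identifier

# Placeholder subject lists; the actual benchmark defines which
# subjects belong to the culture-agnostic (CA) and culture-specific
# (CS) subsets.
CA_SUBJECTS = ["Math", "Physics"]
CS_SUBJECTS = ["Japanese_Art", "Japanese_History"]


def load_subset(subjects):
    """Load each subject config and return a flat list of question dicts."""
    examples = []
    for subject in subjects:
        examples.extend(load_dataset(DATASET_PATH, subject, split="test"))
    return examples


def accuracy(predictions, examples, answer_key="answer"):
    """Exact-match accuracy over multiple-choice answers (e.g. 'A'-'D')."""
    correct = sum(p == ex[answer_key] for p, ex in zip(predictions, examples))
    return correct / len(examples)


ca_examples = load_subset(CA_SUBJECTS)
cs_examples = load_subset(CS_SUBJECTS)
print(f"CA questions: {len(ca_examples)}, CS questions: {len(cs_examples)}")
```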

Key Findings

Evaluating 15 open-source LMMs and three advanced proprietary LMMs, the authors report several key findings:

  • Overall Performance: The best overall accuracy reaches only 58.6%, indicating significant room for improvement in handling Japanese contexts.
  • Language Variation Impact: On the CA subset, performance drops by up to 8.6% when LMMs are evaluated in Japanese rather than English, a gap attributable purely to the change of language (see the comparison sketch after this list).
  • Cultural Understanding: Models trained on Japanese datasets outperform others on the CS subset, underscoring the effectiveness of fine-tuning for cultural knowledge integration.
  • Depth of Understanding: Comparing both subsets exposes discrepancies among state-of-the-art proprietary models: some perform well on the CA subset but poorly on the CS subset, revealing a superficial grasp of Japanese cultural nuances.
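
Because every CA question is a direct translation of an MMMU question, this language-only gap can be measured as a paired accuracy difference over shared items. The sketch below assumes per-question correctness is available as dictionaries keyed by a shared question ID; the IDs and result layout are illustrative, not taken from the paper.

```python
# Minimal sketch: paired English-vs-Japanese comparison on the CA subset.
# Assumes each result maps a shared question id to 1 (correct) or 0 (wrong);
# the ids and structure here are illustrative, not from the paper.

def paired_drop(en_results: dict[str, int], ja_results: dict[str, int]) -> float:
    """Return accuracy(English) - accuracy(Japanese) over shared CA questions."""
    shared = sorted(en_results.keys() & ja_results.keys())
    if not shared:
        raise ValueError("no overlapping question ids")
    en_acc = sum(en_results[q] for q in shared) / len(shared)
    ja_acc = sum(ja_results[q] for q in shared) / len(shared)
    return en_acc - ja_acc

# Example usage with toy results for three shared questions.
en = {"math_001": 1, "math_002": 1, "physics_001": 0}
ja = {"math_001": 1, "math_002": 0, "physics_001": 0}
print(f"language-only drop: {paired_drop(en, ja):+.1%}")  # prints +33.3%
```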

Implications and Future Directions

These findings suggest that English-centric evaluation can bias model development toward English and neglect performance in other languages. The paper emphasizes the importance of creating culturally rich benchmarks to foster a more inclusive approach to multilingual LMM development.

Looking forward, the authors encourage building benchmarks similar to JMMMU for other languages and cultures, an effort that could yield more robust and culturally adaptable AI models.

In conclusion, JMMMU serves as both a valuable tool for LMM developers and a guideline for crafting diverse, high-standard benchmarks necessary for the advancement of multicultural AI systems. The research significantly contributes to understanding LMM capabilities in non-English settings and sets a precedent for future work in the area of culture-aware AI evaluation.

Authors (8)
  1. Shota Onohara (3 papers)
  2. Atsuyuki Miyai (10 papers)
  3. Yuki Imajuku (6 papers)
  4. Kazuki Egashira (5 papers)
  5. Jeonghun Baek (11 papers)
  6. Xiang Yue (72 papers)
  7. Graham Neubig (342 papers)
  8. Kiyoharu Aizawa (67 papers)