Evaluating GPT-4's Vision Capabilities on Brazilian University Admission Exams (2311.14169v1)

Published 23 Nov 2023 in cs.CL, cs.AI, and cs.LG

Abstract: Recent advancements in LLMs have showcased human-comparable performance in academic entrance exams. However, existing studies often overlook questions that require the integration of visual comprehension, thus compromising the full spectrum and complexity inherent in real-world scenarios. To address this gap, we present a comprehensive framework to evaluate LLMs on entrance exams, which incorporates both textual and visual elements. We evaluate the two most recent editions of the Exame Nacional do Ensino Médio (ENEM), the main standardized entrance examination adopted by Brazilian universities. Our study not only reaffirms the capabilities of GPT-4 as the state of the art for handling complex multidisciplinary questions, but also pioneers in offering a realistic assessment of multimodal LLMs on Portuguese examinations. One of the highlights is that text captions transcribing visual content outperform the direct use of images, suggesting that the vision model has room for improvement. Yet, despite improvements afforded by images or captions, mathematical questions remain a challenge for these state-of-the-art models. The code and data used in the experiments are available at https://github.com/piresramon/gpt-4-enem.
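The abstract's central comparison — image input versus text captions transcribing the visual content — comes down to tallying multiple-choice accuracy separately per input condition. The following is a minimal, hypothetical sketch of that tally; the function name, the condition labels, and the example predictions are all illustrative, not taken from the paper's code.

```python
from collections import defaultdict

def accuracy_by_condition(results):
    """Tally multiple-choice accuracy per input condition.

    `results` is a list of (condition, predicted, gold) tuples,
    e.g. ("caption", "B", "B"). Returns {condition: accuracy}.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, predicted, gold in results:
        total[condition] += 1
        if predicted == gold:
            correct[condition] += 1
    return {c: correct[c] / total[c] for c in total}

# Fabricated predictions for two conditions, for illustration only:
results = [
    ("image", "A", "A"), ("image", "C", "B"),
    ("caption", "B", "B"), ("caption", "D", "D"),
]
print(accuracy_by_condition(results))  # {'image': 0.5, 'caption': 1.0}
```

Grouping scores this way makes the paper's headline observation directly visible: a higher accuracy under the caption condition than under the image condition points to the vision pipeline, not the language model, as the bottleneck.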

Authors (4)
  1. Ramon Pires (11 papers)
  2. Thales Sales Almeida (10 papers)
  3. Hugo Abonizio (12 papers)
  4. Rodrigo Nogueira (70 papers)
Citations (3)