- The paper demonstrates that GPT-4o outperforms humans on upright RMET images but experiences a drastic drop in accuracy with inverted images.
- It employs comparative analysis using information theory metrics to reveal structured differences in error patterns between GPT-4o and human responses.
- The study highlights racial bias and processing limitations in GPT-4o, underscoring the need for improved training approaches in multimodal AI systems.
Evaluation of GPT-4o's Capability in Reading the Mind in the Eyes
The paper investigates whether the capabilities of LLMs, specifically GPT-4o, extend beyond text processing to the interpretation of mental states from visual stimuli. The assessment used two versions of the "Reading the Mind in the Eyes Test" (RMET), a traditional measure of theory of mind in humans in which participants infer mental states from photographs of the eye region.
The study compared human subjects and GPT-4o on both the RMET and its multiracial version (MRMET). Notably, GPT-4o surpassed human performance in identifying mental states from upright images but performed markedly worse on inverted images. This inversion effect, a well-documented phenomenon in human face processing, was more severe in GPT-4o, suggesting processing mechanisms that differ from those of humans. GPT-4o also showed a racial bias, with higher accuracy for White faces than for Non-white faces, whereas the human participants in the study displayed no such bias.
Results Overview
On the RMET, GPT-4o interpreted mental states from upright images more accurately than human subjects, yet its accuracy declined sharply for inverted images, a drop more pronounced than the roughly 15% decrement typically observed in humans. This suggests that image inversion severely disrupts GPT-4o's information extraction, a limitation potentially attributable to training predominantly on upright face images. The authors also raise the concern that the RMET's stimuli may be present in GPT-4o's training data, which could inflate its performance.
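As a rough illustration, the inversion decrement could be quantified as the difference in accuracy between upright and inverted trials and compared against the ~15% human benchmark mentioned above. The sketch below is not the authors' code; the per-item outcomes are hypothetical placeholders.

```python
# A minimal sketch (not from the paper) of quantifying the inversion decrement.
# The 15% benchmark follows the figure quoted in the text; all per-item
# outcomes below are hypothetical.
HUMAN_TYPICAL_DECREMENT = 0.15  # typical human accuracy drop under inversion

def accuracy(correct_flags):
    """Proportion of correct responses in a list of booleans."""
    return sum(correct_flags) / len(correct_flags)

# Hypothetical per-item outcomes (True = correct) for one model run.
upright_correct = [True] * 31 + [False] * 5     # e.g., 31/36 items correct
inverted_correct = [True] * 7 + [False] * 29    # e.g., 7/36 items correct

drop = accuracy(upright_correct) - accuracy(inverted_correct)
print(f"Inversion decrement: {drop:.2f} "
      f"({'larger' if drop > HUMAN_TYPICAL_DECREMENT else 'smaller'} "
      f"than the ~15% human benchmark)")
```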
The MRMET results corroborated these findings, confirming the model's inversion sensitivity and racial bias, as indicated by higher accuracy for White faces. Intriguingly, GPT-4o's accuracy fell below chance level for inverted stimuli, in contrast with the above-chance performance of human subjects.
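The below-chance claim and the racial bias can likewise be checked with standard tests. The sketch below assumes four response options per item (so chance = 0.25) and uses hypothetical counts; it relies on SciPy's `binomtest` and `chi2_contingency` and is illustrative only.

```python
# Illustrative sketch (not the authors' analysis): an exact binomial test of
# inverted-condition accuracy against chance, and a comparison of accuracy on
# White vs. Non-white items. Chance is assumed to be 0.25 (four response
# options per item); all counts are hypothetical.
from scipy.stats import binomtest, chi2_contingency

# Hypothetical inverted-condition result: 7 correct out of 36 items.
chance_test = binomtest(k=7, n=36, p=0.25, alternative="less")
print(f"Below-chance test (inverted): p = {chance_test.pvalue:.3f}")

# Hypothetical correct/incorrect counts by stimulus race.
#                 correct  incorrect
contingency = [[50,       10],   # White faces
               [38,       22]]   # Non-white faces
chi2, p_value, _, _ = chi2_contingency(contingency)
print(f"Accuracy by race: chi2 = {chi2:.2f}, p = {p_value:.3f}")
```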
Analysis of Error Patterns
The paper further probes the error patterns of human and GPT-4o responses using information-theoretic metrics. GPT-4o's errors, though consistent, carried substantially more information about mental-state differentiation than human errors, which appeared more random. Similarity analysis indicated that, although GPT-4o maintained a highly structured error space across trial runs, inversion produced qualitative alterations not observed in the human error space. In humans, inversion induced quantitative changes without altering the underlying error structure, underscoring a potential disparity in processing strategies between GPT-4o and humans.
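As a hedged sketch of how such an analysis might look (this is not the authors' code), one can compute the mutual information between target labels and responses from a confusion matrix, and compare error structures across conditions via correlation of the off-diagonal (error) cells; the 4x4 matrices below are hypothetical.

```python
# Illustrative sketch in the spirit of the error-pattern comparison:
# (1) mutual information (bits) between target label and response, from a
#     confusion matrix;
# (2) similarity of error structure across conditions via Pearson correlation
#     of off-diagonal confusion entries.
# The matrices are hypothetical, not data from the paper.
import numpy as np

def mutual_information(confusion):
    """Mutual information (in bits) between target and response labels."""
    joint = confusion / confusion.sum()                  # p(target, response)
    p_target = joint.sum(axis=1, keepdims=True)          # p(target)
    p_response = joint.sum(axis=0, keepdims=True)        # p(response)
    nonzero = joint > 0
    return float((joint[nonzero] *
                  np.log2(joint[nonzero] /
                          (p_target @ p_response)[nonzero])).sum())

def error_space_similarity(conf_a, conf_b):
    """Correlation between the off-diagonal (error) cells of two confusion matrices."""
    mask = ~np.eye(conf_a.shape[0], dtype=bool)
    return float(np.corrcoef(conf_a[mask], conf_b[mask])[0, 1])

# Hypothetical confusion matrices (rows = target label, columns = response).
upright = np.array([[20, 3, 2, 1],
                    [2, 19, 3, 2],
                    [1, 2, 21, 2],
                    [2, 1, 3, 20]], dtype=float)
inverted = np.array([[8, 9, 5, 4],
                     [7, 6, 8, 5],
                     [6, 7, 5, 8],
                     [5, 8, 7, 6]], dtype=float)

print(f"MI upright:  {mutual_information(upright):.3f} bits")
print(f"MI inverted: {mutual_information(inverted):.3f} bits")
print(f"Error-structure similarity: {error_space_similarity(upright, inverted):.3f}")
```

Higher mutual information means the errors are more systematic (more informative about which mental states the responder confuses), while a low cross-condition correlation would indicate a qualitative change in the error space under inversion.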
Implications and Future Directions
The findings of this paper underline fundamental differences between how LLMs like GPT-4o process visual information and how humans do, particularly under image inversion. These insights have significant implications for human-LLM interaction, especially for AI's interpretation of subtle psychological cues in real-time social environments. However, the observed error biases and vulnerability to inversion warrant caution when deploying LLMs in sensitive applications, such as mental health assessments or settings requiring nuanced social interaction.
The racial bias GPT-4o exhibited in this face-based task also points to needed improvements in training datasets and methodologies to mitigate such disparities. Future research could investigate training enhancements or algorithmic adjustments aimed at reducing racial biases and refining multimodal perception capabilities in AI systems.
This research contributes to the broader discourse on the transferability of human-like cognitive functions to artificial systems, providing valuable insights into the intersection of AI and advanced social cognition.