- The paper presents ArtEmis, a dataset that captures 439,000 emotional annotations and explanations for 81,000 artworks to explore affective responses.
- The methodology asks human annotators to name their dominant emotion for each artwork and explain it, often in rich, metaphorical language, yielding deeper subjective insight.
- Neural models trained on ArtEmis surpass baseline captioning approaches in generating affectively grounded explanations, though they still lag behind human nuance.
ArtEmis: Affective Language for Visual Art
The paper introduces ArtEmis, a large-scale dataset designed to study the intricate relationship between visual art, the emotional reactions it triggers, and the language people use to explain those emotions. Unlike traditional captioning datasets, which focus on objective content, ArtEmis targets the affective experiences art provokes, offering a rich textual record of the emotions stimulated by artworks. The dataset comprises 439,000 emotional annotations and explanations for 81,000 artworks sourced from WikiArt, aiming to bridge the gap between the objective qualities of an artwork and its subjective impact on viewers.
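To make the structure of such a dataset concrete, here is a minimal sketch of what a single ArtEmis-style record might look like. The field names, the example values, and the nine-emotion label set are assumptions for illustration, not the dataset's exact released schema.

```python
# A hypothetical record layout for an ArtEmis-style annotation:
# one artwork, one annotator's dominant emotion, and their free-text explanation.
from dataclasses import dataclass

# Assumed label set: eight basic emotions plus a catch-all category.
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness", "something else"]

@dataclass
class ArtEmisRecord:
    art_style: str   # WikiArt style the painting belongs to
    painting: str    # painting identifier within WikiArt
    emotion: str     # dominant emotion reported by the annotator
    utterance: str   # free-text explanation of why the emotion was felt

record = ArtEmisRecord(
    art_style="Impressionism",
    painting="claude-monet_water-lilies",  # hypothetical identifier
    emotion="contentment",
    utterance="The soft colors and still water make me feel calm and at peace.",
)
```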
The methodology asks human annotators to name the dominant emotion each artwork elicits in them and to provide a detailed textual explanation for that reaction. The result is a diverse, linguistically rich corpus whose abstract semantics and affective lexicon significantly exceed conventional datasets such as COCO captions in emotional richness and variety. The paper highlights notable linguistic properties, including the prevalence of metaphorical and imaginative language tied to personal experience and emotional nuance.
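The kind of lexicon comparison this motivates can be illustrated with a toy measurement: how often captions use affect-laden words. The tiny word list and the example captions below are illustrative assumptions, not the paper's actual lexicon or analysis.

```python
# Toy comparison of affective word usage between two caption collections.
AFFECT_WORDS = {"sad", "joy", "fear", "lonely", "peaceful", "haunting",
                "nostalgic", "anger", "serene", "melancholy", "calm"}

def affective_rate(captions):
    """Fraction of captions containing at least one affect-laden word."""
    hits = sum(
        any(w.strip(".,!?").lower() in AFFECT_WORDS for w in caption.split())
        for caption in captions
    )
    return hits / max(len(captions), 1)

artemis_like = ["The empty chair makes me feel lonely and nostalgic.",
                "A haunting sky that fills me with fear."]
coco_like = ["A man riding a bike down the street.",
             "A plate of food on a wooden table."]

print(affective_rate(artemis_like))  # 1.0 on these illustrative examples
print(affective_rate(coco_like))     # 0.0 on these illustrative examples
```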
Analyzing the dataset, the authors find a substantial share of subjective responses; even so, annotators agree considerably on the dominant emotions artworks elicit. The dataset's complexity, arising from the dual demand for emotional maturity and perceptual specificity, highlights the need for sophisticated models that can handle abstract, sentiment-rich descriptions.
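One simple way to quantify that agreement, sketched below under assumed per-artwork votes, is to ask how often a single emotion label wins a strict majority among an artwork's annotators. This is an illustrative measure, not the exact statistic reported in the paper.

```python
# Majority-vote agreement over hypothetical per-artwork emotion votes.
from collections import Counter

def majority_agreement(votes_per_artwork):
    """Share of artworks where one emotion gets more than half of the votes."""
    agreed = 0
    for votes in votes_per_artwork:
        _label, count = Counter(votes).most_common(1)[0]
        if count > len(votes) / 2:
            agreed += 1
    return agreed / max(len(votes_per_artwork), 1)

# Illustrative votes (five assumed annotators per artwork):
votes = [["awe", "awe", "awe", "contentment", "awe"],
         ["sadness", "fear", "sadness", "sadness", "something else"]]
print(majority_agreement(votes))  # 1.0: both artworks have a majority emotion
```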
The authors present a suite of neural models trained on ArtEmis, including emotion-grounded speakers that outperform baseline captioning models at producing affective explanations anchored in the visual content. These models perform reasonably well in emotional Turing tests, indicating their potential to approximate human-like emotional responses. However, the generated captions are not yet on par with human annotations, underlining how difficult it is to capture nuanced emotional interpretation with current machine learning techniques.
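The core idea of an emotion-grounded speaker can be sketched as a caption decoder conditioned on both image features and an emotion label. The PyTorch module below is a minimal sketch of that conditioning pattern, not the authors' exact architecture; layer sizes, the nine-emotion assumption, and the initialization scheme are illustrative choices.

```python
# Minimal emotion-grounded caption decoder: image features initialize the LSTM
# state, and an emotion embedding is concatenated to every word input so the
# generated explanation is conditioned on the target emotion.
import torch
import torch.nn as nn

class EmotionGroundedSpeaker(nn.Module):
    def __init__(self, vocab_size, num_emotions=9, img_dim=2048,
                 emb_dim=256, hidden_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden_dim)           # project CNN features
        self.emotion_emb = nn.Embedding(num_emotions, emb_dim)   # ground on emotion label
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim * 2, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feats, emotion_ids, captions):
        # img_feats: (B, img_dim), emotion_ids: (B,), captions: (B, T) token ids
        h0 = torch.tanh(self.img_proj(img_feats)).unsqueeze(0)   # init hidden with image
        c0 = torch.zeros_like(h0)
        emo = self.emotion_emb(emotion_ids).unsqueeze(1)         # (B, 1, emb_dim)
        words = self.word_emb(captions)                          # (B, T, emb_dim)
        emo = emo.expand(-1, words.size(1), -1)                  # repeat emotion per step
        inputs = torch.cat([words, emo], dim=-1)                 # condition every step
        hidden, _ = self.lstm(inputs, (h0, c0))
        return self.out(hidden)                                  # (B, T, vocab) logits

# Usage sketch: logits = speaker(img_feats, emotion_ids, caption_tokens),
# trained with cross-entropy against the next-word targets.
```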
In terms of implications, ArtEmis paves the way for advances in computational affective understanding, specifically in how machines might emulate human emotional responses to visual stimuli. Approaching human-computer interaction from an affective standpoint offers intriguing prospects for future AI systems, especially in applications that require empathetic or sentiment-driven communication.
Overall, ArtEmis contributes substantial insight into the interplay between emotions and visual stimuli, laying foundational work for advancing emotional intelligence in AI systems. Future work could extend this task by improving neural speaker diversity and accuracy, and by adopting LLMs that integrate semantic meaning with affective content more seamlessly.