
ArtEmis: Affective Language for Visual Art (2101.07396v1)

Published 19 Jan 2021 in cs.CV and cs.CL

Abstract: We present a novel large-scale dataset and accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language. In contrast to most existing annotation datasets in computer vision, we focus on the affective experience triggered by visual artworks and ask the annotators to indicate the dominant emotion they feel for a given image and, crucially, to also provide a grounded verbal explanation for their emotion choice. As we demonstrate below, this leads to a rich set of signals for both the objective content and the affective impact of an image, creating associations with abstract concepts (e.g., "freedom" or "love"), or references that go beyond what is directly visible, including visual similes and metaphors, or subjective references to personal experiences. We focus on visual art (e.g., paintings, artistic photographs) as it is a prime example of imagery created to elicit emotional responses from its viewers. Our dataset, termed ArtEmis, contains 439K emotion attributions and explanations from humans, on 81K artworks from WikiArt. Building on this data, we train and demonstrate a series of captioning systems capable of expressing and explaining emotions from visual stimuli. Remarkably, the captions produced by these systems often succeed in reflecting the semantic and abstract content of the image, going well beyond systems trained on existing datasets. The collected dataset and developed methods are available at https://artemisdataset.org.

Authors (5)
  1. Panos Achlioptas (16 papers)
  2. Maks Ovsjanikov (71 papers)
  3. Kilichbek Haydarov (6 papers)
  4. Mohamed Elhoseiny (102 papers)
  5. Leonidas Guibas (177 papers)
Citations (106)

Summary

  • The paper presents ArtEmis, a dataset that captures 439,000 emotional annotations and explanations for 81,000 artworks to explore affective responses.
  • The methodology uses human annotators to describe their dominant emotions with rich, metaphorical language, offering deeper subjective insights.
  • Neural models trained on ArtEmis surpass baseline captioning approaches in generating affectively grounded explanations, though they still lag behind human nuance.

ArtEmis: Affective Language for Visual Art

The paper introduces ArtEmis, a large-scale dataset designed to probe the relationship between visual art, the emotional reactions it elicits, and verbal explanations of those emotions. Unlike traditional datasets, which focus on objective content, ArtEmis targets the affective experience that art provokes, offering a rich textual record of the emotions artworks stimulate. The dataset comprises 439,000 emotional annotations and explanations for 81,000 artworks sourced from WikiArt, aiming to bridge the gap between the objective qualities of art and its subjective impact on viewers.

The methodology behind ArtEmis asks human annotators to report the dominant emotion they feel for each artwork and to provide a detailed textual explanation for that choice. The result is a diverse, linguistically rich corpus whose abstract semantics and affective vocabulary significantly exceed those of conventional datasets such as COCO captions. The paper reports notable linguistic properties, including frequent metaphorical and imaginative language tied to personal experiences and emotional nuance.
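To make the shape of these annotations concrete, the sketch below loads an ArtEmis-style annotation table with pandas and inspects its emotion distribution. The file name and the column names (`art_style`, `painting`, `emotion`, `utterance`) are illustrative assumptions about the release format, not a guaranteed schema.

```python
# Minimal sketch: loading ArtEmis-style annotations with pandas.
# The file path and column names are assumptions for illustration;
# consult the official release at https://artemisdataset.org for the exact schema.
import pandas as pd

annotations = pd.read_csv("artemis_annotations.csv")

# Each row pairs one artwork with one annotator's dominant emotion
# and a free-form textual explanation of that choice.
print(annotations[["art_style", "painting", "emotion", "utterance"]].head())

# Relative frequency of each emotion label across the corpus.
print(annotations["emotion"].value_counts(normalize=True))
```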

Analyzing the ArtEmis dataset, the authors demonstrate a substantial presence of subjective responses. Despite this, there is considerable agreement among annotators regarding the dominant emotions elicited by artworks. The dataset's complexity, arising from the dual demand for emotional maturity and perceptual specificity, highlights the need for sophisticated models that can handle abstract and sentiment-rich descriptions.
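One generic way to quantify that agreement is to compute, for each artwork, the fraction of its annotators who chose the most frequent emotion. The sketch below does this over the hypothetical table from the previous example; it is a simple majority-agreement measure, not necessarily the statistic reported in the paper.

```python
# Sketch: per-artwork agreement on the dominant emotion.
# Assumes the 'annotations' DataFrame from the loading example above,
# with 'painting' and 'emotion' columns (illustrative names).
import pandas as pd

def dominant_emotion_agreement(df: pd.DataFrame) -> pd.Series:
    """For each artwork, the fraction of annotators who picked its most common emotion."""
    def agreement(emotions: pd.Series) -> float:
        counts = emotions.value_counts()
        return counts.iloc[0] / counts.sum()
    return df.groupby("painting")["emotion"].apply(agreement)

# Usage:
# agreement = dominant_emotion_agreement(annotations)
# print(agreement.describe())  # how concentrated annotators' choices are per artwork
```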

The authors present an array of neural models trained on ArtEmis, including emotion-grounded speakers that outperform baseline captioning models by producing affective explanations anchored in the visual content. These models perform reasonably well in emotional Turing tests, indicating their potential to approximate human-like emotional responses. However, the paper notes that the generated captions are not yet on par with human annotations, underlining the difficulty of capturing nuanced emotional interpretations with current machine learning techniques.
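The emotion-grounded speakers condition a caption decoder on an emotion label in addition to the image. The PyTorch sketch below shows one plausible way to wire up such conditioning (a ResNet image encoder, an emotion embedding, and an LSTM decoder whose initial state mixes the two); the module choices, layer sizes, and the nine-way emotion set assumed here are illustrative, not the authors' exact architecture.

```python
# Sketch of an emotion-grounded speaker: the caption decoder is conditioned on
# both image features and a discrete emotion label. Backbone, dimensions, and
# the 9-category emotion set are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class EmotionGroundedSpeaker(nn.Module):
    def __init__(self, vocab_size: int, num_emotions: int = 9,
                 embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        backbone = models.resnet18()                                  # weights omitted for brevity
        self.image_encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.image_proj = nn.Linear(512, hidden_dim)
        self.emotion_embed = nn.Embedding(num_emotions, hidden_dim)
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, emotions, captions):
        # images: (B, 3, H, W) floats, emotions: (B,) long, captions: (B, T) long
        img = self.image_proj(self.image_encoder(images).flatten(1))  # (B, hidden)
        emo = self.emotion_embed(emotions)                            # (B, hidden)
        h0 = (img + emo).unsqueeze(0)                                 # seed the decoder state
        c0 = torch.zeros_like(h0)
        words = self.word_embed(captions)                             # (B, T, embed)
        hidden, _ = self.decoder(words, (h0, c0))
        return self.out(hidden)                                       # (B, T, vocab) logits
```

Summing the image and emotion vectors to seed the decoder is just one conditioning strategy; concatenating them or prefixing an emotion token to the caption sequence would serve the same purpose.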

In terms of implications, ArtEmis paves the way for enhancements in computational affective understanding, specifically in how machines might emulate human emotional responses to visual stimuli. Approaching human-computer interaction from an affective standpoint offers intriguing prospects for future developments in AI, especially in applications necessitating empathetic or sentiment-driven communication.

Overall, ArtEmis contributes substantial insights into the interplay between emotions and visual stimuli, laying foundational work for advancing emotional intelligence in AI systems. Future work could extend this task by improving the diversity and accuracy of neural speakers and by adopting LLMs that integrate semantic meaning with affective components more seamlessly.
