
Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis (2305.14616v2)

Published 24 May 2023 in cs.CL and cs.CV

Abstract: This paper explores the grounding issue regarding multimodal semantic representation from a computational cognitive-linguistic view. We annotate images from the Flickr30k dataset with five perceptual properties: Affordance, Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche Association (ENA), and examine their association with textual elements in the image captions. Our findings reveal that images with Gibsonian affordance show a higher frequency of captions containing 'holding-verbs' and 'container-nouns' compared to images displaying telic affordance. Perceptual Salience, Object Number, and ENA are also associated with the choice of linguistic expressions. Our study demonstrates that a comprehensive understanding of objects or events requires cognitive attention, semantic nuances in language, and integration across multiple modalities. We highlight the vital importance of situated meaning and affordance grounding in natural language understanding, with the potential to advance human-like interpretation in various scenarios.
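The core analysis the abstract describes — grouping captions by affordance annotation and counting how often 'holding-verbs' and 'container-nouns' appear in each group — can be sketched as below. This is a minimal illustration, not the paper's actual pipeline: the word lists, labels, and sample captions are hypothetical stand-ins for the annotated Flickr30k data.

```python
from collections import Counter

# Hypothetical word lists; the paper's actual lexical categories may differ.
HOLDING_VERBS = {"hold", "holds", "holding", "grips", "grasping", "carrying"}
CONTAINER_NOUNS = {"cup", "bowl", "box", "bag", "basket", "bottle"}

# Toy (affordance_label, caption) pairs standing in for annotated images.
samples = [
    ("gibsonian", "a man holding a cup of coffee"),
    ("gibsonian", "a woman carrying a basket of fruit"),
    ("telic", "a chef cooking pasta in the kitchen"),
    ("telic", "a man drinking from a bottle"),
]

def caption_features(caption):
    """Flag whether a caption contains a holding-verb or a container-noun."""
    tokens = caption.lower().split()
    return {
        "holding_verb": any(t in HOLDING_VERBS for t in tokens),
        "container_noun": any(t in CONTAINER_NOUNS for t in tokens),
    }

def frequency_by_affordance(pairs):
    """Count feature occurrences per affordance category."""
    counts = {}
    for affordance, caption in pairs:
        bucket = counts.setdefault(affordance, Counter())
        bucket["n_images"] += 1
        for name, present in caption_features(caption).items():
            bucket[name] += int(present)
    return counts

freqs = frequency_by_affordance(samples)
# On this toy data, both Gibsonian captions contain a holding-verb,
# while neither telic caption does.
```

On real annotations, one would follow such counts with an association test (e.g., a chi-square test on the resulting contingency table) to check whether the frequency difference between Gibsonian and telic images is significant.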

Authors (5)
  1. Pin-Er Chen (3 papers)
  2. Po-Ya Angela Wang (3 papers)
  3. Hsin-Yu Chou (3 papers)
  4. Yu-Hsiang Tseng (5 papers)
  5. Shu-Kai Hsieh (10 papers)
Citations (1)
