Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts (1904.09073v3)

Published 19 Apr 2019 in cs.CV

Abstract: Computing author intent from multimodal data like Instagram posts requires modeling a complex relationship between text and image. For example, a caption might evoke an ironic contrast with the image, so neither caption nor image is a mere transcript of the other. Instead they combine -- via what has been called meaning multiplication -- to create a new meaning that has a more complex relation to the literal meanings of text and image. Here we introduce a multimodal dataset of 1299 Instagram posts labeled for three orthogonal taxonomies: the authorial intent behind the image-caption pair, the contextual relationship between the literal meanings of the image and caption, and the semiotic relationship between the signified meanings of the image and caption. We build a baseline deep multimodal classifier to validate the taxonomy, showing that employing both text and image improves intent detection by 9.6% compared to using only the image modality, demonstrating the commonality of non-intersective meaning multiplication. The gain with multimodality is greatest when the image and caption diverge semiotically. Our dataset offers a new resource for the study of the rich meanings that result from pairing text and image.
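The baseline classifier described above fuses the image and caption modalities before predicting intent. As an illustrative sketch only (the dimensions, weights, and class count below are placeholders, not the paper's architecture or trained model), a minimal late-fusion head can be written as: concatenate the two modality feature vectors, apply a linear layer, and take a softmax over the intent classes.

```python
import math
import random

random.seed(0)

# Placeholder dimensions (not from the paper): image/text embedding
# sizes and a stand-in number of intent classes.
IMG_DIM, TXT_DIM, N_INTENTS = 8, 6, 4

# Randomly initialised weights stand in for a trained fusion head.
W = [[random.gauss(0, 0.01) for _ in range(N_INTENTS)]
     for _ in range(IMG_DIM + TXT_DIM)]
b = [0.0] * N_INTENTS

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def predict_intent(img_feat, txt_feat):
    """Late fusion: concatenate modality vectors, then linear + softmax."""
    fused = img_feat + txt_feat  # list concatenation = feature concat
    logits = [sum(f * W[i][j] for i, f in enumerate(fused)) + b[j]
              for j in range(N_INTENTS)]
    return softmax(logits)

# Example: one post's (placeholder) image and caption embeddings.
img = [random.gauss(0, 1) for _ in range(IMG_DIM)]
txt = [random.gauss(0, 1) for _ in range(TXT_DIM)]
probs = predict_intent(img, txt)
```

Dropping either `img_feat` or `txt_feat` from the concatenation yields a unimodal baseline, which is the comparison behind the reported 9.6% gain from using both modalities.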

Authors (6)
  1. Julia Kruk
  2. Jonah Lubin
  3. Karan Sikka
  4. Xiao Lin
  5. Dan Jurafsky
  6. Ajay Divakaran
Citations (85)
