
Grounded Textual Entailment (1806.05645v1)

Published 14 Jun 2018 in cs.CL and cs.CV

Abstract: Capturing semantic relations between sentences, such as entailment, is a long-standing challenge for computational semantics. Logic-based models analyse entailment in terms of possible worlds (interpretations, or situations) where a premise P entails a hypothesis H iff in all worlds where P is true, H is also true. Statistical models view this relationship probabilistically, addressing it in terms of whether a human would likely infer H from P. In this paper, we wish to bridge these two perspectives, by arguing for a visually-grounded version of the Textual Entailment task. Specifically, we ask whether models can perform better if, in addition to P and H, there is also an image (corresponding to the relevant "world" or "situation"). We use a multimodal version of the SNLI dataset (Bowman et al., 2015) and we compare "blind" and visually-augmented models of textual entailment. We show that visual information is beneficial, but we also conduct an in-depth error analysis that reveals that current multimodal models are not performing "grounding" in an optimal fashion.

Authors (9)
  1. Hoa Trong Vu (1 paper)
  2. Claudio Greco (5 papers)
  3. Aliia Erofeeva (3 papers)
  4. Somayeh Jafaritazehjan (1 paper)
  5. Guido Linders (1 paper)
  6. Marc Tanti (13 papers)
  7. Alberto Testoni (13 papers)
  8. Raffaella Bernardi (24 papers)
  9. Albert Gatt (48 papers)
Citations (29)
