Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs (2010.07526v1)

Published 15 Oct 2020 in cs.CL and cs.CV

Abstract: Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights. We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entailment, and visual question answering. The key challenge of accurate rationalization is comprehensive image understanding at all levels: not just explicit content at the pixel level, but also contextual content at the semantic and pragmatic levels. We present the RationaleVT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and visual commonsense graphs. Our experiments show that the base pretrained language model benefits from visual adaptation and that free-text rationalization is a promising research direction to complement model interpretability for complex visual-textual reasoning tasks.

Authors (6)
  1. Ana Marasović (27 papers)
  2. Chandra Bhagavatula (46 papers)
  3. Jae Sung Park (35 papers)
  4. Ronan Le Bras (56 papers)
  5. Noah A. Smith (224 papers)
  6. Yejin Choi (287 papers)
Citations (56)