
IC3: Image Captioning by Committee Consensus (2302.01328v3)

Published 2 Feb 2023 in cs.CV, cs.AI, cs.CL, and cs.LG

Abstract: If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to generate a single "best" (most like a reference) image caption. Unfortunately, doing so encourages captions that are "informationally impoverished," and focus on only a subset of the possible details, while ignoring other potentially useful information in the scene. In this work, we introduce a simple, yet novel, method: "Image Captioning by Committee Consensus" (IC3), designed to generate a single caption that captures high-level details from several annotator viewpoints. Humans rate captions produced by IC3 at least as helpful as baseline SOTA models more than two thirds of the time, and IC3 can improve the performance of SOTA automated recall systems by up to 84%, outperforming single human-generated reference captions, and indicating significant improvements over SOTA approaches for visual description. Code is available at https://davidmchan.github.io/caption-by-committee/

Authors (5)
  1. David M. Chan
  2. Austin Myers
  3. Sudheendra Vijayanarasimhan
  4. David A. Ross
  5. John Canny
Citations (13)