Learning to Select: A Fully Attentive Approach for Novel Object Captioning (2106.01424v1)

Published 2 Jun 2021 in cs.CV and cs.CL

Abstract: Image captioning models have lately shown impressive results when applied to standard datasets. Switching to real-life scenarios, however, constitutes a challenge due to the larger variety of visual concepts which are not covered in existing training sets. For this reason, novel object captioning (NOC) has recently emerged as a paradigm to test captioning models on objects which are unseen during the training phase. In this paper, we present a novel approach for NOC that learns to select the most relevant objects of an image, regardless of their adherence to the training set, and to constrain the generative process of a language model accordingly. Our architecture is fully-attentive and end-to-end trainable, also when incorporating constraints. We perform experiments on the held-out COCO dataset, where we demonstrate improvements over the state of the art, both in terms of adaptability to novel objects and caption quality.
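
The abstract describes constraining a caption generator so that selected object words appear in the output. The following is a minimal, illustrative sketch of that general idea (it is not the paper's actual decoding algorithm): given scored candidate captions and a set of selected object words, prefer the best-scoring caption that mentions every selected object. All function names and the toy scoring are assumptions for illustration.

```python
def satisfies_constraints(caption, constraints):
    """Return True if every constraint word appears in the caption."""
    tokens = set(caption.lower().split())
    return all(c.lower() in tokens for c in constraints)

def select_caption(candidates, constraints):
    """Pick the highest-scoring candidate that mentions every selected
    object. `candidates` is a list of (caption, score) pairs; falls back
    to the best unconstrained caption if no candidate satisfies the
    constraints."""
    valid = [(cap, s) for cap, s in candidates
             if satisfies_constraints(cap, constraints)]
    if not valid:
        valid = candidates
    return max(valid, key=lambda cs: cs[1])[0]

# Toy usage: the novel object "zebra" is forced into the selection,
# so the lower-scoring but constraint-satisfying caption wins.
candidates = [("a dog standing on grass", 0.9),
              ("a zebra standing on grass", 0.7)]
print(select_caption(candidates, ["zebra"]))  # → a zebra standing on grass
```

In the actual model, constraints are enforced during generation (end-to-end, as the abstract states) rather than by post-hoc reranking as in this toy sketch.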

Authors (5)
  1. Marco Cagrandi (1 paper)
  2. Marcella Cornia (61 papers)
  3. Matteo Stefanini (7 papers)
  4. Lorenzo Baraldi (68 papers)
  5. Rita Cucchiara (142 papers)
Citations (8)