
Conditional Image-Text Embedding Networks (1711.08389v4)

Published 22 Nov 2017 in cs.CV

Abstract: This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies the representation requirements for individual embeddings and allows the underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers. Comprehensive experiments verify the effectiveness of our approach across three phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where we obtain improvements of 4%, 3%, and 4%, respectively, in grounding performance over a strong region-phrase embedding baseline.
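
The idea in the abstract of a concept weight branch that softly assigns each phrase to one of several embeddings can be illustrated with a small NumPy sketch. This is not the authors' implementation: the element-wise fusion of region and phrase features, the single linear layer per concept, the ReLU, the final linear scoring layer, and all names (`make_params`, `conditional_scores`, `W_concept`, `W_embed`, `w_out`) are assumptions chosen only to show how phrase-dependent weights could mix concept-specific layers that sit on top of a shared representation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def make_params(rng, dim, num_concepts, embed_dim):
    # Hypothetical parameter set; shapes are illustrative assumptions.
    return {
        "W_concept": rng.normal(size=(dim, num_concepts)),         # concept weight branch
        "W_embed": rng.normal(size=(num_concepts, dim, embed_dim)),  # one layer per concept
        "w_out": rng.normal(size=(embed_dim,)),                     # final scoring layer
    }

def conditional_scores(region_feats, phrase_feat, params):
    """Score each region against one phrase.

    region_feats: (R, D) shared region representation
    phrase_feat:  (D,)   shared phrase representation
    Returns (scores of shape (R,), concept weights of shape (K,)).
    """
    # Shared representation: fuse region and phrase features (assumed element-wise).
    joint = region_feats * phrase_feat                              # (R, D)
    # Concept weight branch: soft assignment of the phrase to K embeddings.
    weights = softmax(phrase_feat @ params["W_concept"])            # (K,)
    # Concept-specific layers applied to the shared representation.
    per_concept = np.maximum(
        0.0, np.einsum("rd,kde->rke", joint, params["W_embed"])
    )                                                               # (R, K, E)
    # Mix the K concept outputs with the phrase-dependent weights.
    mixed = np.einsum("k,rke->re", weights, per_concept)            # (R, E)
    return mixed @ params["w_out"], weights

rng = np.random.default_rng(0)
params = make_params(rng, dim=8, num_concepts=4, embed_dim=6)
regions = rng.normal(size=(5, 8))   # 5 candidate regions
phrase = rng.normal(size=(8,))      # one phrase
scores, weights = conditional_scores(regions, phrase, params)
best_region = int(np.argmax(scores))
```

In this sketch the phrase alone decides how much each concept-specific layer contributes, so no phrase-to-embedding assignment has to be predefined, while all concepts still share the fused features that feed those layers.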

Authors (6)
  1. Bryan A. Plummer (64 papers)
  2. Paige Kordas (1 paper)
  3. M. Hadi Kiapour (4 papers)
  4. Shuai Zheng (67 papers)
  5. Robinson Piramuthu (36 papers)
  6. Svetlana Lazebnik (40 papers)
Citations (115)