Multitask Text-to-Visual Embedding with Titles and Clickthrough Data (1905.13339v1)

Published 30 May 2019 in cs.CV and cs.IR

Abstract: Text-visual (also called semantic-visual) embedding is a central problem in vision-language research. It typically involves mapping an image and a text description to a common feature space through a CNN image encoder and an RNN language encoder. In this paper, we propose a new method for learning text-visual embeddings using both image titles and click-through data from an image search engine. We also propose a new triplet loss function that models positive awareness of the embedding, and introduce a novel mini-batch-based hard negative sampling approach for better data efficiency during learning. Experimental results show that our proposed method outperforms existing methods and is also effective for real-world text-to-visual retrieval.
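
The abstract names a triplet loss combined with hard negative sampling inside a mini-batch but gives no formula. As a rough illustration only, the sketch below shows the generic, widely used form of that idea: for each text, the matching image is the positive and the most similar non-matching image in the batch is the hard negative. This is not the paper's positive-aware loss, whose definition is not given here. The function names, the cosine similarity, and the margin value of 0.2 are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_matrix(text_embs, image_embs):
    """sim[i][j] is the cosine similarity between text i and image j."""
    texts = [l2_normalize(t) for t in text_embs]
    images = [l2_normalize(v) for v in image_embs]
    return [[sum(a * b for a, b in zip(t, v)) for v in images] for t in texts]


def hard_negative_triplet_loss(text_embs, image_embs, margin=0.2):
    """Generic in-batch hard-negative triplet loss (hypothetical helper).

    Row i of text_embs is assumed to be paired with row i of image_embs.
    For each text, only the most similar non-matching image in the batch
    contributes to the hinge term.
    """
    sim = cosine_matrix(text_embs, image_embs)
    n = len(sim)
    if n < 2:
        raise ValueError("need at least two pairs to form a negative")
    total = 0.0
    for i in range(n):
        positive = sim[i][i]
        hardest_negative = max(sim[i][j] for j in range(n) if j != i)
        total += max(0.0, margin + hardest_negative - positive)
    return total / n


if __name__ == "__main__":
    texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(hard_negative_triplet_loss(texts, aligned))
```

Taking only the hardest negative per anchor, rather than averaging over all negatives in the batch, concentrates the loss on the examples the model currently confuses most. That is the usual motivation for this kind of sampling.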

Authors (4)
  1. Pranav Aggarwal (7 papers)
  2. Zhe Lin (163 papers)
  3. Baldo Faieta (6 papers)
  4. Saeid Motiian (6 papers)
Citations (5)