Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch (1804.10819v1)

Published 28 Apr 2018 in cs.CV

Abstract: In this work we introduce a cross-modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model is used to selectively focus on the different objects of the image, allowing for retrieval with multiple objects in the query. Experiments show that the proposed method performs best in both single- and multiple-object image retrieval on standard datasets.
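The abstract describes encoders for the query modalities (text and sketch) and for images that are trained into a shared embedding space. Below is a minimal PyTorch sketch of that general idea, under assumptions: the encoder architectures, dimensions, and the triplet ranking loss are illustrative placeholders, not the paper's actual network or attention mechanism.

```python
# Minimal sketch (assumed, not the authors' exact architecture): two encoders
# map images and queries (a sketch here; a text encoder would play the same
# role) into a shared embedding space, trained with a triplet ranking loss so
# matching query-image pairs lie close together.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ImageEncoder(nn.Module):
    """Toy CNN that embeds an RGB image into a d-dimensional unit vector."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return F.normalize(self.fc(h), dim=-1)


class SketchEncoder(nn.Module):
    """Toy CNN that embeds a single-channel sketch into the same space."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return F.normalize(self.fc(h), dim=-1)


def triplet_loss(query, pos_img, neg_img, margin: float = 0.2):
    """Pull the query toward its matching image and push it away from a
    non-matching one, using cosine distance on the shared space."""
    d_pos = 1 - (query * pos_img).sum(-1)
    d_neg = 1 - (query * neg_img).sum(-1)
    return F.relu(d_pos - d_neg + margin).mean()


if __name__ == "__main__":
    img_enc, qry_enc = ImageEncoder(), SketchEncoder()
    images = torch.randn(8, 3, 64, 64)      # matching photos
    sketches = torch.randn(8, 1, 64, 64)    # sketch queries
    neg_images = torch.randn(8, 3, 64, 64)  # non-matching photos
    loss = triplet_loss(qry_enc(sketches), img_enc(images), img_enc(neg_images))
    loss.backward()
    print(f"toy triplet loss: {loss.item():.4f}")
```

At retrieval time, images would be ranked by cosine similarity between their embeddings and the query embedding; the paper's per-object attention for multi-object queries is not modeled in this sketch.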

Authors (6)
  1. Sounak Dey (11 papers)
  2. Anjan Dutta (41 papers)
  3. Suman K. Ghosh (7 papers)
  4. Ernest Valveny (28 papers)
  5. Umapada Pal (80 papers)
  6. Josep Lladós (40 papers)
Citations (23)
