Composing Text and Image for Image Retrieval - An Empirical Odyssey (1812.07119v1)

Published 18 Dec 2018 in cs.CV

Abstract: In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image. For example, we may present an image of the Eiffel tower, and ask the system to find images which are visually similar but are modified in small ways, such as being taken at nighttime instead of during the day. To tackle this task, we learn a similarity metric between a target image and a source image plus source text, an embedding and composing function such that target image feature is close to the source image plus text composition feature. We propose a new way to combine image and text using such function that is designed for the retrieval task. We show this outperforms existing approaches on 3 different datasets, namely Fashion-200k, MIT-States and a new synthetic dataset we create based on CLEVR. We also show that our approach can be used to classify input queries, in addition to image retrieval.
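The retrieval setup described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the feature vectors are hand-written rather than learned, `compose` uses a plain element-wise sum as a stand-in for the learned composition function (the abstract does not specify its form), and the names `compose`, `retrieve`, and `gallery` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def compose(image_feat, text_feat):
    """Stand-in composition: element-wise sum of image and text features.

    Assumption: the paper learns this function; a sum is used here only
    to make the example runnable.
    """
    return l2_normalize([i + t for i, t in zip(image_feat, text_feat)])


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(image_feat, text_feat, gallery, top_k=1):
    """Rank gallery images by similarity to the composed query feature."""
    query = compose(image_feat, text_feat)
    ranked = sorted(gallery.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Hypothetical 3-d features: axis 0 = "Eiffel tower", axis 1 = "night".
gallery = {
    "eiffel_day": [1.0, 0.0, 0.0],
    "eiffel_night": [1.0, 1.0, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}
source_image = [1.0, 0.0, 0.0]        # daytime photo of the tower
modification_text = [0.0, 1.0, 0.0]   # "at nighttime"

result = retrieve(source_image, modification_text, gallery, top_k=2)
print(result)
```

In this toy setting the composed query lands closest to the nighttime image, which mirrors the stated objective that the target image feature should be close to the composed source image plus text feature.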

Authors (7)
  1. Nam Vo (6 papers)
  2. Lu Jiang (90 papers)
  3. Chen Sun (187 papers)
  4. Kevin Murphy (87 papers)
  5. Li-Jia Li (29 papers)
  6. Li Fei-Fei (199 papers)
  7. James Hays (57 papers)
Citations (321)

Summary

  • The paper studies image retrieval in which the query is an image plus text describing desired modifications to that image.
  • It learns a similarity metric together with an embedding and composition function, so that the target image feature lies close to the composed source image plus text feature.
  • The proposed composition outperforms existing approaches on three datasets (Fashion-200k, MIT-States, and a new synthetic dataset based on CLEVR) and can also be used to classify input queries.

Overview of the CVPR Proceedings LaTeX Author Guidelines

The document titled "LaTeX Author Guidelines for CVPR Proceedings" is a style guide for authors submitting papers to the Conference on Computer Vision and Pattern Recognition (CVPR). It gives detailed instructions on manuscript preparation so that submissions comply with the formatting and stylistic requirements set by the IEEE Computer Society Press.

Key Components of the Guidelines

The main body of the document systematically addresses numerous elements crucial to preparing a manuscript for CVPR. These include language, length, format, and blind review processes. Each section provides specific directives to help authors align their submissions with the expectations of the conference review and publication mechanisms.

  1. Language and Dual Submission: All submissions must be in English. Authors are referred to the specific CVPR guidelines on dual submissions, which set out the conference's position on originality and on concurrent submission to other venues.
  2. Paper Length: Papers are limited to eight pages, excluding references. Because references do not count toward the limit, authors can cite thoroughly without giving up space in the main text.
  3. Formatting Requirements: The guidelines delineate precise dimensions for the document layout, type styles, and fonts. This section includes meticulous specifications for title placement, author names, abstract formatting, and main text alignment. These requirements ensure consistency and readability across submissions.
  4. Blind Review Process: Detailed instructions are provided to maintain the integrity of the double-blind review. Authors are cautioned against self-referential language that could compromise anonymity, reinforcing the importance of unbiased review and evaluation.
  5. Mathematics and Figures: The guidelines give recommendations for numbering equations and placing figures so that both are easy to reference. They also account for LaTeX's peculiarities, with suggestions for integrating mathematical expressions and graphical content.
  6. Illustrations and Color Use: The document advises on the use of color and visual materials, aligning with the prevalent practice of electronic as well as printed review copies. Such guidance is vital for maintaining the accessibility and utility of graphical data across different mediums.
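As a minimal sketch of the equation-numbering recommendation in item 5, the fragment below shows a numbered display equation that is then referenced by label. The equation itself and the label name `eq:example` are placeholders chosen for illustration, not text taken from the guidelines.

```latex
% A numbered display equation; the label lets later text refer to it.
\begin{equation}
  E = m \cdot c^{2}
  \label{eq:example}
\end{equation}

% Referring to the equation by number rather than by position on the page.
As shown in Eq.~(\ref{eq:example}), energy scales linearly with mass.
```

Referring to equations through labels keeps the numbers correct when equations are added or reordered during revision.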

Implications and Future Considerations

The explicit details in these guidelines reflect the meticulous standards expected by CVPR, reinforcing the conference's commitment to maintaining a high quality of academic discourse. Authors are equipped with the knowledge to produce submissions that not only meet but potentially exceed the baseline standards for scholarly presentation.

Looking forward, the evolution of digital publishing may require changes to these guidelines, especially for dynamic content and richer digital interfaces. The underlying principles of clarity, consistency, and academic rigor are nonetheless likely to remain central to CVPR's author guidelines.

In conclusion, this document is an essential reference for prospective CVPR authors. It sets a uniform standard that supports the dissemination and impact of research within the computer vision community, and its specifications matter both for individual submissions and for the consistency of the proceedings as a whole.