What does a platypus look like? Generating customized prompts for zero-shot image classification (2209.03320v3)

Published 7 Sep 2022 in cs.CV and cs.LG

Abstract: Open-vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open-vocabulary models classify among any arbitrary set of categories specified with natural language during inference. This natural language, called "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a {}") which are completed with each of the category names. This work introduces a simple method to generate higher accuracy prompts, without relying on any explicit knowledge of the task domain and with far fewer hand-constructed sentences. To achieve this, we combine open-vocabulary models with LLMs to create Customized Prompts via LLMs (CuPL, pronounced "couple"). In particular, we leverage the knowledge contained in LLMs in order to generate many descriptive sentences that contain important discriminating characteristics of the image categories. This allows the model to place greater importance on these distinguishing characteristics when making predictions. We find that this straightforward and general approach improves accuracy on a range of zero-shot image classification benchmarks, including over one percentage point gain on ImageNet. Finally, this simple baseline requires no additional training and remains completely zero-shot. Code available at https://github.com/sarahpratt/CuPL.

Authors (4)
  1. Sarah Pratt (8 papers)
  2. Ian Covert (18 papers)
  3. Rosanne Liu (25 papers)
  4. Ali Farhadi (138 papers)
Citations (168)

Summary

  • The paper introduces a novel approach for automatically generating customized text prompts to improve zero-shot image classification.
  • It leverages language models to create tailored prompts that capture distinctive visual features for clearer category differentiation.
  • Experiments show improved accuracy across a range of zero-shot classification benchmarks, including a gain of over one percentage point on ImageNet, with no additional training.

Overview of CuPL: Customized Prompts via Language Models

The paper proposes Customized Prompts via LLMs (CuPL), a method that couples an open-vocabulary image classifier (CLIP) with an LLM to generate the classifier's text prompts automatically. Standard zero-shot pipelines rely on hand-written templates such as "a photo of a {}" completed with each category name; CuPL instead asks the LLM questions like the one in the paper's title, "What does a platypus look like?", and uses the generated descriptions as class prompts, with no task-specific prompt engineering and no additional training.

How CuPL Works

CuPL is a two-stage, fully zero-shot pipeline. In the first stage, an LLM (GPT-3 in the paper) is queried with a handful of generic question templates, such as "Describe what a {} looks like", completed with each category name. Sampling several completions per question yields many descriptive sentences per class, each mentioning discriminating visual characteristics: for a platypus, a duck-like bill or webbed feet.
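
The sketch below illustrates this first stage. It is a minimal, hypothetical example rather than the paper's implementation: the paper queried GPT-3 through the completions API, while this version uses the modern openai chat client as a stand-in, and the model name, question templates, and sampling settings are all illustrative.

```python
# Stage 1 (sketch): ask an LLM to describe each category.
# Assumes `pip install openai` and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Generic question templates completed with each category name;
# illustrative, not the paper's exact set.
QUESTION_TEMPLATES = [
    "Describe what a {} looks like.",
    "What are the identifying characteristics of a {}?",
]

def generate_class_prompts(class_name: str, n_per_question: int = 5) -> list[str]:
    """Sample several descriptive sentences for one category name."""
    prompts = []
    for template in QUESTION_TEMPLATES:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # stand-in for the GPT-3 model used in the paper
            messages=[{"role": "user", "content": template.format(class_name)}],
            n=n_per_question,     # several samples per question for diversity
            temperature=0.9,
            max_tokens=60,
        )
        prompts += [choice.message.content.strip() for choice in response.choices]
    return prompts

# e.g. sentences mentioning a duck-like bill, webbed feet, dense brown fur...
platypus_prompts = generate_class_prompts("platypus")
```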

In the second stage, the generated sentences replace hand-written templates in the standard open-vocabulary classification recipe. Each sentence is embedded with the text encoder of the open-vocabulary model (CLIP in the paper), the normalized embeddings for a class are averaged into a single classifier weight, and an image is assigned to the class whose weight is most similar to its image embedding. The paper reports that this improves accuracy on a range of zero-shot benchmarks, including a gain of over one percentage point on ImageNet, while requiring far fewer hand-constructed sentences and no explicit knowledge of the task domain.
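
A sketch of this second stage appears below, assuming OpenAI's open-source clip package (the paper's experiments are built on CLIP); the ViT-B/32 checkpoint, the helper names, and the `prompts_per_class` dict (as produced by the stage-1 helper above) are assumptions for illustration.

```python
# Stage 2 (sketch): build a zero-shot classifier from generated prompts.
# Assumes `pip install torch pillow` and
# `pip install git+https://github.com/openai/CLIP.git`.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # checkpoint is illustrative

def build_classifier(prompts_per_class: dict[str, list[str]]) -> torch.Tensor:
    """Average each class's normalized prompt embeddings into one weight vector."""
    weights = []
    with torch.no_grad():
        for prompts in prompts_per_class.values():
            tokens = clip.tokenize(prompts, truncate=True).to(device)
            emb = model.encode_text(tokens).float()
            emb = emb / emb.norm(dim=-1, keepdim=True)  # normalize each prompt
            mean = emb.mean(dim=0)
            weights.append(mean / mean.norm())          # renormalize the average
    return torch.stack(weights)                         # (num_classes, embed_dim)

def classify(image_path: str, weights: torch.Tensor, class_names: list[str]) -> str:
    """Assign the image to the class with the most similar averaged prompt embedding."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        img = model.encode_image(image).float()
        img = img / img.norm(dim=-1, keepdim=True)
    scores = img @ weights.T                            # cosine similarities
    return class_names[scores.argmax().item()]
```

Averaging many generated sentences per class makes each classifier weight less sensitive to any single noisy generation, which is one reason for sampling multiple descriptions per category rather than relying on a single prompt.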

Implications and Future Considerations

Because CuPL requires no labeled examples, no additional training, and no task-specific prompt engineering, it provides a simple but stronger baseline for zero-shot classification with open-vocabulary models. The generated prompts are also human-readable, so they offer some visibility into which visual characteristics the model is relying on for each category.

The quality of the prompts is bounded by the knowledge encoded in the LLM: generated descriptions can be generic or inaccurate for rare or ambiguous categories, and the method inherits any factual errors the LLM produces. Even so, the generality of the recipe, a fixed set of question templates applied to arbitrary category names, makes it straightforward to apply to new classification tasks out of the box.

GitHub: https://github.com/sarahpratt/CuPL