
Vision-Language Models Do Not Understand Negation (2501.09425v1)

Published 16 Jan 2025 in cs.CV and cs.CL

Abstract: Many practical vision-language applications require models that understand negation, e.g., when using natural language to retrieve images which contain certain objects but not others. Despite advancements in vision-language models (VLMs) through large-scale training, their ability to comprehend negation remains underexplored. This study addresses the question: how well do current VLMs understand negation? We introduce NegBench, a new benchmark designed to evaluate negation understanding across 18 task variations and 79k examples spanning image, video, and medical datasets. The benchmark consists of two core tasks designed to evaluate negation understanding in diverse multimodal settings: Retrieval with Negation and Multiple Choice Questions with Negated Captions. Our evaluation reveals that modern VLMs struggle significantly with negation, often performing at chance level. To address these shortcomings, we explore a data-centric approach wherein we finetune CLIP models on large-scale synthetic datasets containing millions of negated captions. We show that this approach can result in a 10% increase in recall on negated queries and a 40% boost in accuracy on multiple-choice questions with negated captions.

Analysis of Style and Formatting Guidelines in CVPR Proceedings Templates

The document is a guide to preparing and formatting manuscripts for submission to the CVPR proceedings. This essay evaluates the core elements and directives prescribed by the IEEE Computer Society Press and their role in upholding academic communication standards.

Structural Norms and Manuscript Elements

The document sets out expectations for language, dual submissions, and paper length, with strict measures to ensure uniformity among submissions. All manuscripts must be submitted in English, and authors are discouraged from deviating from the prescribed formatting, for example by changing margin settings or exceeding the eight-page limit (excluding references). These rules keep submissions on a level playing field and make the review process more efficient.

Technical Specifications and Consistency

A notable feature is the printed ruler, which lets reviewers comment on specific lines without ambiguity. The document also calls for careful equation numbering and section referencing, both of which support the precision expected in scholarly communication and make the manuscript easier to navigate for reviewers and readers alike.
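
As an illustration of numbered equations and section references, a minimal LaTeX sketch follows. The section title, the label names (sec:method, eq:loss), and the equation itself are hypothetical placeholders and are not taken from the template.

```latex
\documentclass{article}
\begin{document}

\section{Method}
\label{sec:method}  % hypothetical label, chosen for illustration

% A displayed equation inside the equation environment is numbered automatically;
% the label lets the text refer to that number.
\begin{equation}
  L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \ell(x_i; \theta)
  \label{eq:loss}  % hypothetical label, chosen for illustration
\end{equation}

% Refer to the equation and the section by number instead of by position.
The loss in Eq.~(\ref{eq:loss}) is introduced in Section~\ref{sec:method}.

\end{document}
```

Because the numbers are generated from labels, a reviewer's comment on a given equation stays valid when equations are added or reordered, provided the document is compiled twice so the references resolve.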

Blind Review Process

Special attention is given to the blind review process. The guidelines instruct authors not to omit self-citations but to phrase them objectively, without pronouns that could compromise anonymity. This corrects a common misunderstanding about anonymization and protects the integrity of the review process by reducing the risk of biased evaluations.

Emphasis on Mathematical Rigor

The guide further emphasizes mathematical rigor and thoroughness in its guidance on mathematical expressions. It advocates demonstrating conventions explicitly through examples, which aids later referencing and a fuller understanding of the content.

Visual and Graphical Integrity

Instructions on illustrations and graphs stress clarity and professionalism in presenting visual data. The document specifies the use of \includegraphics for including figures, so that visuals are suitably scaled and stay consistent with the text's font and format. This helps keep the paper readable and the visual aids effective in conveying research findings.
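
A minimal sketch of how \includegraphics is typically used may help here. The file name, width, caption, and label below are hypothetical placeholders rather than values prescribed by the guidelines, and the example compiles only once a file with that name exists.

```latex
\documentclass{article}
\usepackage{graphicx}  % provides \includegraphics

\begin{document}

\begin{figure}[t]
  \centering
  % Scale the figure relative to the current line width so it fits the column.
  % "example-figure.pdf" is a placeholder file name.
  \includegraphics[width=0.8\linewidth]{example-figure.pdf}
  \caption{Placeholder caption describing what the figure shows.}
  \label{fig:example}  % hypothetical label, chosen for illustration
\end{figure}

Figure~\ref{fig:example} is referenced by label rather than by position.

\end{document}
```

Specifying the width as a fraction of \linewidth, rather than in absolute units, keeps the figure proportionate if the column width changes.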

Compliance Through Enhanced Templates

The document concludes with sections on color use, cross-referencing, and final copy preparation. Authors are urged to respect visual accessibility, recognizing that impairments such as color vision deficiency may affect how color-based data is interpreted. The cross-referencing system is also emphasized as a means of internal consistency, so that all parts of a paper are easy to navigate and contextually linked.

Conclusion and Forward-Looking Implications

The guidelines underscore the need for standardized manuscript preparation to promote fairness and readability in the review process. By maintaining strict formatting rules and providing a detailed yet practical blueprint for paper submissions, the document fosters clarity and accessibility in scholarly communication and leaves room for adaptation as publication practices evolve. As artificial intelligence continues to influence computational research, these guidelines serve as an anchor for consistency amid technological change in manuscript preparation and review. Future guidelines may incorporate new disciplines and methodologies while remaining faithful to the core principles set out here.

Authors (7)
  1. Kumail Alhamoud (8 papers)
  2. Shaden Alshammari (3 papers)
  3. Yonglong Tian (32 papers)
  4. Guohao Li (43 papers)
  5. Philip Torr (172 papers)
  6. Yoon Kim (92 papers)
  7. Marzyeh Ghassemi (96 papers)