Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers (2103.15679v1)

Published 29 Mar 2021 in cs.CV and cs.LG

Abstract: Transformers are increasingly dominating multi-modal reasoning tasks, such as visual question answering, achieving state-of-the-art results thanks to their ability to contextualize information using the self-attention and co-attention mechanisms. These attention modules also play a role in other computer vision tasks including object detection and image segmentation. Unlike Transformers that only use self-attention, Transformers with co-attention need to consider multiple attention maps in parallel in order to highlight the information that is relevant to the prediction in the model's input. In this work, we propose the first method to explain predictions made by any Transformer-based architecture, including bi-modal Transformers and Transformers with co-attentions. We provide generic solutions and apply these to the three most commonly used of these architectures: (i) pure self-attention, (ii) self-attention combined with co-attention, and (iii) encoder-decoder attention. We show that our method is superior to all existing methods which are adapted from single modality explainability.
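To make the idea of combining several attention maps into one explanation more concrete, here is a minimal sketch for the simplest case, a pure self-attention Transformer. It assumes the gradient-weighted update that we understand the paper to use for self-attention layers: average over heads the positive part of (gradient ⊙ attention), Ā = E_h[(∇A ⊙ A)^+], then accumulate R ← R + Ā·R layer by layer, starting from R = I. The co-attention and encoder-decoder rules are not shown. The function names and the input layout (per-layer lists of per-head attention maps and their gradients, in forward order) are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List, Tuple

Matrix = List[List[float]]
# Hypothetical layout: one (attention_heads, gradient_heads) pair per layer,
# each a list of n x n matrices, ordered from the first layer to the last.
Layer = Tuple[List[Matrix], List[Matrix]]


def identity(n: int) -> Matrix:
    """n x n identity: every token starts out relevant only to itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def head_averaged_relevance(attn_heads: List[Matrix],
                            grad_heads: List[Matrix]) -> Matrix:
    """Assumed rule: A_bar = mean over heads of max(0, grad * attn), elementwise."""
    num_heads = len(attn_heads)
    n = len(attn_heads[0])
    out = [[0.0] * n for _ in range(n)]
    for attn, grad in zip(attn_heads, grad_heads):
        for i in range(n):
            for j in range(n):
                # Keep only positive gradient-weighted attention, then average.
                out[i][j] += max(0.0, grad[i][j] * attn[i][j]) / num_heads
    return out


def propagate_self_attention(layers: List[Layer]) -> Matrix:
    """Accumulate R <- R + A_bar @ R over all layers, starting from R = I."""
    n = len(layers[0][0][0])
    relevance = identity(n)
    for attn_heads, grad_heads in layers:
        a_bar = head_averaged_relevance(attn_heads, grad_heads)
        update = matmul(a_bar, relevance)
        relevance = [[relevance[i][j] + update[i][j] for j in range(n)]
                     for i in range(n)]
    return relevance
```

For example, one layer with a single head over two tokens, attention [[0.5, 0.5], [0.2, 0.8]] and gradients [[1.0, -1.0], [2.0, 0.5]], gives a relevance matrix of approximately [[1.5, 0.0], [0.4, 1.4]]: the entry with a negative gradient is clipped to zero before being added to the identity. In a classification model one would typically read off the row belonging to the classification token.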


Overview of "Author Guidelines for ICCV Proceedings"

The document under consideration offers a comprehensive guide for authors submitting manuscripts to the International Conference on Computer Vision (ICCV), focusing on the format and style required for final submissions. This guidance is designed to ensure uniformity and quality in presentation across the diverse papers presented at the conference.

Key Aspects

The guide emphasizes several critical elements of manuscript preparation: language requirements, submission policies, document formatting, and the anonymization needed for blind review. It is particularly meticulous about manuscript length, the use of color, and the structure of the document, including sections such as the abstract, introduction, and references. It mandates English and limits papers to eight pages, excluding references. There are no extra page charges, and authors are warned that papers exceeding this limit will not be reviewed.

One distinctive requirement is a "ruler" (line numbering) in the review version of the submission, which lets reviewers refer to specific lines. The document also gives detailed formatting instructions covering the two-column page layout, margins, and font types and sizes, with a strong preference for Times Roman or an equivalent typeface.

Anonymization and Dual Submission

The guide explains the dual submission policy and how to keep a manuscript anonymous for blind review. Authors must refer to their prior work in a way that preserves anonymity while still giving reviewers access to relevant previous research. Blind review does not require omitting citations to the authors' own work; such work should be cited in the third person (e.g., "the authors of [4] showed" rather than "in our previous work [4]").

Practical Implications

By providing stringent guidelines, the document ensures a standardized format that benefits both authors and reviewers. Standardization streamlines the review process and keeps the archived proceedings uniform, which improves the accessibility and readability of computer vision research.

Future Directions and Considerations

As conferences continue to evolve with advancements in digital publishing and reviewing software, there is potential for these guidelines to further adapt. For instance, future developments might incorporate more interactive elements or improve collaboration between authors and reviewers through advanced online systems. Additionally, as digital tools for document preparation grow more sophisticated, the use of alternative manuscript preparation systems beyond LaTeX could be considered, provided they meet baseline format and style requirements.

In conclusion, "Author Guidelines for ICCV Proceedings" is a document that serves as a critical resource for authors aiming to contribute to ICCV. It provides a detailed framework to prepare manuscripts that meet the high standards expected at this influential venue for computer vision research.


Authors (3)

Hila Chefer (14 papers)
Shir Gur (13 papers)
Lior Wolf (217 papers)

Citations (266)
