Vision Transformer with Deformable Attention (2201.00520v3)

Published 3 Jan 2022 in cs.CV

Abstract: Transformers have recently shown superior performance on various vision tasks. The large, sometimes even global, receptive field endows Transformer models with higher representation power over their CNN counterparts. Nevertheless, simply enlarging the receptive field also gives rise to several concerns. On the one hand, using dense attention, e.g., in ViT, leads to excessive memory and computational cost, and features can be influenced by irrelevant parts which are beyond the regions of interest. On the other hand, the sparse attention adopted in PVT or Swin Transformer is data agnostic and may limit the ability to model long-range relations. To mitigate these issues, we propose a novel deformable self-attention module, where the positions of key and value pairs in self-attention are selected in a data-dependent way. This flexible scheme enables the self-attention module to focus on relevant regions and capture more informative features. On this basis, we present Deformable Attention Transformer, a general backbone model with deformable attention for both image classification and dense prediction tasks. Extensive experiments show that our models achieve consistently improved results on comprehensive benchmarks. Code is available at https://github.com/LeapLabTHU/DAT.
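The data-dependent selection of key and value positions described in the abstract can be illustrated with a small NumPy sketch. This is a simplified, single-head illustration under stated assumptions, not the DAT implementation from the repository: the offset predictor is a hypothetical linear map `Woff` applied to query features sampled at a uniform grid of reference points, offsets are bounded with `tanh` scaled by a hypothetical `max_offset`, and positional bias, multi-head grouping, and the paper's own offset network are omitted. Keys and values are computed from features bilinearly sampled at the shifted points, so every query attends to a small, content-dependent set of locations rather than to all H*W positions.

```python
import numpy as np

def bilinear_sample(feat, ys, xs):
    """Sample feat (H, W, C) at continuous pixel coords (ys, xs); zero padding outside."""
    H, W, C = feat.shape
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1, x1 = y0 + 1, x0 + 1
    wy1 = ys - y0
    wy0 = 1.0 - wy1
    wx1 = xs - x0
    wx0 = 1.0 - wx1
    out = np.zeros((len(ys), C))
    # Accumulate the four neighbouring pixels, skipping those outside the map.
    for yy, xx, w in ((y0, x0, wy0 * wx0), (y0, x1, wy0 * wx1),
                      (y1, x0, wy1 * wx0), (y1, x1, wy1 * wx1)):
        valid = (yy >= 0) & (yy < H) & (xx >= 0) & (xx < W)
        out[valid] += w[valid][:, None] * feat[yy[valid], xx[valid]]
    return out

def deformable_attention(x, Wq, Wk, Wv, Woff, grid=(4, 4), max_offset=2.0):
    """Toy single-head deformable attention (hypothetical simplification, not DAT's code)."""
    H, W, C = x.shape
    q = x.reshape(-1, C) @ Wq                      # (H*W, C): one query per pixel
    gh, gw = grid
    # Uniform reference points at the centres of a gh x gw grid of cells.
    ref_y = (np.arange(gh) + 0.5) * H / gh - 0.5
    ref_x = (np.arange(gw) + 0.5) * W / gw - 0.5
    ry, rx = np.meshgrid(ref_y, ref_x, indexing="ij")
    ry, rx = ry.ravel(), rx.ravel()
    # Offsets depend on the query content at each reference point (data-dependent).
    q_ref = bilinear_sample(q.reshape(H, W, C), ry, rx)   # (G, C)
    offsets = max_offset * np.tanh(q_ref @ Woff)          # (G, 2), bounded shift
    sy, sx = ry + offsets[:, 0], rx + offsets[:, 1]
    # Keys and values come from features sampled at the deformed points.
    x_s = bilinear_sample(x, sy, sx)                      # (G, C)
    k, v = x_s @ Wk, x_s @ Wv
    logits = q @ k.T / np.sqrt(C)                         # (H*W, G)
    logits -= logits.max(axis=1, keepdims=True)           # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ v).reshape(H, W, C), np.stack([sy, sx], axis=1)

rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
x = rng.standard_normal((H, W, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
Woff = rng.standard_normal((C, 2)) * 0.1
out, pts = deformable_attention(x, Wq, Wk, Wv, Woff)
print(out.shape, pts.shape)  # (8, 8, 16) (16, 2)
```

In this toy setting each of the 64 queries attends to 16 sampled keys instead of all 64 positions. The saving grows with resolution, since dense attention scales with the square of the number of pixels while the sampled set stays fixed in size.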

Author Response Guidelines for Academic Conferences

The paper, "LaTeX Guidelines for Author Response," provides a comprehensive framework for authors preparing rebuttals after peer review at conferences, particularly computer vision and pattern recognition venues. It sets out the procedural and formatting requirements authors must follow when responding to reviewers.

Core Objectives

The primary objective of an author response, as clarified in the paper, is to allow authors to address specific factual errors or respond to additional information requests from reviewers. It is expressly not intended for introducing novel contributions or expanding the scope of the initial submission without a direct request from the reviewers.

Key Structural Guidelines

The response document must adhere to strict formatting constraints:

Length: The response is limited to one page, encompassing all figures, references, and any supplementary information.
Layout: Text should conform to a two-column layout with specific margins to maintain uniformity and readability.
Content: While authors may include additional illustrations or comparative tables, these should clearly reference existing data from the original submission or documented literature, avoiding the introduction of unreviewed experimental results.

Formatting and Presentation

The document stresses the importance of preserving anonymity and avoiding identifying links in the rebuttal. The response must follow the provided template, with figure captions, references, and other elements set in the designated font styles and sizes.

Equations and Figures: Proper numbering is mandatory to facilitate precise referencing within the document.
Graphics: All graphical elements should be centered, with font sizes appropriately scaled to match the main text for consistency in presentation.

Implications

The paper's guidelines ensure a balanced review process, protecting authors from potentially onerous requests for additional experiments while focusing on the clarification and correction of existing content. This approach allows for a concentrated dialogue between authors and reviewers, fostering a more effective scholarly communication process.

These stipulations help maintain the integrity and fairness of the review procedure, particularly in high-stakes academic environments where publication standards are stringent. Adhering to such standardized guidelines can streamline rebuttal reviews, reducing uncertainty and improving communication within the academic community.

Future Considerations

As AI and machine learning conferences continue to evolve, future adaptations of these guidelines may incorporate advancements in communication technology and collaborative platforms. With increasing submission volumes, the standardization of such protocols might play an integral role in sustaining the efficacy and fairness of academic discourse across diverse research domains.

In summary, the structured approach outlined in this paper provides a critical framework for author responses, emphasizing clarity, conciseness, and adherence to established norms, thereby supporting a fair and rigorous peer-review process.

Authors (5)

Zhuofan Xia (12 papers)
Xuran Pan (14 papers)
Shiji Song (103 papers)
Li Erran Li (37 papers)
Gao Huang (178 papers)

Citations (367)

GitHub

GitHub - LeapLabTHU/DAT: Repository of Vision Transformer with Deformable Attention (CVPR2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention (768 stars)

Tweets

https://twitter.com/naiveoculus/status/1478285477014913028