
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference (2207.01405v4)

Published 4 Jul 2022 in cs.CV

Abstract: Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision applications. However, these models have considerable storage and computational overheads, making their deployment and efficient inference on edge devices challenging. Quantization is a promising approach to reducing model complexity, and the dyadic arithmetic pipeline can allow the quantized models to perform efficient integer-only inference. Unfortunately, dyadic arithmetic is based on the homogeneity condition in convolutional neural networks, which is not applicable to the non-linear components in ViTs, making integer-only inference of ViTs an open issue. In this paper, we propose I-ViT, an integer-only quantization scheme for ViTs, to enable ViTs to perform the entire computational graph of inference with integer arithmetic and bit-shifting, and without any floating-point arithmetic. In I-ViT, linear operations (e.g., MatMul and Dense) follow the integer-only pipeline with dyadic arithmetic, and non-linear operations (e.g., Softmax, GELU, and LayerNorm) are approximated by the proposed light-weight integer-only arithmetic methods. More specifically, I-ViT applies the proposed Shiftmax and ShiftGELU, which are designed to use integer bit-shifting to approximate the corresponding floating-point operations. We evaluate I-ViT on various benchmark models and the results show that integer-only INT8 quantization achieves comparable (or even slightly higher) accuracy to the full-precision (FP) baseline. Furthermore, we utilize TVM for practical hardware deployment on the GPU's integer arithmetic units, achieving 3.72–4.11× inference speedup compared to the FP model. Code for both PyTorch and TVM is released at https://github.com/zkkli/I-ViT.
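The two mechanisms the abstract relies on can be pictured with a short sketch: dyadic requantization (an integer multiply plus a bit-shift in place of a floating-point rescale) and a shift-based, power-of-two approximation of exp inside an integer-only softmax. The code below is illustrative only; the function names, fixed-point widths, and the particular exp approximation are assumptions for this sketch, not the paper's Shiftmax/ShiftGELU kernels (those are in the released PyTorch/TVM code).

```python
import numpy as np

def dyadic_requantize(acc: np.ndarray, scale: float, shift_bits: int = 16) -> np.ndarray:
    """Rescale an INT32 accumulator with an integer multiply and a bit-shift.

    The floating-point requantization factor `scale` is approximated offline
    by the dyadic number b / 2**shift_bits, so inference itself needs no
    floating-point math. `shift_bits` is an illustrative precision choice.
    """
    b = int(round(scale * (1 << shift_bits)))        # integer numerator
    return (acc.astype(np.int64) * b) >> shift_bits  # integer-only rescale


def shift_softmax(q: np.ndarray, frac_bits: int = 8) -> np.ndarray:
    """Integer-only softmax sketch: exp is replaced by a power-of-two
    approximation built from shifts (not the paper's exact Shiftmax).

    `q` holds fixed-point logits with `frac_bits` fractional bits,
    i.e. the real logit value is q / 2**frac_bits.
    """
    q = q.astype(np.int64)
    d = q - q.max(axis=-1, keepdims=True)     # d <= 0, prevents overflow
    p = d + (d >> 1) - (d >> 4)               # ~ d * log2(e), shifts only
    k = np.minimum((-p) >> frac_bits, 62)     # integer part of the exponent
    r = (-p) & ((1 << frac_bits) - 1)         # fractional part, fixed point
    two_pow = ((1 << frac_bits) - (r >> 1)) >> k  # ~ 2**p, using 2**-f ~ 1 - f/2
    total = two_pow.sum(axis=-1, keepdims=True)
    return (two_pow << frac_bits) // total    # normalized, still integer
```

With frac_bits=8, for example, shift_softmax(np.array([[512, 256, 0]])) approximates the softmax of real logits [2, 1, 0] in 8-bit fixed point; the outputs sum to roughly 1 << frac_bits. In a full integer-only pipeline the normalization and output scaling would themselves be folded into the requantization step rather than done with a generic division.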

Overview of LaTeX Author Guidelines for ICCV Proceedings

The paper presents a comprehensive set of instructions and guidelines for authors preparing submissions to the International Conference on Computer Vision (ICCV) proceedings. It covers the main facets of manuscript preparation in LaTeX, structured to ensure uniformity and adherence to the standards of the IEEE Computer Society Press, which publishes the proceedings.

Key Elements of the Guidelines

  1. Manuscript Language and Structure:
    • Manuscripts must be written in English and organized in the structure conventional for scientific papers.
  2. Submission Policies and Paper Length:
    • The guidelines underscore the policy on dual submissions and a strict page limit (excluding references), stressing that overlength papers will not be reviewed.
  3. Formatting Specifications:
    • Detailed formatting instructions cover the two-column text layout, specified margins, and font styles and sizes, all of which must be followed to meet ICCV requirements. For instance, the main title is set in 14-point boldface Times, while the main text is set in 10-point Times, fully justified.
  4. Review Process Considerations:
    • The guidelines explain the blind review process, advising authors on how to correctly anonymize their manuscripts without omitting necessary citations, thereby facilitating an unbiased review process.
  5. Graphical and Mathematical Content:
    • Authors are instructed on the integration of figures, equations, and tables within the text. The paper describes the formatting of mathematical equations and how to include and caption visual content to maintain consistency throughout the manuscript.
  6. Miscellaneous Formatting Details:
    • The document covers several other detailed formatting options, including the treatment of footnotes, references, and color usage. Emphasis is placed on maintaining legibility and consistency in the printed versions of the papers.

Implications and Future Considerations

The guidelines provide a foundation for consistency in academic publishing within the ICCV proceedings, which is crucial for maintaining the conference's reputation and the accessibility of information within the computer vision research community. By imposing stringent formatting and submission policies, ICCV ensures that its proceedings reflect a high standard of scholarly presentation and documentation. Adhering to these guidelines also helps preserve the archival quality of the technical literature.

Future revisions of these guidelines could potentially integrate advancements in digital publishing formats, facilitating richer media content within manuscripts, or adapting to broader standards in scholarly publishing. As the field of computer vision advances and possibly further converges with other interdisciplinary domains, the guidelines may evolve to encompass a more diverse range of content types while still adhering to scientific rigor and clarity.

Overall, this document serves as a vital resource for authors aiming to contribute to one of the premier conferences in the field of computer vision, ensuring that their work is presented in the most professional manner and readily accessible to the conference audience and beyond.

Authors (2)
  1. Zhikai Li (24 papers)
  2. Qingyi Gu (25 papers)
Citations (73)