DEFA: Efficient Deformable Attention Acceleration via Pruning-Assisted Grid-Sampling and Multi-Scale Parallel Processing (2403.10913v1)

Published 16 Mar 2024 in cs.AR

Abstract: Multi-scale deformable attention (MSDeformAttn) has emerged as a key mechanism in various vision tasks, demonstrating clear superiority attributed to multi-scale grid-sampling. However, this newly introduced operator incurs irregular data access and an enormous memory requirement, leading to severe PE underutilization. Meanwhile, existing approaches for attention acceleration cannot be applied directly to MSDeformAttn because they lack support for this distinct procedure. Therefore, we propose a dedicated algorithm-architecture co-design dubbed DEFA, the first-of-its-kind method for MSDeformAttn acceleration. At the algorithm level, DEFA adopts frequency-weighted pruning and probability-aware pruning for feature maps and sampling points respectively, reducing the memory footprint by over 80%. At the architecture level, it exploits multi-scale parallelism to significantly boost throughput and further reduces memory access via fine-grained layer fusion and feature map reusing. Extensively evaluated on representative benchmarks, DEFA achieves 10.1-31.9x speedup and 20.3-37.7x energy efficiency gains over powerful GPUs. It also surpasses related accelerators with a 2.2-3.7x energy efficiency improvement while providing pioneering support for MSDeformAttn.
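The MSDeformAttn operator at the center of this work is a weighted sum over bilinearly interpolated samples taken at predicted, data-dependent locations on every feature-map scale, which is what makes its memory access irregular. The single-query, single-head NumPy sketch below is a rough illustration only, not the paper's implementation or the Deformable DETR reference code; all shapes, function names, and the random test data are illustrative assumptions.

import numpy as np

def bilinear_sample(feat, x, y):
    """Bilinearly interpolate feat of shape (H, W, C) at fractional location (x, y)."""
    H, W, _ = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    # Clamp pixel indices to the map; out-of-range samples reuse border pixels here.
    xs = np.clip([x0, x0 + 1], 0, W - 1)
    ys = np.clip([y0, y0 + 1], 0, H - 1)
    wx1, wy1 = x - x0, y - y0
    wx0, wy0 = 1.0 - wx1, 1.0 - wy1
    return (wx0 * wy0 * feat[ys[0], xs[0]] + wx1 * wy0 * feat[ys[0], xs[1]] +
            wx0 * wy1 * feat[ys[1], xs[0]] + wx1 * wy1 * feat[ys[1], xs[1]])

def msdeform_attn_single_query(feats, ref_point, offsets, attn_weights):
    """
    feats:        list of L feature maps, each (H_l, W_l, C)
    ref_point:    normalized (x, y) reference point in [0, 1]^2, shared across scales
    offsets:      (L, K, 2) per-query sampling offsets in pixels
    attn_weights: (L, K) attention weights, normalized over all L*K sampling points
    Returns the aggregated (C,) value for one query and one attention head.
    """
    out = np.zeros(feats[0].shape[-1])
    for l, feat in enumerate(feats):
        H, W, _ = feat.shape
        # Map the shared reference point to this scale's resolution.
        base_x, base_y = ref_point[0] * (W - 1), ref_point[1] * (H - 1)
        for k in range(offsets.shape[1]):
            sx, sy = base_x + offsets[l, k, 0], base_y + offsets[l, k, 1]
            out += attn_weights[l, k] * bilinear_sample(feat, sx, sy)
    return out

# Tiny usage example: 3 scales, 4 sampling points per scale, 8 channels.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((h, w, 8)) for h, w in [(32, 32), (16, 16), (8, 8)]]
offsets = rng.standard_normal((3, 4, 2))
weights = rng.random((3, 4)); weights /= weights.sum()
print(msdeform_attn_single_query(feats, (0.5, 0.5), offsets, weights).shape)  # (8,)

Because the sampled coordinates depend on per-query predicted offsets, neighboring queries touch scattered locations across all scales; this data-dependent access pattern is what DEFA's pruning-assisted grid-sampling and multi-scale parallel processing are designed to tame.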

