
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising (2203.01305v3)

Published 2 Mar 2022 in cs.CV and cs.AI

Abstract: We present in this paper a novel denoising training method to speed up DETR (DEtection TRansformer) training and offer a deepened understanding of the slow convergence issue of DETR-like methods. We show that the slow convergence results from the instability of bipartite graph matching, which causes inconsistent optimization goals in early training stages. To address this issue, in addition to the Hungarian loss, our method feeds noised ground-truth bounding boxes into the Transformer decoder and trains the model to reconstruct the original boxes, which effectively reduces the bipartite graph matching difficulty and leads to faster convergence. Our method is universal and can be easily plugged into any DETR-like method by adding dozens of lines of code to achieve a remarkable improvement. As a result, our DN-DETR yields a remarkable improvement ($+1.9$ AP) under the same setting and achieves the best result (AP $43.4$ and $48.6$ with $12$ and $50$ training epochs, respectively) among DETR-like methods with a ResNet-$50$ backbone. Compared with the baseline under the same setting, DN-DETR achieves comparable performance with $50\%$ of the training epochs. Code is available at \url{https://github.com/FengLi-ust/DN-DETR}.

Authors (6)
  1. Feng Li (286 papers)
  2. Hao Zhang (948 papers)
  3. Shilong Liu (60 papers)
  4. Jian Guo (76 papers)
  5. Lionel M. Ni (20 papers)
  6. Lei Zhang (1689 papers)
Citations (526)

Summary

Accelerating DETR Training with Query DeNoising

The paper "DN-DETR: Accelerate DETR Training by Introducing Query DeNoising" addresses the problem of slow convergence in DETR-based object detection models, which are a prominent aspect of computer vision research. Although DETR has demonstrated substantial progress, it tends to converge slower compared to traditional CNN-based models. This paper introduces a novel approach to enhance the training speed of DETR models through a technique referred to as query denoising.

Key Contributions

The paper substantially extends the authors' prior work presented at CVPR 2022, demonstrating the versatility and efficacy of the denoising method. Its key contributions include:

  1. Improved Performance and Convergence: The paper reports higher accuracy together with faster convergence. Because denoising queries carry a fixed assignment to their ground-truth targets, they provide a stable training signal from the first epoch, reducing overall training time and computational cost (see the training-step sketch after this list).
  2. Broader Application of Denoising Training: The denoising method is generalized beyond its initial scope. Specifically:
    • It has been applied to various DETR-based detection models with differing query formulations.
    • It has been extended to include DETR-based segmentation models, suggesting its potential for broader applications in computer vision tasks.
    • It has also been adapted for traditional CNN-based detection models, highlighting its adaptability and robustness across different architectures.
  3. Comprehensive Experimental Analysis: The manuscript includes an expanded set of experimental results and analyses. This comprehensive approach allows for a better understanding of the denoising method's impact on performance metrics and convergence rates.
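The sketch below illustrates why the denoising branch stabilizes training: its queries come with a known one-to-one pairing to ground truth, so their reconstruction loss bypasses bipartite matching entirely, while the standard Hungarian loss is still applied to the ordinary matching queries. The `model` and `criterion` interfaces here (including the `dn_queries` keyword and `hungarian_loss` method) are hypothetical stand-ins for illustration, not DN-DETR's actual API; only the overall structure follows the paper's description.

```python
import torch
import torch.nn.functional as F

def dn_training_step(model, images, gt_boxes, gt_labels, criterion):
    """One training step with an auxiliary denoising branch (sketch only).

    `model`, `criterion`, and `dn_queries` are hypothetical names used
    for illustration; see the official repo for the real implementation.
    """
    # Perturb ground-truth boxes (see the noise_boxes sketch above).
    noised = noise_boxes(gt_boxes)
    # The decoder consumes learnable matching queries plus noised-GT queries.
    out_match, out_dn = model(images, dn_queries=noised)
    # Standard DETR objective: Hungarian matching over the matching queries.
    loss = criterion.hungarian_loss(out_match, gt_boxes, gt_labels)
    # Denoising objective: query i must reconstruct ground-truth box/label i,
    # so the assignment is fixed and no bipartite matching is needed.
    loss = loss + F.l1_loss(out_dn["boxes"], gt_boxes)
    loss = loss + F.cross_entropy(out_dn["logits"], gt_labels)
    return loss
```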

Implications

The development of an effective denoising training approach for DETR models holds significant implications for both practical and theoretical aspects of computer vision:

  • Practical Impact: Faster training and improved accuracy lower the cost of building and deploying DETR-style detectors, making them more practical for real-world applications.
  • Theoretical Advancements: The exploration into query denoising provides deeper insights into the convergence behaviors of DETR-like models. This understanding could inform future model architectures and training protocols, ultimately pushing the boundaries of object detection capabilities.

Future Directions

The incorporation of query denoising into DETR models opens several avenues for future research: further optimizing the noising strategy, studying denoising training in other domains of AI, and examining how the method scales to more complex datasets and novel model architectures.

Overall, this paper enhances the landscape of object detection by tackling one of the critical limitations of DETR models, providing a tangible pathway toward more efficient training regimes.
