RTMDet: An Empirical Study of Designing Real-Time Object Detectors (2212.07784v2)

Published 14 Dec 2022 in cs.CV

Abstract: In this paper, we aim to design an efficient real-time object detector that exceeds the YOLO series and is easily extensible for many object recognition tasks such as instance segmentation and rotated object detection. To obtain a more efficient model architecture, we explore an architecture that has compatible capacities in the backbone and neck, constructed by a basic building block that consists of large-kernel depth-wise convolutions. We further introduce soft labels when calculating matching costs in the dynamic label assignment to improve accuracy. Together with better training techniques, the resulting object detector, named RTMDet, achieves 52.8% AP on COCO with 300+ FPS on an NVIDIA 3090 GPU, outperforming the current mainstream industrial detectors. RTMDet achieves the best parameter-accuracy trade-off with tiny/small/medium/large/extra-large model sizes for various application scenarios, and obtains new state-of-the-art performance on real-time instance segmentation and rotated object detection. We hope the experimental results can provide new insights into designing versatile real-time object detectors for many object recognition tasks. Code and models are released at https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet.


Summary

The paper presents a novel architecture using large-kernel depth-wise convolutions that boosts the effective receptive field with minimal computational overhead.
It introduces dynamic soft label assignment and balanced backbone-neck design to optimize the parameter-accuracy trade-off.
RTMDet reaches 52.8% AP on COCO at 300+ FPS on an NVIDIA 3090 GPU and reports state-of-the-art results on real-time instance segmentation and rotated object detection.
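
The claim that large-kernel depth-wise convolutions add little overhead comes down to simple parameter counting. The sketch below is illustrative only: the channel width (256) and kernel sizes (3 and 5) are assumed values chosen for the example, not figures from the text above, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(channels, k):
    """Weights in a k x k depth-wise conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * channels      # one k x k filter per channel
    pointwise = channels * channels   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


channels = 256  # hypothetical layer width, for illustration only
standard_3x3 = conv_params(channels, channels, 3)
large_kernel_dw = depthwise_separable_params(channels, 5)

print("standard 3x3 conv:", standard_3x3)
print("5x5 depth-wise + 1x1 point-wise:", large_kernel_dw)
```

With these assumed sizes, the standard 3x3 layer holds 589,824 weights while the 5x5 depth-wise plus point-wise pair holds 71,936, so the larger kernel comes with roughly 8 times fewer parameters.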

Analysis of "RTMDet: An Empirical Study of Designing Real-Time Object Detectors"

The paper "RTMDet: An Empirical Study of Designing Real-Time Object Detectors" presents a comprehensive exploration of designing efficient real-time object detectors. The work introduces RTMDet, a versatile family of object detectors capable of handling tasks beyond traditional object detection, such as instance segmentation and rotated object detection.

Key Contributions

The authors make several contributions to the architecture and training strategy of object detectors:

Model Architecture: The paper introduces large-kernel depth-wise convolutions into the basic building block of the backbone and neck, enlarging the effective receptive field without significant computational overhead. This change improves the model's capacity to capture global context.
Optimization of Backbone and Neck: The paper explores the balance between backbone and neck capacities, proposing that giving the two components similar capacity improves the parameter-accuracy trade-off.
Soft Label Assignment: The authors extend dynamic label assignment by using soft labels when computing matching costs, which makes the cost matrix more discriminative and leads to higher-quality matches. This modification improves accuracy.
Training Strategies: The work uses a two-stage training strategy, with cached Mosaic and MixUp augmentations in the first stage and Large Scale Jittering in the final stage to refine accuracy. Adopting the AdamW optimizer further stabilizes training.
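
To make the soft label idea concrete, here is a minimal sketch of how a soft target can make matching costs more discriminative than a hard target. The cost form used here (binary cross-entropy against the predicted box's IoU, scaled by the squared gap between IoU and score) is an assumption for illustration and is not spelled out in the text above. The function names and the example numbers are hypothetical.

```python
import math


def _clamp(p, eps=1e-9):
    """Keep probabilities away from 0 and 1 so the logs stay finite."""
    return min(max(p, eps), 1.0 - eps)


def hard_label_cost(score):
    """Classification cost against a hard target of 1: plain -log(score)."""
    return -math.log(_clamp(score))


def soft_label_cost(score, iou):
    """Classification cost against a soft target equal to the IoU (assumed form).

    Binary cross-entropy with the IoU as the target, scaled by the squared
    gap between the target and the predicted score.
    """
    p = _clamp(score)
    bce = -(iou * math.log(p) + (1.0 - iou) * math.log(1.0 - p))
    return bce * (iou - p) ** 2


# Two hypothetical candidates with the same score but different box quality.
score = 0.8
good_iou, poor_iou = 0.9, 0.3

print("hard cost (both candidates):", hard_label_cost(score))
print("soft cost, IoU 0.9:", soft_label_cost(score, good_iou))
print("soft cost, IoU 0.3:", soft_label_cost(score, poor_iou))
```

The hard-label cost depends only on the score, so it cannot separate the two candidates. The soft-label cost is lower for the candidate whose box overlaps the ground truth well, which is the kind of extra discrimination in the cost matrix that the soft label assignment is meant to provide.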

Numerical Results and Performance

RTMDet demonstrates strong empirical performance, achieving 52.8% AP on the COCO dataset at 300+ FPS on an NVIDIA 3090 GPU. This result surpasses other mainstream industrial detectors and reflects a favorable speed-accuracy balance across model sizes from tiny to extra-large. RTMDet also outperforms previous methods on real-time instance segmentation and rotated object detection, reaching 44.6% mask AP on COCO and 81.33% AP on DOTA v1.0, respectively.

Implications and Future Directions

These results are relevant to deploying real-time object detectors in applications such as autonomous driving, robotics, and surveillance systems. The modular architecture allows easy extension to additional tasks, potentially broadening its applicability across domains that require fast and accurate object recognition.

Future work may explore pairing RTMDet with emerging hardware accelerators and further tuning it for edge deployment. The soft label assignment strategy could also be studied and tuned across more diverse datasets and tasks.

In summary, RTMDet represents a significant step in real-time object detection, offering an experimentally validated approach that delivers both performance and adaptability across multiple object recognition tasks. This research will likely serve as a foundation for future work on the efficient design and deployment of AI-driven perception systems.
