EfficientDet: Scalable and Efficient Object Detection (1911.09070v7)

Published 20 Nov 2019 in cs.CV, cs.LG, and eess.IV

Abstract: Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency. First, we propose a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multiscale feature fusion; Second, we propose a compound scaling method that uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time. Based on these optimizations and better backbones, we have developed a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior art across a wide spectrum of resource constraints. In particular, with single model and single-scale, our EfficientDet-D7 achieves state-of-the-art 55.1 AP on COCO test-dev with 77M parameters and 410B FLOPs, being 4x - 9x smaller and using 13x - 42x fewer FLOPs than previous detectors. Code is available at https://github.com/google/automl/tree/master/efficientdet.


Overview of EfficientDet: Scalable and Efficient Object Detection

The paper "EfficientDet: Scalable and Efficient Object Detection" by Mingxing Tan, Ruoming Pang, and Quoc V. Le presents a systematic study of neural network architecture design choices for object detection. It introduces two key optimizations that improve efficiency without sacrificing accuracy, and builds on them to develop a new family of object detectors named EfficientDet.

Key Contributions

Bi-Directional Feature Pyramid Network (BiFPN):
- The authors propose a weighted BiFPN for fast multi-scale feature fusion. Unlike conventional FPNs, which pass information in one direction only (top-down), BiFPN adds bi-directional cross-scale connections so that feature information flows both top-down and bottom-up. Each input to a fusion node also carries a learnable weight, so the network can learn how much each scale should contribute.
Compound Scaling Method:
- A novel compound scaling approach is introduced to uniformly scale the resolution, depth, and width of the backbone, feature network, and prediction networks. This holistic method contrasts with traditional scaling techniques, which often only scale one or two dimensions.
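
To make the compound scaling idea concrete, here is a minimal sketch of how a single coefficient can drive every dimension at once. The constants follow the heuristic scaling equations given in the paper (BiFPN width 64 * 1.35^phi, BiFPN depth 3 + phi, prediction-head depth 3 + floor(phi / 3), input resolution 512 + 128 * phi) and should be checked against the paper before being relied on. The function name `scaled_config` and the dictionary keys are hypothetical, chosen only for illustration. The values are raw formula outputs; the published configurations round channel counts and may deviate for the largest models.

```python
def scaled_config(phi: int) -> dict:
    """Return raw compound-scaling values for a scaling coefficient phi.

    Assumption: the constants mirror the heuristic equations reported in the
    EfficientDet paper; no rounding to hardware-friendly sizes is applied.
    """
    if phi < 0:
        raise ValueError("phi must be non-negative")
    return {
        # BiFPN channel count grows exponentially with phi.
        "bifpn_width": 64 * (1.35 ** phi),
        # BiFPN layer count grows linearly with phi.
        "bifpn_depth": 3 + phi,
        # Box/class head depth grows by one layer every three steps of phi.
        "head_depth": 3 + phi // 3,
        # Input resolution grows linearly in steps of 128 pixels.
        "input_resolution": 512 + 128 * phi,
    }


if __name__ == "__main__":
    for phi in range(4):
        print(phi, scaled_config(phi))
```

The point of the design is that one knob moves resolution, depth, and width together, instead of tuning each dimension independently.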

Numerical Results and Comparisons

EfficientDet models demonstrate substantial improvements in both accuracy and efficiency compared to previous state-of-the-art detectors:

EfficientDet-D7 achieves 55.1 AP on COCO test-dev with 77 million parameters and 410 billion FLOPs, using a single model at a single scale. Compared with previous detectors, it is 4x to 9x smaller and uses 13x to 42x fewer FLOPs.
Efficiency is evident across different tiers of models:
- EfficientDet-D0 achieves 34.6 AP with just 3.9 million parameters and 2.5 billion FLOPs, compared to YOLOv3's 33.0 AP with 71 billion FLOPs.

The paper also provides a rigorous comparison with existing detectors:

Compared to YOLOv3, EfficientDet-D0 uses about 28x fewer FLOPs (2.5 billion versus 71 billion) while achieving slightly higher accuracy (34.6 AP versus 33.0 AP).
EfficientDet models (D1 to D7) consistently outperform models like RetinaNet, Mask R-CNN, and NAS-FPN across several efficiency metrics, including parameters, FLOPs, and latency on both GPU and CPU.
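
The 28x figure for EfficientDet-D0 versus YOLOv3 follows directly from the FLOP counts quoted above. The short calculation below reproduces it; the variable names are illustrative only, and the inputs are the numbers reported in this summary.

```python
# Figures quoted in the summary above (FLOPs in billions, AP on COCO).
yolov3_flops_b, yolov3_ap = 71.0, 33.0
d0_flops_b, d0_ap = 2.5, 34.6

# How many times fewer FLOPs EfficientDet-D0 needs than YOLOv3.
flops_reduction = yolov3_flops_b / d0_flops_b
# Absolute accuracy difference in AP points.
ap_gain = d0_ap - yolov3_ap

print(f"FLOPs reduction: {flops_reduction:.1f}x")  # FLOPs reduction: 28.4x
print(f"AP gain: {ap_gain:+.1f}")                  # AP gain: +1.6
```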

Implications

The implications of EfficientDet's advancements are multifaceted:

Practical Applications: The efficiency gains render these models highly suitable for real-time applications such as robotics and autonomous vehicles, where computational resources and latency are critical constraints.
Theoretical Insights: The compound scaling method can serve as a blueprint for future research in scaling neural network architectures. The concept of weighted feature fusion within BiFPN also opens new avenues for exploring more refined feature aggregation techniques.
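
The weighted feature fusion idea can be illustrated with a small sketch. It assumes the "fast normalized fusion" form described in the paper: each weight is clipped to be non-negative, then divided by the sum of all weights plus a small epsilon. Features are plain Python lists here rather than tensors, the function name `fast_normalized_fusion` is hypothetical, and the resizing and convolution steps that surround fusion in a real BiFPN layer are omitted.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature vectors with normalized non-negative weights.

    Assumption: features have already been resized to a common shape, and
    `weights` stands in for the per-input scalars a network would learn.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per input feature")
    # ReLU on each weight keeps every contribution non-negative.
    clipped = [max(w, 0.0) for w in weights]
    # eps keeps the division stable when all weights are near zero.
    total = eps + sum(clipped)
    length = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(clipped, features)) / total
        for i in range(length)
    ]


if __name__ == "__main__":
    top_down = [1.0, 2.0, 3.0]
    lateral = [3.0, 2.0, 1.0]
    # Equal weights give (approximately) the element-wise mean.
    print(fast_normalized_fusion([top_down, lateral], [1.0, 1.0]))
```

Because the weights are normalized by a simple sum rather than a softmax, each input's share of the output stays between 0 and 1 without needing an exponential.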

Future Developments

Given the scalability and efficiency demonstrated by EfficientDet, future work might explore:

Extended Applications: Applying EfficientDet to broader domains like instance segmentation, panoptic segmentation, and beyond, exploiting its inherent efficiency.
Advanced Scaling Methods: Further refinement of compound scaling approaches to fine-tune the balance between computational cost and accuracy, possibly tailored for specific hardware accelerators like TPUs or edge devices.
Integration with Emerging Architectures: Experimenting with integration into newer backbone architectures and exploring more intricate feature pyramid designs could yield further improvements in performance.

Conclusion

EfficientDet establishes a new benchmark for object detection by achieving state-of-the-art accuracy while significantly reducing model size and computational cost. Together, weighted multi-scale feature fusion and compound scaling form a framework that can be extended and adapted to other computer vision tasks. This work has implications both for practical deployment in resource-constrained environments and for neural network design more broadly.


Authors (3)

Mingxing Tan (46 papers)
Ruoming Pang (59 papers)
Quoc V. Le (128 papers)

Citations (4,395)
