YOLOX: Exceeding YOLO Series in 2021 (2107.08430v2)

Published 18 Jul 2021 in cs.CV

Abstract: In this report, we present some experienced improvements to YOLO series, forming a new high-performance detector -- YOLOX. We switch the YOLO detector to an anchor-free manner and conduct other advanced detection techniques, i.e., a decoupled head and the leading label assignment strategy SimOTA to achieve state-of-the-art results across a large scale range of models: For YOLO-Nano with only 0.91M parameters and 1.08G FLOPs, we get 25.3% AP on COCO, surpassing NanoDet by 1.8% AP; for YOLOv3, one of the most widely used detectors in industry, we boost it to 47.3% AP on COCO, outperforming the current best practice by 3.0% AP; for YOLOX-L with roughly the same amount of parameters as YOLOv4-CSP, YOLOv5-L, we achieve 50.0% AP on COCO at a speed of 68.9 FPS on Tesla V100, exceeding YOLOv5-L by 1.8% AP. Further, we won the 1st Place on Streaming Perception Challenge (Workshop on Autonomous Driving at CVPR 2021) using a single YOLOX-L model. We hope this report can provide useful experience for developers and researchers in practical scenes, and we also provide deploy versions with ONNX, TensorRT, NCNN, and Openvino supported. Source code is at https://github.com/Megvii-BaseDetection/YOLOX.

The paper "YOLOX: Exceeding YOLO Series in 2021" introduces a set of improvements to the YOLO (You Only Look Once) family of object detectors, resulting in a new high-performance detector called YOLOX. The model combines an anchor-free design, a decoupled head, and the SimOTA label assignment strategy to improve detection accuracy across a wide range of model sizes, from YOLOX-Nano to YOLOX-X.

Technical Advancements

Anchor-Free Mechanism

A key change in YOLOX is the switch to an anchor-free design. Earlier YOLO models, including YOLOv4 and YOLOv5, are anchor-based and rely on clustering analysis to choose a set of anchors before training. YOLOX reduces the predictions per location from 3 to 1 and has each location directly predict four values: two offsets relative to the top-left corner of its grid cell, plus the height and width of the box. This lowers the parameter count and computational cost of the detector and also improves accuracy, with the anchor-free model reaching 42.9% AP.
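To make the anchor-free prediction concrete, here is a minimal sketch of how one raw prediction could be decoded into a box in image pixels. It assumes the common convention that center offsets are added to the grid-cell index and scaled by the feature-map stride, and that width and height are predicted in log space. The function name decode_prediction and its parameters are hypothetical and are not taken from the paper.

```python
import math


def decode_prediction(raw, grid_x, grid_y, stride):
    """Decode one anchor-free prediction into (cx, cy, w, h) in pixels.

    raw     -- (tx, ty, tw, th): offsets from the cell's top-left corner
               and log-scale width/height (assumed parameterization)
    grid_x  -- column index of the grid cell on the feature map
    grid_y  -- row index of the grid cell on the feature map
    stride  -- downsampling factor of this feature map (e.g. 8, 16, 32)
    """
    tx, ty, tw, th = raw
    cx = (tx + grid_x) * stride   # box center x in image pixels
    cy = (ty + grid_y) * stride   # box center y in image pixels
    w = math.exp(tw) * stride     # exp keeps the width positive
    h = math.exp(th) * stride     # exp keeps the height positive
    return cx, cy, w, h


# One prediction per location: cell (3, 2) on a stride-8 feature map.
print(decode_prediction((0.5, 0.5, 0.0, 0.0), grid_x=3, grid_y=2, stride=8))
# → (28.0, 20.0, 8.0, 8.0)
```

Because there is only one prediction per cell, no anchor shapes appear anywhere in the decoding, which is what removes the clustering step described above.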

Decoupled Head

The paper addresses the conflict between the classification and regression tasks in the coupled head of earlier YOLO models by introducing a decoupled head. The detection head is split into separate branches for classification and regression, which speeds up convergence and improves final accuracy. In the reported experiments, the decoupled head reaches 39.6% AP, compared with 38.5% AP for the baseline YOLOv3 head.
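The following sketch illustrates what "separate branches" means in terms of output shapes. It is only a shape calculation, not a network implementation. The branch names ("cls", "reg", "obj"), the strides (8, 16, 32), and the 640-pixel input are assumptions chosen for illustration; the 80 classes correspond to COCO.

```python
def head_output_shapes(input_size=640, strides=(8, 16, 32), num_classes=80):
    """Return per-stride output shapes (H, W, C) for each head branch.

    Assumed layout: one classification map with num_classes channels,
    one regression map with 4 box values, one objectness map with 1 value.
    """
    shapes = {}
    for stride in strides:
        side = input_size // stride  # feature-map height and width
        shapes[stride] = {
            "cls": (side, side, num_classes),  # class scores per location
            "reg": (side, side, 4),            # box values per location
            "obj": (side, side, 1),            # objectness per location
        }
    return shapes


def total_locations(shapes):
    """Count prediction locations over all feature maps."""
    return sum(s["reg"][0] * s["reg"][1] for s in shapes.values())


shapes = head_output_shapes()
print(shapes[8]["cls"])         # → (80, 80, 80)
print(total_locations(shapes))  # → 8400
```

A coupled head would emit all 4 + 1 + 80 channels from a single shared branch; in the decoupled layout each task gets its own branch and the outputs are combined only at the end.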

SimOTA Label Assignment

The SimOTA label assignment strategy is another crucial contribution. It simplifies the Optimal Transport Assignment (OTA) method by dynamically selecting the top-k predictions as positives for each ground truth object, where k is chosen per object rather than fixed in advance. This reduces the computational overhead of OTA while maintaining high detection accuracy. With SimOTA in place, the YOLOv3-based model reaches 47.3% AP on COCO, 3.0% AP above the current best practice for YOLOv3.
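The dynamic top-k selection can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it assumes a precomputed cost matrix and IoU matrix (both indexed as [ground truth][prediction]), estimates k for each ground truth as the truncated sum of its largest IoUs, and resolves a prediction claimed by several ground truths in favor of the lowest cost. The function name dynamic_k_assign and the top_candidates parameter are hypothetical.

```python
def dynamic_k_assign(cost, ious, top_candidates=10):
    """Assign predictions to ground truths with a per-object dynamic k.

    cost[g][p] -- assignment cost of prediction p for ground truth g
    ious[g][p] -- IoU between prediction p and ground truth g
    Returns a dict mapping prediction index -> ground truth index.
    """
    num_gt = len(cost)
    num_pred = len(cost[0]) if num_gt else 0
    best = {}  # prediction index -> (cost, ground truth index)

    for g in range(num_gt):
        # Dynamic k: truncated sum of the largest IoUs, at least 1.
        top_ious = sorted(ious[g], reverse=True)[:top_candidates]
        k = max(1, int(sum(top_ious)))

        # The k lowest-cost predictions become candidates for this object.
        ranked = sorted(range(num_pred), key=lambda p: cost[g][p])
        for p in ranked[:k]:
            # Assumed tie-break: keep the ground truth with lower cost.
            if p not in best or cost[g][p] < best[p][0]:
                best[p] = (cost[g][p], g)

    return {p: g for p, (_, g) in best.items()}


cost = [[0.10, 0.50, 0.90, 0.20],
        [0.80, 0.30, 0.20, 0.15]]
ious = [[0.9, 0.6, 0.1, 0.7],
        [0.2, 0.5, 0.8, 0.9]]
print(dynamic_k_assign(cost, ious))
```

In this toy example each ground truth gets k = 2, prediction 3 is claimed by both objects, and it goes to the second object because its cost there (0.15) is lower than for the first (0.20).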

Performance Metrics

The paper reports the following COCO results across model sizes:

YOLOX-L achieves 50.0% AP on COCO with $640 \times 640$ resolution at 68.9 FPS on a Tesla V100, outperforming YOLOv5-L by 1.8% AP.
YOLOX-Nano with 0.91M parameters and 1.08G FLOPs achieves 25.3% AP on COCO, surpassing NanoDet by 1.8% AP.
Across model sizes from YOLOX-S to YOLOX-X, YOLOX models consistently achieve higher AP than their YOLOv5 counterparts.
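The figures above can be converted into a per-image latency and into the baseline AP values they imply. The short sketch below does only that arithmetic; the helper names are hypothetical, and the latency conversion assumes one image per forward pass.

```python
def latency_ms(fps):
    """Convert frames per second into milliseconds per image."""
    return 1000.0 / fps


def implied_baseline_ap(ap, gain):
    """AP of the compared model, given the reported AP and the margin."""
    return round(ap - gain, 1)


print(round(latency_ms(68.9), 1))      # YOLOX-L latency    → 14.5
print(implied_baseline_ap(50.0, 1.8))  # implied YOLOv5-L AP → 48.2
print(implied_baseline_ap(25.3, 1.8))  # implied NanoDet AP  → 23.5
```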

Implications and Future Developments

Practically, these advancements give YOLOX models a better speed-accuracy trade-off, which is particularly valuable in real-time applications such as autonomous driving. The paper's empirical results support the use of recent detection techniques in YOLO-style detectors, and the provided ONNX, TensorRT, NCNN, and OpenVINO deployment versions make the models easier to adopt in practice.

From a theoretical perspective, the integration of the decoupled head and SimOTA strategies offers fresh insights into model architecture design and label assignment methodologies. The move towards anchor-free mechanisms simplifies the model training processes and potentially broadens the applicability of YOLOX in various computational environments.

Future Speculations

Looking forward, several areas could be explored to further enhance the capabilities of YOLOX:

Incorporation of Transformer-based models, which have been pushing accuracy benchmarks close to 60% AP.
Enhancing YOLOX with instance segmentation features to compete with advanced models that leverage extensive context and instance mask annotations.

Conclusion

The paper "YOLOX: Exceeding YOLO Series in 2021" presents substantive improvements to the YOLO family of detectors, achieving state-of-the-art performance with a new anchor-free design, decoupled head, and SimOTA label assignment strategy. Through rigorous empirical evaluation, YOLOX demonstrates significant advancements in both speed and accuracy, positioning it as a leading model for real-time object detection tasks. The research provides a robust foundation for future developments in object detection and sets a new benchmark for practical, high-performance detection systems.

Authors

Zheng Ge
Songtao Liu
Feng Wang
Zeming Li
Jian Sun