YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications (2209.02976v1)

Published 7 Sep 2022 in cs.CV

Abstract: For years, the YOLO series has been the de facto industry-level standard for efficient object detection. The YOLO community has prospered overwhelmingly to enrich its use in a multitude of hardware platforms and abundant scenarios. In this technical report, we strive to push its limits to the next level, stepping forward with an unwavering mindset for industry application. Considering the diverse requirements for speed and accuracy in the real environment, we extensively examine the up-to-date object detection advancements either from industry or academia. Specifically, we heavily assimilate ideas from recent network design, training strategies, testing techniques, quantization, and optimization methods. On top of this, we integrate our thoughts and practice to build a suite of deployment-ready networks at various scales to accommodate diversified use cases. With the generous permission of YOLO authors, we name it YOLOv6. We also express our warm welcome to users and contributors for further enhancement. For a glimpse of performance, our YOLOv6-N hits 35.9% AP on the COCO dataset at a throughput of 1234 FPS on an NVIDIA Tesla T4 GPU. YOLOv6-S strikes 43.5% AP at 495 FPS, outperforming other mainstream detectors at the same scale (YOLOv5-S, YOLOX-S, and PPYOLOE-S). Our quantized version of YOLOv6-S even brings a new state-of-the-art 43.3% AP at 869 FPS. Furthermore, YOLOv6-M/L also achieves better accuracy performance (i.e., 49.5%/52.3%) than other detectors with a similar inference speed. We carefully conducted experiments to validate the effectiveness of each component. Our code is made available at https://github.com/meituan/YOLOv6.

YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications – An Expert Summary

The YOLO series of object detection frameworks has long stood as an industry standard for balancing efficiency and accuracy. YOLOv6, released by researchers from Meituan Inc., pushes those limits further for industrial deployment scenarios. This summary outlines and examines the key features, methodology, and implications of its contributions.

YOLOv6's Core Contributions

YOLOv6 contributes advances in network design, training strategies, and deployment optimizations:

Network Design:
- Backbones: YOLOv6 uses RepVGG-style re-parameterizable backbones, which are particularly effective for small models. At inference these collapse to a single-path structure with high parallelism and a low memory footprint. For larger models, the CSPStackRep Block is introduced to balance computational cost against parameter growth.
- Neck and Head: The architecture adopts a Rep-PAN topology for the neck and an efficient decoupled head, which keeps the classification and regression branches separate and uses a hybrid-channel strategy to reduce the head's computational cost and inference latency.
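
The re-parameterization idea behind RepVGG-style blocks can be illustrated with a minimal single-channel sketch. It assumes a training-time block made of a 3x3 convolution, a 1x1 convolution, and an identity shortcut, and it omits batch normalization for brevity; the function names (`conv2d_same`, `fuse_branches`, `multi_branch`) and the toy values are hypothetical, not taken from the YOLOv6 code base.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def fuse_branches(k3, k1):
    """Fold a 1x1 branch (scalar weight k1) and an identity branch into one 3x3 kernel."""
    fused = [row[:] for row in k3]
    # Both the 1x1 weight and the identity act only on the centre tap.
    fused[1][1] += k1 + 1.0
    return fused


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 conv + 1x1 conv + identity, summed element-wise."""
    y3 = conv2d_same(image, k3)
    return [[y3[i][j] + k1 * image[i][j] + image[i][j]
             for j in range(len(image[0]))]
            for i in range(len(image))]


# Toy input and weights (illustrative values only).
image = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 1.0, 1.0]]
k3 = [[0.1, 0.0, -0.1], [0.2, 0.5, 0.2], [0.0, -0.3, 0.1]]
k1 = 0.25

y_train = multi_branch(image, k3, k1)
y_deploy = conv2d_same(image, fuse_branches(k3, k1))
max_diff = max(abs(a - b)
               for row_a, row_b in zip(y_train, y_deploy)
               for a, b in zip(row_a, row_b))
print("max difference between the two forms:", max_diff)
```

Because convolution is linear, the three branches sum into one kernel, so the deployed network runs a single 3x3 convolution per block while producing the same output as the multi-branch form up to floating-point rounding.
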
Label Assignment:
- The researchers adopted Task-Aligned Learning (TAL) for label assignment. TAL improves performance and training stability over earlier mechanisms such as SimOTA and ATSS by addressing the misalignment between the classification and localization tasks.
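
As a rough illustration of how a task-aligned assigner ranks candidates, the sketch below uses the alignment metric from the TOOD formulation, t = s^alpha * u^beta, where s is the predicted classification score and u is the IoU with the ground-truth box. The values alpha=1.0 and beta=6.0, the `topk` setting, and the candidate list are assumptions chosen for illustration, not settings confirmed by this summary.

```python
def alignment_metric(cls_score, iou, alpha=1.0, beta=6.0):
    """Task-alignment score t = s**alpha * u**beta (alpha and beta are assumed defaults)."""
    return (cls_score ** alpha) * (iou ** beta)


def select_positives(candidates, topk=2, alpha=1.0, beta=6.0):
    """Return indices of the topk candidates ranked by alignment metric.

    candidates: list of (cls_score, iou) pairs for one ground-truth box.
    """
    scored = [(alignment_metric(s, u, alpha, beta), idx)
              for idx, (s, u) in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:topk]]


# Hypothetical candidates: (classification score, IoU with the ground truth).
candidates = [(0.9, 0.3), (0.6, 0.8), (0.2, 0.9), (0.7, 0.7)]
positives = select_positives(candidates, topk=2)
print("positive sample indices:", positives)
```

With these assumed exponents, a confident but poorly localized candidate such as (0.9, 0.3) ranks last, which is the behavior that couples classification quality with localization quality during assignment.
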
Loss Functions:
- The paper adopts Varifocal Loss (VFL) for classification. For box regression it uses an IoU-series loss: SIoU for the smallest models (YOLOv6-N/T) and GIoU for the others. Distribution Focal Loss (DFL) is applied selectively, only in the larger models, so that box regression improves without hampering inference speed in the small ones.
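
For readers unfamiliar with IoU-series regression losses, here is a minimal sketch of GIoU, one of the losses named above. It assumes axis-aligned boxes in (x1, y1, x2, y2) format with positive area; the function names are hypothetical and the code is not taken from the YOLOv6 repository.

```python
def giou(box_a, box_b):
    """Generalized IoU for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union

    # Area of the smallest box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # Penalize the part of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


def giou_loss(pred, target):
    """Regression loss in [0, 2]: 0 for a perfect match, above 1 for disjoint boxes."""
    return 1.0 - giou(pred, target)


print(giou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
print(giou_loss((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0)))
```

Unlike plain IoU, the enclosing-box term keeps the loss informative when the predicted and target boxes do not overlap at all.
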
Industry-ready Enhancements:
- Techniques such as self-distillation and longer training schedules (up to 400 epochs) are used to improve convergence. Because they change only the training procedure, they raise accuracy without adding inference cost on the target hardware.
Quantization:
- RepOptimizer and Quantization-Aware Training (QAT) are used to limit the accuracy loss of quantized models. A sensitivity analysis identifies the layers most affected by quantization, which further refines the scheme for deployment on GPUs.
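
The core operation that quantization-aware training simulates is a quantize-then-dequantize round trip. The sketch below shows symmetric per-tensor int8 fake quantization and uses the round-trip mean squared error as a simple stand-in for a per-layer sensitivity score. The scaling scheme, the MSE proxy, and the layer names and weights are all assumptions for illustration, not the paper's exact procedure.

```python
def fake_quantize(values, num_bits=8):
    """Symmetric per-tensor quantize/dequantize round trip.

    Returns the dequantized values and the scale that was used.
    """
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized], scale


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical layers: the second contains an outlier that inflates its scale.
layers = {
    "smooth_layer": [0.1, -0.2, 0.05, 0.15],
    "outlier_layer": [0.1, -0.2, 0.05, 12.7],
}

# Proxy sensitivity: how much each layer's weights change under fake quantization.
sensitivity = {name: mse(w, fake_quantize(w)[0]) for name, w in layers.items()}
most_sensitive = max(sensitivity, key=sensitivity.get)
print("most quantization-sensitive layer (by MSE proxy):", most_sensitive)
```

A layer with a wide value range gets a coarse quantization step, so its small weights are represented poorly; ranking layers this way suggests which ones might be kept in higher precision or treated separately.
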

Numerical Performance Insights

The reported results on the COCO 2017 dataset show YOLOv6 outperforming comparable detectors at the same scale:

- YOLOv6-N reaches 35.9% Average Precision (AP) at a throughput of 1234 FPS on an NVIDIA Tesla T4 GPU.
- YOLOv6-S achieves 43.5% AP at 495 FPS, outperforming YOLOv5-S, YOLOX-S, and PPYOLOE-S at the same scale.
- The quantized version of YOLOv6-S delivers 43.3% AP at 869 FPS, which the authors report as a new state of the art.

The larger models, YOLOv6-M and YOLOv6-L, achieve 49.5% and 52.3% AP respectively, which the authors report as better accuracy than other detectors with a similar inference speed.
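
To put the throughput figures in perspective, the short calculation below converts the reported FPS values into amortized time per image and compares the quantized YOLOv6-S with its full-precision counterpart. The AP and FPS numbers come from the abstract; the dictionary layout and function name are illustrative. Since the figures are throughput, the per-image times are amortized values and should not be read as single-image latency.

```python
# AP (%) and throughput (FPS) as reported in the abstract (Tesla T4).
results = {
    "YOLOv6-N": {"ap": 35.9, "fps": 1234},
    "YOLOv6-S": {"ap": 43.5, "fps": 495},
    "YOLOv6-S (quantized)": {"ap": 43.3, "fps": 869},
}


def ms_per_image(fps):
    """Amortized processing time per image in milliseconds."""
    return 1000.0 / fps


for name, r in results.items():
    print(f"{name}: {r['ap']}% AP, {ms_per_image(r['fps']):.2f} ms/image")

# Effect of quantization on YOLOv6-S.
speedup = results["YOLOv6-S (quantized)"]["fps"] / results["YOLOv6-S"]["fps"]
ap_drop = results["YOLOv6-S"]["ap"] - results["YOLOv6-S (quantized)"]["ap"]
print(f"quantization: {speedup:.2f}x throughput for a {ap_drop:.1f} AP drop")
```

The quantized YOLOv6-S therefore processes images roughly 1.76 times faster than the full-precision model while giving up 0.2 AP.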

Practical and Theoretical Implications

Practically, YOLOv6 promises better real-time object detection across diverse industrial applications. The family of models at several scales allows deployment across various hardware platforms, from low-power edge devices to high-end GPUs. The quantization scheme extends its usability to resource-constrained environments at little cost in accuracy: the quantized YOLOv6-S loses only 0.2 AP relative to the full-precision model.

Theoretically, the integration of components such as TAL and RepOptimizer shows that recent research advances can be adopted while keeping deployment practical. Further, the emphasis on self-distillation aligns with ongoing research on improving accuracy without increasing inference cost.

Future Directions

Future work may involve exploring more complex ensemble techniques within the YOLOv6 architecture to push detection capabilities further. Additionally, expanding its applicability through continuous learning and domain adaptation can make YOLOv6 even more resilient to diverse real-world scenarios. The potential integration with emerging hardware accelerators could also be a noteworthy direction, providing specialized optimizations for even faster inference speeds.

In conclusion, YOLOv6 fortifies the YOLO series legacy by marrying academia's latest research developments with stringent industrial application requirements. It stands as a testimony to the balance of research innovation and real-world applicability in the field of object detection.

Authors (18)

Chuyi Li (3 papers)
Lulu Li (10 papers)
Hongliang Jiang (34 papers)
Kaiheng Weng (2 papers)
Yifei Geng (6 papers)
Liang Li (297 papers)
Zaidan Ke (2 papers)
Qingyuan Li (11 papers)
Meng Cheng (109 papers)
Weiqiang Nie (1 paper)
Yiduo Li (7 papers)
Bo Zhang (633 papers)
Yufei Liang (3 papers)
Linyuan Zhou (2 papers)
Xiaoming Xu (18 papers)
Xiangxiang Chu (62 papers)
Xiaoming Wei (44 papers)
Xiaolin Wei (42 papers)
