Workshop on Autonomous Driving at CVPR 2021: Technical Report for Streaming Perception Challenge (2108.04230v1)

Published 27 Jul 2021 in cs.CV and cs.AI

Abstract: In this report, we introduce our real-time 2D object detection system for the realistic autonomous driving scenario. Our detector is built on a newly designed YOLO model, called YOLOX. On the Argoverse-HD dataset, our system achieves 41.0 streaming AP, which surpassed second place by 7.8/6.1 on detection-only track/fully track, respectively. Moreover, equipped with TensorRT, our model achieves the 30FPS inference speed with a high-resolution input size (e.g., 1440-2304). Code and models will be available at https://github.com/Megvii-BaseDetection/YOLOX

An Evaluation of YOLOX for Real-Time Object Detection in Autonomous Driving

The report presents a 2D object detection system built for real-time use in autonomous driving. Building on earlier models in the YOLO series, the authors introduce YOLOX, a detector designed to balance detection accuracy against inference speed.

Methodological Advancements

YOLOX integrates several enhancements over its predecessors, notably YOLOv4 and YOLOv5. It adopts strong data augmentation strategies, mosaic and mixup, to improve generalization. Its most significant departure from earlier models is an anchor-free detection head, paired with a simplified version of Optimal Transport Assignment (OTA) for label assignment. The resulting model is simpler and needs less hyperparameter tuning, because settings such as anchor shapes and per-layer loss weights no longer have to be chosen by hand.
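
To make the anchor-free idea concrete, here is a minimal sketch of how a single raw prediction can be decoded into a box without any anchor shapes. It follows the decoding commonly described for YOLOX-style heads (grid offset plus stride scaling); the function name and argument layout are illustrative assumptions, not the authors' code.

```python
import math


def decode_anchor_free(raw, grid_x, grid_y, stride):
    """Decode one raw prediction (tx, ty, tw, th) into (cx, cy, w, h) in pixels.

    Hypothetical helper for illustration: the box centre is the grid cell
    position plus a predicted offset, and the size is an exponential of the
    raw value. Both are scaled by the feature-map stride, so no anchor
    width or height is involved.
    """
    tx, ty, tw, th = raw
    cx = (grid_x + tx) * stride
    cy = (grid_y + ty) * stride
    w = math.exp(tw) * stride
    h = math.exp(th) * stride
    return cx, cy, w, h


# A prediction at grid cell (2, 3) on a stride-8 feature map.
print(decode_anchor_free((0.5, 0.5, 0.0, 0.0), grid_x=2, grid_y=3, stride=8))
```

Because the only scale factor is the stride of the feature map, there is no anchor set to cluster or tune per dataset.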

Model Architecture and Inference Optimization

The architecture uses a C3-based backbone similar to that of YOLOv5-L-P6, which keeps the model structure simple. For inference, the model is deployed with TensorRT to reach a high inference speed. Image pre-processing and post-processing are folded into a single function call together with the network forward pass, which reduces per-frame overhead and helps the system meet real-time requirements.
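
As a rough illustration of what the pre- and post-processing steps involve, the sketch below computes the uniform resize ratio for a fixed network input and maps predicted boxes back to the original image. It assumes Argoverse-HD frames of 1920 by 1200 pixels and reads the abstract's "1440-2304" as an input of height 1440 and width 2304; both are assumptions, and the helper names are hypothetical rather than part of YOLOX or TensorRT.

```python
def letterbox_ratio(img_h, img_w, in_h, in_w):
    """Largest uniform scale at which the image still fits the network input."""
    return min(in_h / img_h, in_w / img_w)


def boxes_to_original(boxes, ratio):
    """Map (x1, y1, x2, y2) boxes from network-input pixels to image pixels.

    Assumes the resized image is placed at the top-left corner of the input
    canvas, so undoing the resize is a plain division with no offset.
    """
    return [tuple(v / ratio for v in box) for box in boxes]


# Assumed frame size (1200 x 1920) and assumed network input (1440 x 2304).
ratio = letterbox_ratio(1200, 1920, 1440, 2304)
print(ratio)
print(boxes_to_original([(120.0, 240.0, 360.0, 480.0)], ratio))
```

Under these assumed sizes the aspect ratios match, so the image fills the input without padding.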

Experimental Results

On the Argoverse-HD dataset, the system achieves a streaming Average Precision (AP) of 41.0, ahead of the second-place entry by 7.8 and 6.1 on the detection-only and full-stack tracks, respectively. The experiments also include pre-training on the COCO dataset and fine-tuning with several driving datasets, including BDD100K, Cityscapes, and nuScenes. These datasets cover diverse driving scenarios, which improves the model's robustness and ability to generalize.
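
Streaming AP differs from offline AP in how predictions are paired with ground truth: each annotated frame is scored against the most recent detector output that had finished by that frame's timestamp, so a slow detector is penalized for stale results. The sketch below shows only that pairing step, under the assumption that output finish times are sorted; the function name and millisecond timestamps are illustrative and not taken from the challenge toolkit.

```python
from bisect import bisect_right


def pair_streaming(gt_times, output_times):
    """For each ground-truth time, return the index of the latest output
    finished at or before it, or None if no output exists yet.

    `output_times` must be sorted in ascending order.
    """
    pairs = []
    for t in gt_times:
        i = bisect_right(output_times, t) - 1
        pairs.append(i if i >= 0 else None)
    return pairs


# Timestamps in milliseconds; the detector finishes three outputs.
gt_times = [0, 33, 67, 100]
output_times = [25, 70, 95]
print(pair_streaming(gt_times, output_times))  # [None, 0, 0, 2]
```

In this toy timeline the frame at 67 ms is still scored against the output finished at 25 ms, which is the effect that rewards low latency.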

The report analyzes performance along several dimensions, with emphasis on the trade-off between accuracy and latency. With TensorRT, the model runs at 30 FPS on high-resolution inputs (e.g., 1440-2304), fast enough to keep detections current for downstream decision-making in an autonomous vehicle.
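
The accuracy-latency trade-off can be read as a frame budget. Assuming the camera stream also runs at 30 FPS (an assumption about the data, not a figure from this summary), each frame leaves roughly 33 ms for the whole pipeline, and any extra latency means newer frames arrive while an older one is still being processed. A small sketch with hypothetical helper names:

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def frames_elapsed(latency_ms, fps=30):
    """Whole frame intervals that pass while one frame is being processed."""
    return int(latency_ms * fps // 1000)


print(round(frame_budget_ms(30), 1))  # 33.3
print(frames_elapsed(20))             # 0
print(frames_elapsed(50))             # 1
```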

Implications and Future Prospects

These results matter for autonomous driving. Higher accuracy combined with lower inference time makes it more feasible to deploy such models in real-world settings where fast response and high reliability are critical. The work also points to directions for future research, particularly refining label assignment strategies and improving robustness across diverse environmental conditions.

Looking forward, further work on anchor-free methods and augmentation strategies is likely to remain central to real-time object detection. More refined label assignment and the use of additional contextual information could raise performance further. As hardware and algorithms improve, systems like YOLOX will likely see wider practical use and contribute to the safety and efficiency of autonomous driving.

Authors (7)

Songyang Zhang (116 papers)
Lin Song (44 papers)
Songtao Liu (34 papers)
Zheng Ge (60 papers)
Zeming Li (53 papers)
Xuming He (109 papers)
Jian Sun (414 papers)

Related Papers

YOLOX: Exceeding YOLO Series in 2021 (2021)
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications (2022)
PP-YOLOE: An evolved version of YOLO (2022)
Real-time Object Detection for Streaming Perception (2022)
EdgeYOLO: An Edge-Real-Time Object Detector (2023)

GitHub

GitHub - Megvii-BaseDetection/YOLOX: YOLOX is a high-performance anchor-free YOLO, exceeding yolov3~v5 with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/ (9,534 stars)