Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception (2301.07870v1)

Published 19 Jan 2023 in cs.CV

Abstract: Recently, the pure camera-based Bird's-Eye-View (BEV) perception removes expensive Lidar sensors, making it a feasible solution for economical autonomous driving. However, most existing BEV solutions either suffer from modest performance or require considerable resources to execute on-vehicle inference. This paper proposes a simple yet effective framework, termed Fast-BEV, which is capable of performing real-time BEV perception on the on-vehicle chips. Towards this goal, we first empirically find that the BEV representation can be sufficiently powerful without expensive view transformation or depth representation. Starting from M2BEV baseline, we further introduce (1) a strong data augmentation strategy for both image and BEV space to avoid over-fitting (2) a multi-frame feature fusion mechanism to leverage the temporal information (3) an optimized deployment-friendly view transformation to speed up the inference. Through experiments, we show Fast-BEV model family achieves considerable accuracy and efficiency on edge. In particular, our M1 model (R18@256x704) can run over 50FPS on the Tesla T4 platform, with 47.0% NDS on the nuScenes validation set. Our largest model (R101@900x1600) establishes a new state-of-the-art 53.5% NDS on the nuScenes validation set. The code is released at: https://github.com/Sense-GVT/Fast-BEV.

Authors (10)
  1. Bin Huang (56 papers)
  2. Yangguang Li (44 papers)
  3. Enze Xie (84 papers)
  4. Feng Liang (61 papers)
  5. Luya Wang (13 papers)
  6. Mingzhu Shen (14 papers)
  7. Fenggang Liu (8 papers)
  8. Tianqi Wang (43 papers)
  9. Ping Luo (340 papers)
  10. Jing Shao (109 papers)
Citations (18)

Summary

Fast-BEV: Towards Real-time On-vehicle Bird’s-Eye View Perception

The paper addresses the development of an efficient Bird's-Eye-View (BEV) perception system for autonomous vehicles, emphasizing real-time inference on on-vehicle hardware. Solutions that rely on lidar sensors pose cost and deployment challenges, prompting investigation of pure camera-based alternatives. The authors introduce Fast-BEV, a camera-only BEV framework that achieves high accuracy and efficiency on edge platforms without the computational expense typically associated with view transformation or explicit depth representation.

Key Contributions

The Fast-BEV framework stems from the principles of the M2BEV baseline, adopting the assumption of uniform depth distribution in the view transformation process. Fast-BEV enhances this baseline with the following components:

  1. Augmentation Strategies: Employs comprehensive data augmentation in both image and BEV spaces to reduce overfitting. Techniques include random flips, rotations, and spatial transformations, which are integrated into the training pipeline to enhance model robustness.
  2. Temporal Feature Fusion: Incorporates multi-frame input to exploit temporal information. By warping BEV features from past frames into the current frame and fusing them, Fast-BEV improves the model's ability to handle dynamic scenes and enhances 3D perception accuracy (see the first sketch after this list).
  3. Optimized View Transformation: Reduces latency in the view transformation, the main computational bottleneck. The projection indices from voxels to image pixels are pre-computed offline, and all camera views are projected into a single dense voxel tensor, avoiding expensive per-camera voxel aggregation (see the second sketch after this list).
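
As referenced in item 2, the temporal fusion can be illustrated with a short sketch: BEV feature maps from past frames are warped into the current ego frame using the relative ego poses, then concatenated channel-wise with the current features before the BEV encoder. This is a minimal sketch; the tensor shapes, the square BEV range, and the `warp_bev` helper are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def warp_bev(prev_bev, rel_pose, bev_range=50.0):
    """Warp a past BEV feature map (B, C, H, W) into the current ego frame.

    rel_pose: (B, 4, 4) transform from the previous ego frame to the current
    one. The BEV grid is assumed to span [-bev_range, bev_range] meters.
    """
    B, C, H, W = prev_bev.shape
    # Metric coordinates of the current frame's BEV grid.
    ys, xs = torch.meshgrid(
        torch.linspace(-bev_range, bev_range, H, device=prev_bev.device),
        torch.linspace(-bev_range, bev_range, W, device=prev_bev.device),
        indexing="ij",
    )
    ones, zeros = torch.ones_like(xs), torch.zeros_like(xs)
    pts = torch.stack([xs, ys, zeros, ones], dim=-1).view(1, -1, 4)  # (1, H*W, 4)

    # Express the current grid points in the previous ego frame.
    prev_pts = pts @ torch.inverse(rel_pose).transpose(1, 2)  # (B, H*W, 4)
    grid = (prev_pts[..., :2] / bev_range).reshape(B, H, W, 2)  # normalize to [-1, 1]
    return F.grid_sample(prev_bev, grid, align_corners=True)

def fuse_temporal(curr_bev, prev_bevs, rel_poses):
    """Concatenate current BEV features with ego-motion-aligned past features."""
    warped = [warp_bev(f, p) for f, p in zip(prev_bevs, rel_poses)]
    return torch.cat([curr_bev] + warped, dim=1)  # channel-wise fusion
```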
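
Item 3 exploits the fact that camera intrinsics and extrinsics are fixed: the voxel-to-pixel projection can be computed once offline and stored as a lookup table, so inference reduces to a single gather of image features into one dense voxel tensor, with overlapping camera regions taking the first valid view. The sketch below conveys the idea under those assumptions; the function names, shapes, and nearest-pixel lookup are illustrative rather than the released code.

```python
import torch

def build_projection_lut(voxel_centers, intrinsics, extrinsics, img_hw):
    """Precompute, for every voxel, the camera and pixel it projects to.

    voxel_centers: (N_voxel, 3) ego-frame coordinates.
    intrinsics: (N_cam, 3, 3); extrinsics: (N_cam, 4, 4) ego -> camera.
    Returns (camera index, flattened pixel index) per voxel, -1 if the voxel
    falls outside every image. Built once offline and reused at inference.
    """
    H, W = img_hw
    n_voxel = voxel_centers.shape[0]
    lut_cam = torch.full((n_voxel,), -1, dtype=torch.long)
    lut_pix = torch.full((n_voxel,), -1, dtype=torch.long)
    homo = torch.cat([voxel_centers, torch.ones(n_voxel, 1)], dim=1)  # (N, 4)

    for cam, (K, T) in enumerate(zip(intrinsics, extrinsics)):
        cam_pts = (T @ homo.T).T[:, :3]                    # ego -> camera frame
        in_front = cam_pts[:, 2] > 0.1
        uv = (K @ cam_pts.T).T
        uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-5)        # perspective divide
        u, v = uv[:, 0].round().long(), uv[:, 1].round().long()
        valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H) & (lut_cam < 0)
        lut_cam[valid] = cam                               # first valid view wins
        lut_pix[valid] = v[valid] * W + u[valid]
    return lut_cam, lut_pix

def view_transform(img_feats, lut_cam, lut_pix):
    """Gather multi-camera features into one dense voxel tensor via the LUT.

    img_feats: (N_cam, C, H, W) backbone features. Returns (N_voxel, C),
    to be reshaped into the (X, Y, Z, C) voxel volume downstream.
    """
    n_cam, C, H, W = img_feats.shape
    flat = img_feats.permute(0, 2, 3, 1).reshape(n_cam, H * W, C)
    voxels = torch.zeros(lut_cam.shape[0], C, dtype=img_feats.dtype)
    valid = lut_cam >= 0
    voxels[valid] = flat[lut_cam[valid], lut_pix[valid]]
    return voxels
```

Because the runtime cost collapses to an index lookup and gather, the transformation avoids the scatter-heavy aggregation of per-camera voxel volumes, which is what makes it deployment-friendly on on-vehicle chips.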

Experimental Results

The paper showcases the performance of Fast-BEV on the nuScenes dataset. The M1 model (R18@256x704) achieves 47.0% NDS on the nuScenes validation set while running at over 50 FPS on a Tesla T4 platform. The largest configuration (R101@900x1600) establishes a new state of the art of 53.5% NDS on the same validation set. These figures underscore the model's ability to balance accuracy and computational efficiency, making it well suited for real-time deployment.

Implications and Future Directions

Practically, Fast-BEV presents a favorable solution for real-time autonomous driving applications, given its enhanced deployment capability on resource-constrained edge devices. Theoretically, it shifts the paradigm by demonstrating that efficient BEV perception can be achieved without relying on costly lidar or depth-based methods.

Future developments might explore further optimization and deployment strategies, possibly incorporating adaptive mechanisms to handle varying environmental conditions dynamically. Additionally, extending Fast-BEV's architecture to integrate other sensory modalities could yield more comprehensive perception frameworks for autonomous systems, advancing both practical deployment and fundamental research in AI-driven perception.
