
BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection (2303.08498v2)

Published 15 Mar 2023 in cs.CV

Abstract: While most recent autonomous driving systems focus on developing perception methods for ego-vehicle sensors, people tend to overlook an alternative approach: leveraging intelligent roadside cameras to extend perception beyond the visual range. We discover that state-of-the-art vision-centric bird's eye view detection methods have inferior performance on roadside cameras. This is because these methods mainly focus on recovering the depth with respect to the camera center, where the depth difference between the car and the ground quickly shrinks as the distance increases. In this paper, we propose a simple yet effective approach, dubbed BEVHeight, to address this issue. In essence, instead of predicting the pixel-wise depth, we regress the height to the ground to achieve a distance-agnostic formulation that eases the optimization of camera-only perception methods. On popular 3D detection benchmarks of roadside cameras, our method surpasses all previous vision-centric methods by a significant margin. The code is available at https://github.com/ADLab-AutoDrive/BEVHeight.

Citations (54)

Summary

  • The paper introduces BEVHeight, replacing per-pixel depth with height estimation for improved roadside 3D object detection.
  • It achieves a 4.85% accuracy gain in standard settings and a 26.88% gain under noisy camera conditions.
  • The framework employs dynamic discretization of height, enhancing robustness against extrinsic camera disturbances.

Overview of BEVHeight: A Framework for Vision-Based Roadside 3D Object Detection

The paper "BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection" presents a novel approach in the domain of autonomous vehicle perception systems, specifically highlighting the limitations in traditional depth estimation methods when applied to roadside camera setups. This research addresses a significant challenge in the field of 3D object detection: the degradation of detection accuracy in roadside settings due to depth estimation issues. Instead, the authors propose a height-based method that enhances both accuracy and robustness, especially under extrinsic camera disturbances.

Key Contributions and Results

The researchers introduce BEVHeight, a framework that optimizes Bird's Eye View (BEV) perception for scenarios involving roadside cameras. The primary innovation is replacing per-pixel depth estimation with estimation of height above the ground. Object heights follow a concentrated distribution that stays constant regardless of an object's distance from the camera, whereas depth does not. This change simplifies the prediction task and provides several advantages (a geometry sketch follows the list):

  1. Improved Accuracy: BEVHeight surpasses existing methods such as BEVDepth, improving accuracy by 4.85% in the standard setting and by 26.88% in the noisy setting with disturbed camera parameters. It also outperforms LiDAR-augmented methods, highlighting the robustness of the height-based approach.
  2. Robustness Against Disturbance: Height estimation is demonstrated to be more resilient to changes in camera pose and extrinsic parameters, which are common in real-world roadside conditions due to maintenance or environmental factors.
  3. Dynamic Discretization: The paper employs a dynamic discretization of the height range, which adjusts the granularity of the height bins, thereby optimizing the prediction process and reducing the error margin compared to uniform discretization (see the discretization sketch after this list).
  4. Comparative Evaluation: Extensive experiments conducted on prominent benchmarks like DAIR-V2X-I and Rope3D indicate that BEVHeight significantly outperforms vision-centric methods that rely on depth, thereby validating its efficacy in roadside perception.
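
To make the height-to-location step concrete, here is a minimal sketch of how a predicted height above the ground can be lifted to a 3D position by intersecting the pixel's viewing ray with a horizontal plane, the standard geometry such a formulation relies on. The camera parameters and the `lift_pixel_with_height` helper are illustrative assumptions, not BEVHeight's actual implementation:

```python
import numpy as np

def lift_pixel_with_height(u, v, h, K, R, t):
    """Intersect the viewing ray of pixel (u, v) with the horizontal
    plane z = h in the ground frame.

    K is the 3x3 intrinsic matrix; R, t are extrinsics mapping a
    ground-frame point X to camera coordinates via X_cam = R @ X + t.
    Illustrative helper, not the paper's implementation.
    """
    origin = -R.T @ t                                     # camera center in ground frame
    ray = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray in ground frame
    s = (h - origin[2]) / ray[2]                          # solve origin_z + s * ray_z = h
    return origin + s * ray                               # 3D point on the plane z = h

# Toy setup: camera 6 m above the ground, pitched 10 degrees downward,
# ground frame with x forward, y left, z up.
pitch = np.deg2rad(10.0)
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0],
              [-np.sin(pitch), 0.0, -np.cos(pitch)],
              [np.cos(pitch), 0.0, -np.sin(pitch)]])
t = -R @ np.array([0.0, 0.0, 6.0])

# A pixel below the principal point, predicted to sit 1.5 m above the ground.
print(lift_pixel_with_height(960.0, 700.0, 1.5, K, R, t))
```

Because the plane height h is distance-agnostic, a small error in the predicted height displaces the recovered point far less at range than an equivalent error in predicted depth would.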
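For the dynamic discretization, a common choice in this line of work is a linear-increasing schedule, where bin widths grow linearly across the range so that the frequently occupied part of the height range gets finer bins. The sketch below shows that schedule with illustrative bounds and bin count; BEVHeight's exact schedule and hyperparameters are in the paper and repository:

```python
import numpy as np

def height_bin_edges(h_min, h_max, num_bins):
    """Linear-increasing discretization: the i-th edge is placed at
    h_min + (h_max - h_min) * i * (i + 1) / (N * (N + 1)), so bin
    widths grow linearly and the low end of the range is sampled
    more finely. Bounds and bin count below are illustrative.
    """
    i = np.arange(num_bins + 1)
    return h_min + (h_max - h_min) * i * (i + 1) / (num_bins * (num_bins + 1))

edges = height_bin_edges(-1.0, 3.0, 10)
print(np.round(edges, 3))            # bin boundaries over the height range
print(np.round(np.diff(edges), 3))   # widths increase linearly
```

Printing the widths shows them growing linearly, which keeps quantization error small in the densely populated part of the height distribution.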

Theoretical and Practical Implications

BEVHeight's deviation from depth-centric methods has notable theoretical implications for the field. The concept of leveraging height instead of depth in perception frameworks paves the way for future research exploring alternative methods to represent 3D space, especially in scenarios where the limitations of depth become evident.

Practically, the framework could facilitate more reliable deployment of autonomous driving technologies by improving detection robustness in cluttered and dynamic roadside environments. This robustness is crucial for ensuring safety and reliability in autonomous systems, which often operate under varying environmental conditions.

Speculation on Future Developments

This research opens several avenues for further exploration. One potential direction is the integration of height and depth predictions, leveraging their complementary advantages to enhance prediction accuracy further. Additionally, BEVHeight could inspire innovations in sensor fusion techniques, combining height-based predictions with data from other sensory inputs like LiDAR to craft a more holistic perception model.

Moreover, the proposed framework could benefit from advancements in machine learning architectures, such as incorporating transformer-based models, to further optimize the processing of visual data from roadside cameras.

In summary, the paper presents a substantial advancement in vision-based 3D object detection for roadside scenarios. BEVHeight's ability to overcome the limitations of depth prediction, together with its robust performance under variable conditions, highlights its potential to reshape approaches within the autonomous driving domain.
