Object-Centric Stereo Matching for 3D Object Detection (1909.07566v2)

Published 17 Sep 2019 in cs.CV

Abstract: Safe autonomous driving requires reliable 3D object detection: determining the 6 DoF pose and dimensions of objects of interest. Using stereo cameras to solve this task is a cost-effective alternative to the widely used LiDAR sensor. The current state-of-the-art for stereo 3D object detection takes the existing PSMNet stereo matching network, with no modifications, and converts the estimated disparities into a 3D point cloud, and feeds this point cloud into a LiDAR-based 3D object detector. The issue with existing stereo matching networks is that they are designed for disparity estimation, not 3D object detection; the shape and accuracy of object point clouds are not the focus. Stereo matching networks commonly suffer from inaccurate depth estimates at object boundaries, which we define as streaking, because background and foreground points are jointly estimated. Existing networks also penalize disparity instead of the estimated position of object point clouds in their loss functions. We propose a novel 2D box association and object-centric stereo matching method that only estimates the disparities of the objects of interest to address these two issues. Our method achieves state-of-the-art results on the KITTI 3D and BEV benchmarks.

Summary

The paper’s main contribution is a novel object-centric stereo matching method that refines disparity estimation by focusing solely on object regions.
OC Stereo associates 2D boxes across the left and right images using SSIM, then performs stereo matching on RoI-aligned object regions so that background pixels do not degrade object depth estimates.
On the KITTI 3D and BEV benchmarks, the method reports state-of-the-art results for object detection and localization, supporting stereo cameras as a cost-effective alternative to LiDAR for autonomous driving.

The paper "Object-Centric Stereo Matching for 3D Object Detection" presents an approach to stereo 3D object detection that aims to improve perception for autonomous driving by improving depth estimation from stereo cameras. Stereo setups are a cost-effective alternative to LiDAR sensors, so better stereo depth makes 3D detection more accessible. The proposed method targets two limitations of existing stereo matching networks: streaking artifacts at object boundaries, caused by jointly estimating foreground and background points, and loss functions that penalize disparity rather than the estimated 3D position of object points. It addresses both by estimating disparity only for the objects of interest.

Summary of Methodology

The authors introduce an object-centric stereo network, OC Stereo, that estimates disparities only for objects of interest in stereo images. This departs from traditional stereo approaches, which estimate disparity for every pixel in the scene. OC Stereo instead concentrates computation on regions containing objects, which keeps background pixels from corrupting object depth estimates. Key elements of the methodology include:

2D Box Association: The approach begins with detecting 2D bounding boxes in both left and right stereo images, followed by an efficient association algorithm using the Structural Similarity Index (SSIM). This ensures reliable pairing of corresponding object detections across stereo images.
Object-Centric Stereo Matching: Stereo matching operates on RoI-aligned regions of each associated object and focuses exclusively on object pixels, which suppresses depth estimation errors at object boundaries. Because the disparity cost volume is built per object rather than for the whole image, it is smaller than in global disparity estimation, leading to more accurate and computationally efficient depth predictions.
Point Cloud Loss: A specialized loss function directly penalizes errors in object shape and position within the point cloud. Because depth is inversely proportional to disparity, a loss on disparity weights nearby objects more heavily; penalizing 3D point positions counters this bias.
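The 2D box association step can be illustrated with a minimal sketch. It assumes each detected box has already been cropped, converted to grayscale, resized to a common size, and flattened to a list of pixel values. It uses a single-window (global) SSIM with the standard constants and a greedy best-first pairing. The names `global_ssim` and `associate_boxes`, the `min_score` threshold, and the greedy strategy are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Optional, Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                dynamic_range: float = 255.0) -> float:
    """Single-window SSIM between two equally sized grayscale patches."""
    if len(x) != len(y) or not x:
        raise ValueError("patches must be non-empty and the same size")
    n = len(x)
    # Standard SSIM stabilizing constants.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


def associate_boxes(left: List[Sequence[float]],
                    right: List[Sequence[float]],
                    min_score: float = 0.5) -> List[Optional[int]]:
    """For each left patch, return the index of its right match or None.

    Hypothetical greedy scheme: score every left/right pair, then accept
    pairs from highest to lowest SSIM, using each patch at most once.
    """
    scored = sorted(
        ((global_ssim(l, r), i, j)
         for i, l in enumerate(left)
         for j, r in enumerate(right)),
        reverse=True,
    )
    matches: List[Optional[int]] = [None] * len(left)
    used_right = set()
    for score, i, j in scored:
        if score < min_score:
            break  # remaining pairs are too dissimilar to associate
        if matches[i] is None and j not in used_right:
            matches[i] = j
            used_right.add(j)
    return matches


if __name__ == "__main__":
    striped = [10.0, 200.0, 10.0, 200.0]
    inverted = [200.0, 10.0, 200.0, 10.0]
    print(associate_boxes([striped, inverted], [inverted, striped]))
```

A left detection with no sufficiently similar right detection stays unmatched (`None`), so no object-centric disparity would be estimated for it in this sketch.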
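The motivation for a point cloud loss follows from the standard rectified-stereo relation, depth = focal length × baseline / disparity. The sketch below shows how the same disparity error maps to very different depth errors at different ranges. The focal length and baseline are illustrative values in the range of a typical automotive stereo rig, and the function names are hypothetical; none of this is taken from the paper's code.

```python
def disparity_to_depth(disparity_px: float, focal_px: float,
                       baseline_m: float) -> float:
    """Depth in meters from disparity in pixels for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m: float, disparity_error_px: float,
                focal_px: float, baseline_m: float) -> float:
    """Absolute depth error caused by a given disparity error at a true depth."""
    true_disparity = focal_px * baseline_m / depth_m
    estimated = disparity_to_depth(true_disparity + disparity_error_px,
                                   focal_px, baseline_m)
    return abs(estimated - depth_m)


if __name__ == "__main__":
    FOCAL_PX = 721.0    # assumed focal length in pixels (illustrative)
    BASELINE_M = 0.54   # assumed stereo baseline in meters (illustrative)
    for depth in (10.0, 20.0, 40.0):
        err = depth_error(depth, 1.0, FOCAL_PX, BASELINE_M)
        print(f"true depth {depth:5.1f} m -> depth error {err:.2f} m "
              f"for a 1 px disparity error")
```

For small errors the depth error grows roughly with the square of depth, so a loss that weights all disparity errors equally under-penalizes position errors on distant objects. A loss defined on 3D point positions avoids this imbalance.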

Empirical Results and Implications

The system reports state-of-the-art performance on the KITTI 3D object detection and bird's-eye-view (BEV) benchmarks. Notably, it achieves higher accuracy in both car detection and localization across various IoU thresholds than previous stereo detectors such as Pseudo-LiDAR. The method also runs significantly faster, which matters for real-time applications in autonomous driving.

The implications of this research extend beyond the immediate improvements in detecting and localizing objects in 3D space. By strengthening the case for stereo cameras as an alternative to expensive LiDAR systems, the work supports broader deployment of autonomous vehicle technologies, which in practice can reduce costs and increase adoption. On the research side, it invites further refinement of object-centric approaches, for example through hybrid sensor setups and architectures that make better use of scene context.

Future Directions

While the paper lays a foundation for object-centric stereo matching, future work could integrate stronger network architectures to improve object recognition, explore finer-grained object segmentation, and fuse stereo data with other sensor inputs for a more complete understanding of the environment. Adapting the approach to varied weather and lighting conditions could also improve its robustness.

In conclusion, "Object-Centric Stereo Matching for 3D Object Detection" presents an efficient approach to stereo-based 3D detection that restricts disparity estimation to objects of interest and supervises the resulting object point clouds directly. It is a step towards more resource-efficient and accurate perception systems for autonomous driving.
