- The paper introduces a two-stage framework that uses spherical anchor proposals and a novel PointsPool layer to boost 3D detection accuracy.
- It employs a PointNet++ backbone to extract semantic features and achieves over 10 FPS on the KITTI benchmark.
- The addition of an IoU estimation branch aligns classification with localization, improving performance in occluded and dense scenarios.
Sparse-to-Dense 3D Object Detector for Point Cloud
The paper presents "Sparse-to-Dense 3D Object Detector" (STD), a two-stage framework for 3D object detection from point cloud data. The framework improves both detection accuracy and computational efficiency through new techniques for proposal generation and feature extraction.
Framework Overview
STD employs a two-stage process:
- Proposal Generation: The first stage generates proposals bottom-up from raw point cloud data: the model seeds each point with a spherical anchor, which keeps recall high while reducing computation (a minimal sketch of this seeding step follows this list). A PointNet++ backbone extracts per-point semantic features so that each point's semantic context is captured. Between the two stages, a PointsPool layer converts each proposal's sparse interior points into a compact, dense representation, further reducing computational cost.
- Box Prediction: The second stage adds a parallel intersection-over-union (IoU) prediction branch that aligns classification scores with localization accuracy, improving overall detection quality.
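To make the anchoring idea concrete, here is a minimal sketch of per-point spherical anchor seeding. The function name, score threshold, and radius value are illustrative assumptions, not values from the paper; the backbone is assumed to supply a per-point foreground score.

```python
import numpy as np

def seed_spherical_anchors(points, semantic_scores, score_thresh=0.3, radius=2.0):
    """Seed one spherical anchor per likely-foreground point (illustrative sketch).

    points: (N, 3) raw point coordinates.
    semantic_scores: (N,) per-point foreground probability from the
        PointNet++ backbone (threshold and radius are assumptions here,
        not the paper's exact settings).
    Returns anchor centers and radii for points judged to be foreground.
    """
    fg_mask = semantic_scores > score_thresh   # keep likely object points
    centers = points[fg_mask]                  # anchor center = point location
    radii = np.full(len(centers), radius)      # one class-dependent radius
    return centers, radii

# Because a sphere has no orientation, each point needs a single anchor
# rather than one per heading angle, which is what roughly halves the
# anchor count relative to oriented cuboid anchors.
```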
Experimental Results
The authors validate the approach on the KITTI dataset, achieving state-of-the-art results in both 3D object detection and Bird's Eye View (BEV) detection. STD is particularly strong in hard cases with heavy occlusion and dense objects, while running at more than 10 frames per second (FPS) at inference.
Key Contributions
The paper introduces several noteworthy contributions:
- Spherical Anchor-Based Proposal Generation: Replacing conventional cuboid anchors with spherical ones preserves point location information during proposal generation and, because spheres need no orientation variants, cuts the anchor count roughly in half while improving recall.
- PointsPool Layer: This component combines the strengths of point-based and voxel-based methods, converting unordered point features into compact, structured representations suitable for efficient CNN processing (see the voxelization sketch after this list).
- 3D IoU Estimation Branch: Predicting the 3D IoU of each box aligns classification confidence with localization quality, yielding substantial precision gains (see the score re-ranking sketch after this list).
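The sketch below illustrates the point-to-voxel idea behind PointsPool: the unordered points inside a proposal are binned into a fixed-size dense grid that a CNN or fully connected head can consume. The grid resolution and mean-pooling aggregation are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def points_pool(points, feats, box_min, box_max, grid=(6, 6, 6)):
    """Voxelize a proposal's interior points into a dense grid (sketch).

    points: (N, 3) coordinates inside the proposal; feats: (N, C) point
    features. Returns a (gx, gy, gz, C) dense tensor.
    """
    gx, gy, gz = grid
    C = feats.shape[1]
    out = np.zeros((gx, gy, gz, C), dtype=feats.dtype)
    counts = np.zeros((gx, gy, gz, 1), dtype=np.int64)

    # Normalize coordinates to voxel indices within the proposal box.
    rel = (points - box_min) / (box_max - box_min + 1e-9)
    idx = np.clip((rel * np.array(grid)).astype(int), 0, np.array(grid) - 1)

    # Accumulate point features per voxel, then average.
    for (i, j, k), f in zip(idx, feats):
        out[i, j, k] += f
        counts[i, j, k] += 1

    return out / np.maximum(counts, 1)  # mean feature per occupied voxel
```

The fixed output shape is the point: regardless of how many points fall inside a proposal, the second stage always receives a tensor of the same size.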
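The IoU branch can be used to re-rank boxes before NMS so that well-localized boxes survive suppression. The multiplicative combination below is one simple rule for merging the two signals; treat it as an illustration under that assumption rather than the paper's exact formulation.

```python
import numpy as np

def iou_guided_scores(cls_scores, pred_ious):
    """Combine classification confidence with predicted 3D IoU (sketch).

    Ranking boxes by cls_score * predicted_IoU before NMS favors boxes
    that are both confidently classified and well localized.
    """
    return cls_scores * pred_ious

# Example: a confident but poorly localized box drops below a moderately
# confident, well-localized one.
scores = iou_guided_scores(np.array([0.95, 0.70]), np.array([0.40, 0.85]))
print(scores)  # [0.38, 0.595] -> the second box now ranks first
```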
Implications and Future Work
The contributions have clear implications for applications that depend on precise 3D object recognition, such as autonomous driving and augmented reality. The ideas behind per-point proposal generation and point-to-voxel feature pooling may also transfer to other 3D vision tasks.
Future work could evaluate these methods on additional datasets and domains to assess their adaptability and scalability, refine the proposal generation process, or explore alternative ways to strengthen the localization capabilities of 3D object detectors.
Conclusion
STD combines innovations in proposal generation and feature extraction to achieve strong 3D object detection in point clouds. By integrating these strategies with established techniques, the framework sets a high benchmark in both accuracy and speed and points the way for future research in 3D detection.