- The paper introduces a fully convolutional one-stage framework that decouples 7-DoF 3D targets into 2D and 3D components so they can be predicted directly in the image domain.
- It leverages multi-scale target assignment and a redefined center-ness measure to achieve significant improvements in mAP and NDS on the nuScenes benchmark.
- The method demonstrates practical viability for autonomous driving by eliminating LiDAR requirements while maintaining competitive 3D spatial perception.
Overview of FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection
In the domain of computer vision, monocular 3D object detection stands as a crucial task, particularly for applications like autonomous driving where cost-effective solutions are necessary. The paper "FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection," authored by Tai Wang et al., proposes a novel framework named FCOS3D. This framework capitalizes on advances in 2D detection methods to tackle the challenges inherent in 3D detection using only monocular images, eliminating the need for expensive LiDAR systems.
Methodology
The paper introduces a fully convolutional single-stage architecture inspired by the anchor-free design of FCOS. The framework transforms the standard 7-DoF (Degrees of Freedom) 3D targets into components that can be handled in the 2D image domain, decoupling each target into 2D attributes (the projected 3D center and per-location offsets to it) and 3D attributes (depth, dimensions, and orientation) that are regressed directly, as sketched below.
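To make the decoupling concrete, here is a minimal sketch of how a 7-DoF box might be split into a projected 2D center and direct 3D regression targets. The pinhole projection and all names (`decouple_3d_target`, the intrinsics `K`) are illustrative assumptions, not the paper's code.

```python
# A minimal sketch of 2D/3D target decoupling, assuming a pinhole camera
# model; function and variable names are illustrative, not FCOS3D's code.
import numpy as np

def decouple_3d_target(center_3d, dims, yaw, K):
    """Split a 7-DoF 3D box (x, y, z, w, l, h, yaw) into an image-plane
    target and directly regressed 3D targets."""
    x, y, z = center_3d
    # Project the 3D center onto the image plane (the "2.5D" center):
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    # 2D component: each feature-map location regresses its offset to this
    # projected center.
    center_2d = np.array([u, v])
    # 3D component: depth, dimensions, and yaw are regressed directly.
    targets_3d = {"depth": z, "dims": dims, "yaw": yaw}
    return center_2d, targets_3d

# Example: a car 20 m ahead, slightly left of the camera axis.
K = np.array([[1260.0, 0.0, 800.0],
              [0.0, 1260.0, 450.0],
              [0.0, 0.0, 1.0]])
center_2d, targets_3d = decouple_3d_target(
    center_3d=(-1.5, 0.8, 20.0), dims=(1.9, 4.5, 1.6), yaw=0.3, K=K)
```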
Key innovations include:
- Target Decoupling and Assignment: Objects are distributed across feature levels according to their 2D pixel-scale sizes, enabling effective multi-scale prediction in the spirit of FPN-based 2D detectors (a sketch of this assignment rule follows the closing paragraph below).
- Redefined Center-ness: Center-ness is redefined as a 2D Gaussian centered on the projected 3D center rather than on the 2D box center. This aligns the measure with the projection-based formulation and helps suppress low-quality detections far from the true center, as sketched directly below.
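A small sketch of this Gaussian center-ness target follows; the decay factor `alpha` and the stride normalization are assumptions chosen to illustrate the form of the measure, not the paper's exact configuration.

```python
# A hedged sketch of Gaussian center-ness: 1 at the projected 3D center,
# decaying with squared distance from it. `alpha` and the stride-based
# normalization are illustrative assumptions.
import numpy as np

def gaussian_centerness(loc, projected_center, stride, alpha=2.5):
    """Soft center-ness in (0, 1], peaking at the projected 3D center."""
    dx, dy = (np.asarray(loc, float) - np.asarray(projected_center, float)) / stride
    return np.exp(-alpha * (dx ** 2 + dy ** 2))

# A location 4 px from the projected center on a stride-8 feature level:
print(gaussian_centerness(loc=(104, 60), projected_center=(100, 60), stride=8))
```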
The framework is designed to be simple yet effective, eschewing complex priors related to 2D detection or 2D-3D correspondence.
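Returning to the first innovation above, the scale-based level assignment can be sketched as follows; the per-level ranges are the common FCOS defaults, used here as an assumption rather than FCOS3D's exact configuration.

```python
# A minimal sketch of scale-based FPN level assignment, following the
# FCOS-style rule of bounding per-level regression ranges. The thresholds
# are the usual FCOS defaults, assumed here for illustration.
def assign_fpn_level(box_2d, ranges=((0, 64), (64, 128), (128, 256),
                                     (256, 512), (512, float("inf")))):
    """Pick the FPN level whose range contains the box's 2D scale
    (max center-to-edge regression distance)."""
    x1, y1, x2, y2 = box_2d
    scale = max(x2 - x1, y2 - y1) / 2  # max distance from center to an edge
    for level, (lo, hi) in enumerate(ranges):
        if lo <= scale < hi:
            return level  # level 0 corresponds to P3 here
    return len(ranges) - 1

print(assign_fpn_level((100, 100, 300, 260)))  # scale 100 -> level 1 (P4)
```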
Experimental Results
The FCOS3D method demonstrates competitive performance on the nuScenes benchmark, taking first place among vision-only methods in the 2020 nuScenes 3D Detection Challenge. Specifically, it records an mAP of 0.358 and an NDS of 0.428, surpassing prior methods that rely solely on camera data. This underscores the framework's ability to recover from monocular inputs much of the 3D spatial understanding typically supplied by additional sensors such as LiDAR.
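For context, NDS is defined by the nuScenes benchmark as a weighted combination of mAP and five true-positive error terms (mATE, mASE, mAOE, mAVE, mAAE). The sketch below shows the formula with placeholder error values, not the paper's reported numbers.

```python
# nuScenes Detection Score: NDS = (1/10) * (5*mAP + sum(1 - min(1, err)))
# over the five TP error metrics. The error values below are placeholders
# for illustration only, not FCOS3D's actual results.
def nds(m_ap, tp_errors):
    return (5 * m_ap + sum(1 - min(1.0, e) for e in tp_errors)) / 10

print(nds(0.358, [0.7, 0.3, 0.5, 1.2, 0.15]))  # placeholder errors -> ~0.414
```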
Numerical and Comparative Insights
Error metrics such as mAOE (average orientation error) and mATE (average translation error) are substantially lower than those of other monocular methods, indicating superior orientation and localization predictions. The paper also provides a thorough ablation study revealing the significance of innovations like distance-based target assignment and disentangled regression heads, each enhancing prediction precision without substantially increasing computational overhead.
Implications and Future Directions
This research presents an efficient pathway for monocular 3D detection, suggesting potential for deployment in cost-sensitive scenarios where LiDAR is impractical. The framework's reliance on enhanced feature extraction techniques and intelligent target assignment could inspire further iterations of monocular detection systems.
Future work might explore integrating temporal cues, enhancing depth estimation accuracy, or utilizing multi-camera setups for a more holistic environmental understanding, confronting inherent challenges such as occlusion and depth ambiguity.
The FCOS3D framework's strong camera-only results demonstrate how advances in 2D detection can be carried over to 3D object perception, marking a valuable contribution to autonomous navigation and robotic vision.