OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments (2312.09243v3)

Published 14 Dec 2023 in cs.CV

Abstract: Occupancy prediction reconstructs 3D structures of surrounding environments. It provides detailed information for autonomous driving planning and navigation. However, most existing methods heavily rely on the LiDAR point clouds to generate occupancy ground truth, which is not available in the vision-based system. In this paper, we propose an OccNeRF method for training occupancy networks without 3D supervision. Different from previous works which consider a bounded scene, we parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range. The neural rendering is adopted to convert occupancy fields to multi-camera depth maps, supervised by multi-frame photometric consistency. Moreover, for semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model. Extensive experiments for both self-supervised depth estimation and 3D occupancy prediction tasks on nuScenes and SemanticKITTI datasets demonstrate the effectiveness of our method.

Introduction to Occupancy Prediction

Occupancy prediction is a critical component of vision-based perception systems, particularly for autonomous driving planning and navigation. The task is to reconstruct the 3D structure of the surrounding environment, giving a detailed volumetric picture of the scene. Most existing methods, however, rely on LiDAR (Light Detection and Ranging) point clouds to generate occupancy ground truth, and LiDAR has clear drawbacks: the sensors are expensive, the returns can be sparse, and purely camera-based systems have no LiDAR data at all.

Self-Supervised Multi-Camera Approach

To remove the need for LiDAR and exploit abundant image data, the paper introduces OccNeRF, a self-supervised method for multi-camera occupancy prediction that trains on raw images without 3D labels or LiDAR data. To handle unbounded scenes, the reconstructed occupancy fields are parameterized and the sampling strategy is reorganized to match the cameras' effectively infinite perceptive range. Neural rendering, in the spirit of neural radiance fields (NeRF), then converts the occupancy fields into multi-camera depth maps, which are supervised with a multi-frame photometric consistency loss of the kind widely used in self-supervised depth estimation.
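
As a rough illustration of these ideas, the sketch below (PyTorch, not the authors' code) shows a Mip-NeRF-360-style contraction that maps unbounded coordinates into a bounded volume, volume rendering of an expected depth per ray from sampled occupancy values, and the L1 part of a photometric consistency loss. The exact parameterization, sampling strategy, and loss weighting used in OccNeRF may differ.

```python
# Minimal sketch (assumptions, not the authors' implementation) of the two ingredients
# described above: a contraction for unbounded scenes and occupancy-to-depth rendering.
import torch


def contract(points: torch.Tensor, inner_radius: float = 1.0) -> torch.Tensor:
    """Map unbounded 3D points into a bounded ball (Mip-NeRF-360-style contraction).

    Points inside `inner_radius` are unchanged; points outside are compressed so the
    whole space lands within radius 2 * inner_radius. The paper's exact
    parameterization may differ; this is one common choice.
    """
    norm = points.norm(dim=-1, keepdim=True).clamp(min=1e-6)
    squashed = (2.0 * inner_radius - inner_radius**2 / norm) * points / norm
    return torch.where(norm <= inner_radius, points, squashed)


def render_depth(occupancy: torch.Tensor, sample_depths: torch.Tensor) -> torch.Tensor:
    """Volume-render an expected depth per ray from sampled occupancy values.

    occupancy:     (num_rays, num_samples) values in [0, 1] along each camera ray
    sample_depths: (num_rays, num_samples) distance of each sample from the camera
    """
    alpha = occupancy.clamp(1e-6, 1.0 - 1e-6)
    # Transmittance: probability that the ray reaches sample i without being blocked.
    ones = torch.ones_like(alpha[:, :1])
    transmittance = torch.cumprod(torch.cat([ones, 1.0 - alpha[:, :-1]], dim=1), dim=1)
    weights = alpha * transmittance                      # (num_rays, num_samples)
    return (weights * sample_depths).sum(dim=1)          # expected depth per ray


def photometric_l1(warped: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 part of a multi-frame photometric consistency loss.

    `warped` is an adjacent frame reprojected into the target view using the rendered
    depth and known camera poses (the warping itself is omitted here); in practice an
    SSIM term is typically combined with this L1 term.
    """
    return (warped - target).abs().mean()
```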

Advancements in Semantic Occupancy Prediction

For semantic occupancy prediction, which additionally asks what kinds of objects occupy the scene and how they are laid out, the method relies on a pretrained open-vocabulary 2D segmentation model. Several strategies are introduced to polish the text prompts fed to this model and to filter its outputs, turning 2D segmentation results into pseudo-labels that supervise the 3D semantic occupancy prediction.
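
As a similarly hedged sketch, the snippet below shows one way filtered 2D pseudo-labels could supervise rendered semantics, reusing the volume-rendering weights from the previous sketch. The confidence threshold and filtering rule here are illustrative placeholders, not the paper's exact pipeline.

```python
# Hypothetical supervision of rendered semantics with filtered 2D pseudo-labels.
import torch
import torch.nn.functional as F


def semantic_rendering_loss(
    sem_logits: torch.Tensor,     # (num_rays, num_samples, num_classes) per-sample logits
    weights: torch.Tensor,        # (num_rays, num_samples) volume-rendering weights
    pseudo_labels: torch.Tensor,  # (num_rays,) class index per pixel from the 2D model
    pseudo_scores: torch.Tensor,  # (num_rays,) confidence of each 2D prediction
    min_score: float = 0.5,       # hypothetical confidence threshold for filtering
) -> torch.Tensor:
    """Cross-entropy between rendered per-pixel semantics and filtered 2D pseudo-labels."""
    # Accumulate per-sample logits along each ray with the rendering weights.
    rendered = (weights.unsqueeze(-1) * sem_logits).sum(dim=1)  # (num_rays, num_classes)
    keep = pseudo_scores > min_score  # drop low-confidence pseudo-labels
    if keep.sum() == 0:
        return rendered.new_zeros(())
    return F.cross_entropy(rendered[keep], pseudo_labels[keep])
```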

Validation and Potential

OccNeRF's effectiveness is demonstrated through extensive experiments on the nuScenes and SemanticKITTI datasets, standard benchmarks for autonomous driving perception. The comparisons cover both self-supervised depth estimation and 3D occupancy prediction, where the method performs strongly, including on semantic occupancy prediction. The work is a step toward understanding 3D scenes from image data alone with self-supervised training, offering a less expensive alternative to LiDAR-based pipelines and broadening the range of vision-only systems that can adopt occupancy prediction.

Authors (8)
  1. Chubin Zhang (4 papers)
  2. Juncheng Yan (3 papers)
  3. Yi Wei (60 papers)
  4. Jiaxin Li (57 papers)
  5. Li Liu (311 papers)
  6. Yansong Tang (81 papers)
  7. Yueqi Duan (47 papers)
  8. Jiwen Lu (192 papers)