Segment Anything in 3D with Radiance Fields (2304.12308v5)

Published 24 Apr 2023 in cs.CV

Abstract: The Segment Anything Model (SAM) emerges as a powerful vision foundation model to generate high-quality 2D segmentation results. This paper aims to generalize SAM to segment 3D objects. Rather than replicating the data acquisition and annotation procedure which is costly in 3D, we design an efficient solution, leveraging the radiance field as a cheap and off-the-shelf prior that connects multi-view 2D images to the 3D space. We refer to the proposed solution as SA3D, short for Segment Anything in 3D. With SA3D, the user is only required to provide a 2D segmentation prompt (e.g., rough points) for the target object in a single view, which is used to generate its corresponding 2D mask with SAM. Next, SA3D alternately performs mask inverse rendering and cross-view self-prompting across various views to iteratively refine the 3D mask of the target object. For one view, mask inverse rendering projects the 2D mask obtained by SAM into the 3D space with guidance of the density distribution learned by the radiance field for 3D mask refinement; then, cross-view self-prompting extracts reliable prompts automatically as the input to SAM from the rendered 2D mask of the inaccurate 3D mask for a new view. We show in experiments that SA3D adapts to various scenes and achieves 3D segmentation within seconds. Our research reveals a potential methodology to lift the ability of a 2D segmentation model to 3D. Our code is available at https://github.com/Jumpat/SegmentAnythingin3D.

Authors (8)
  1. Jiazhong Cen
  2. Zanwei Zhou
  3. Jiemin Fang
  4. Chen Yang
  5. Wei Shen
  6. Lingxi Xie
  7. Xiaopeng Zhang
  8. Qi Tian

Summary

  • The paper introduces SA3D, a framework that lifts SAM's 2D segmentation into 3D via NeRFs, producing accurate 3D masks through iterative refinement.
  • It combines a single manual prompt with mask inverse rendering and cross-view self-prompting to bridge 2D and 3D segmentation.
  • Experiments on Replica, NVOS, and SPIn-NeRF show strong results, including an mIoU improvement of over 6.5% on NVOS compared to prior state-of-the-art methods.

Essay: Segment Anything in 3D with NeRFs

The paper "Segment Anything in 3D with NeRFs" systematically explores the extension of the Segment Anything Model (SAM) into the three-dimensional (3D) domain using Neural Radiance Fields (NeRFs). This research targets a significant gap in the literature concerning the translation of 2D segmentation capabilities into 3D space, leveraging computational efficiencies and avoided expenses typically associated with direct 3D data annotations.

Methodological Overview

The core proposition of this work is SA3D, a framework that couples SAM's 2D segmentation ability with the 3D geometry captured by a NeRF, circumventing the need for ground-up 3D dataset creation. SA3D starts from a manual segmentation prompt on a single rendered view and generates an initial 2D mask with SAM. It then alternates between two phases across the remaining views: mask inverse rendering, which uses the radiance field's density distribution to project each 2D mask into a 3D voxel grid, and cross-view self-prompting, which renders the current (still incomplete) 3D mask into a new view and automatically extracts reliable point prompts from it for SAM, iteratively refining and extending the 3D segmentation mask.
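
To make this alternation concrete, the following is a minimal Python sketch of the loop described above, assuming a voxel-grid radiance field. The View container, the sam callable, and the per-ray coordinate and weight arrays are hypothetical stand-ins for whatever the NeRF backbone exposes; the authors' released implementation (linked in the abstract) differs in detail.

    # Illustrative sketch of the SA3D alternation: mask inverse rendering
    # followed by cross-view self-prompting. Data layout and helper names
    # are assumptions for exposition, not the authors' API.
    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    import numpy as np

    @dataclass
    class View:
        image: np.ndarray    # (H, W, 3) RGB image of this view
        coords: np.ndarray   # (H, W, S, 3) integer voxel indices of S samples per ray
        weights: np.ndarray  # (H, W, S) volume-rendering weights from the NeRF

    def mask_inverse_render(vox, v, mask2d, neg=0.15):
        # Scatter per-ray rendering weights into the 3D mask grid (in place).
        # Pixels inside the SAM mask add their weights to the voxels their ray
        # traverses; pixels outside subtract a small penalty, so the density
        # distribution learned by the radiance field guides the 2D-to-3D lift.
        sign = np.where(mask2d, 1.0, -neg)                    # (H, W)
        contrib = v.weights * sign[..., None]                 # (H, W, S)
        idx = v.coords.reshape(-1, 3)
        np.add.at(vox, (idx[:, 0], idx[:, 1], idx[:, 2]), contrib.ravel())

    def render_soft_mask(vox, v):
        # Volume-render the current 3D mask into a 2D confidence map.
        vals = vox[v.coords[..., 0], v.coords[..., 1], v.coords[..., 2]]
        return (v.weights * vals).sum(axis=-1)                # (H, W)

    def self_prompt(soft, k=3, thresh=0.0):
        # Pick up to k high-confidence pixels as (x, y) point prompts for SAM.
        ys, xs = np.nonzero(soft > thresh)
        if ys.size == 0:
            return []
        top = np.argsort(soft[ys, xs])[::-1][:k]
        return [(int(x), int(y)) for x, y in zip(xs[top], ys[top])]

    def sa3d(views: List[View], sam: Callable, grid_shape: Tuple[int, int, int],
             init_prompt: Tuple[int, int]) -> np.ndarray:
        # One pass over the training views: prompt SAM, lift the 2D mask to 3D,
        # then self-prompt each later view from the rendered (partial) 3D mask.
        vox = np.zeros(grid_shape)
        for i, v in enumerate(views):
            prompts = [init_prompt] if i == 0 else self_prompt(render_soft_mask(vox, v))
            if not prompts:
                continue                                      # object not visible here
            mask2d = sam(v.image, prompts)                    # (H, W) boolean SAM mask
            mask_inverse_render(vox, v, mask2d)
        return vox > 0.0                                      # binary 3D mask

The actual method additionally guards against unreliable self-generated prompts (the "reliable prompts" of the abstract), a safeguard this sketch omits for brevity.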

Empirical Evaluation

The authors conducted experiments on well-established benchmarks: Replica, NVOS, and SPIn-NeRF. The results indicate a notable increase in segmentation accuracy and efficiency, with the method completing 3D object segmentation within seconds. On the NVOS dataset, SA3D improves mIoU by more than 6.5% over the contemporary state of the art. These results underscore the effectiveness of NeRFs as a bridge between 2D and 3D segmentation paradigms.
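
For reference, the reported mIoU is the standard mean intersection-over-union between predicted and ground-truth masks, averaged over the evaluated objects. A minimal sketch, using toy masks rather than the benchmark data:

    # Minimal mIoU computation: per-object IoU, then the mean across objects.
    import numpy as np

    def iou(pred: np.ndarray, gt: np.ndarray) -> float:
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        return float(inter / union) if union else 1.0  # empty-vs-empty: perfect

    def miou(pairs) -> float:
        return float(np.mean([iou(p, g) for p, g in pairs]))

    # Toy example with two 2x2 boolean masks: intersection 1, union 2.
    pred = np.array([[1, 1], [0, 0]], dtype=bool)
    gt = np.array([[1, 0], [0, 0]], dtype=bool)
    print(miou([(pred, gt)]))  # 0.5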

Implications and Future Directions

Practically, SA3D provides a streamlined path to 3D segmentation without the substantial overhead of direct 3D data annotation. This advancement could facilitate the development of more efficient 3D modeling applications and comprehensive virtual environments, potentially broadening accessibility and reducing costs.

Theoretically, this research introduces a promising methodology for extending 2D foundation models to 3D using structural priors, provided these models produce consistent segmentations across multiple views. This insight opens the potential for future innovations in which 2D models systematically gain 3D capabilities through similar integrative approaches, fostering a new class of versatile vision models.

Broader Impact

SA3D demonstrates tangible possibilities in diverse sectors, including video game development, augmented reality, and robotics, where swift and reliable 3D environmental interpretation is crucial. The framework's reliance on off-the-shelf components, SAM and an inexpensive radiance-field prior, further enhances its practical appeal.

Conclusion

The work compellingly illustrates the effective use of SAM in conjunction with NeRFs to resolve the challenges of 3D segmentation. By leveraging the strengths of modern neural architectures, the authors provide a crucial link between 2D image understanding and 3D spatial awareness. As research continues to build on these foundations, the implications for AI advancements in multidimensional vision models offer an expansive field of study with significant practical potential.
