- The paper introduces a diffusion model that directly synthesizes 3D radiance fields, achieving high image and geometry quality.
- It employs an explicit voxel grid trained with a rendering loss to learn multi-view-consistent priors, enabling free-view synthesis and accurate shape generation.
- It outperforms GAN-based methods on PhotoShape Chairs and ABO Tables, reducing FID from 16.54 to 15.95 and MMD from 5.62 to 4.42.
DiffRF: Rendering-Guided 3D Radiance Field Diffusion
The paper "DiffRF: Rendering-Guided 3D Radiance Field Diffusion" introduces a novel method for 3D radiance field synthesis using denoising diffusion probabilistic models (DDPMs). Unlike previous approaches that focus on generating images, latent codes, or point clouds, this work directly targets the volumetric radiance fields, significantly enhancing the capabilities of unconditionally and conditionally generating high-fidelity 3D assets.
Methodology and Contributions
DiffRF establishes a 3D denoising model operating directly on an explicit voxel grid representation. Whereas a traditional neural radiance field (NeRF) is optimized for a single scene, the diffusion model learns a generative prior over an entire object category. Training is guided by a rendering loss, which biases the noise-prediction formulation toward synthesizing high-quality images and mitigates artifacts typical of fitted radiance fields. DiffRF's core advantage lies in learning multi-view-consistent priors, enabling free-view synthesis and accurate shape generation without requiring direct 3D supervision.
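To make the idea of rendering guidance concrete, the following is a minimal PyTorch-style sketch of a training step that combines standard DDPM noise prediction on a voxel-grid radiance field with a photometric rendering loss. The names (`denoiser`, `render_views`, `lambda_rgb`, the grid resolution) are illustrative assumptions and do not reflect the authors' exact implementation.

```python
import torch

def diffusion_training_step(denoiser, render_views, f0, cams, gt_images,
                            alphas_cumprod, lambda_rgb=0.1):
    """One hypothetical training step: DDPM noise prediction on an explicit
    radiance-field voxel grid, plus a volumetric rendering loss.

    f0:        clean radiance field, e.g. (B, 4, 32, 32, 32) density + RGB voxels
    cams:      camera poses used for the rendering loss
    gt_images: ground-truth renderings from those poses
    """
    B = f0.shape[0]
    T = alphas_cumprod.shape[0]

    # Sample a timestep and noise the voxel grid (forward process).
    t = torch.randint(0, T, (B,), device=f0.device)
    a_bar = alphas_cumprod[t].view(B, 1, 1, 1, 1)
    eps = torch.randn_like(f0)
    f_t = a_bar.sqrt() * f0 + (1.0 - a_bar).sqrt() * eps

    # A 3D U-Net predicts the noise that was added to the voxel grid.
    eps_pred = denoiser(f_t, t)
    loss_ddpm = torch.nn.functional.mse_loss(eps_pred, eps)

    # Recover an estimate of the clean field and render it; the photometric
    # error biases the model toward artifact-free, high-quality renderings.
    f0_pred = (f_t - (1.0 - a_bar).sqrt() * eps_pred) / a_bar.sqrt()
    loss_rgb = torch.nn.functional.mse_loss(render_views(f0_pred, cams), gt_images)

    return loss_ddpm + lambda_rgb * loss_rgb
```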
Key contributions of this paper include:
- Presentation of the first diffusion model directly synthesizing 3D radiance fields, achieving high-quality synthesis in terms of both geometry and image fidelity.
- Introduction of masked radiance field completion, a 3D analogue of image inpainting, performed at inference time without task-specific training.
- Superior performance in generating geometrically accurate and view-consistent renderings on PhotoShape Chairs and ABO Tables, improving over GAN-based approaches in FID from 16.54 to 15.95 and in MMD from 5.62 to 4.42.
Experimental Results
The paper presents thorough experiments evaluating DiffRF's performance. Unconditional radiance field synthesis on PhotoShape Chairs and ABO Tables shows that DiffRF achieves better image and geometric quality than state-of-the-art GAN-based methods such as EG3D and π-GAN, quantified using FID, KID, Coverage (COV), and MMD. The rendering loss specific to DiffRF contributes markedly to the improved FID scores and image-synthesis quality, indicating the effectiveness of integrating rendering guidance into the diffusion process.
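For reference, Coverage (COV) and Minimum Matching Distance (MMD) are standard set-to-set geometry metrics. Below is a small NumPy sketch of their usual definitions, assuming a precomputed pairwise Chamfer-distance matrix between generated and reference shapes; the exact evaluation protocol in the paper may differ.

```python
import numpy as np

def coverage_and_mmd(dist):
    """Compute COV and MMD from a pairwise distance matrix.

    dist[i, j]: e.g. Chamfer distance between generated shape i
                and reference shape j (shape: n_gen x n_ref).
    """
    # Coverage: fraction of reference shapes that are the nearest
    # neighbour of at least one generated shape.
    nearest_ref = dist.argmin(axis=1)
    cov = np.unique(nearest_ref).size / dist.shape[1]

    # MMD: for each reference shape, the distance to its closest
    # generated shape, averaged over the reference set.
    mmd = dist.min(axis=0).mean()
    return cov, mmd
```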
Moreover, the masked radiance field completion task shows DiffRF's ability to generate coherent completions while preserving the unmasked regions, even at high masking ratios where other methods, such as EG3D, struggle.
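The summary above does not spell out the completion procedure, but a common way to obtain such inference-time conditioning from an unconditional diffusion model is to re-impose the known region at every denoising step, as in image inpainting. The sketch below illustrates that idea in PyTorch; `denoiser`, `ddpm_step`, and `q_sample` are hypothetical placeholders, and the paper's exact procedure may differ.

```python
import torch

@torch.no_grad()
def masked_completion(denoiser, ddpm_step, q_sample, f_known, mask, T, shape):
    """Hypothetical masked radiance-field completion via inpainting-style sampling.

    f_known: radiance field whose unmasked voxels should be preserved
    mask:    1 where voxels must be synthesised, 0 where they are known
    ddpm_step(f_t, t, eps_pred): one reverse-diffusion update f_t -> f_{t-1}
    q_sample(f0, t): forward process, i.e. f0 noised to timestep t
    """
    f_t = torch.randn(shape, device=f_known.device)  # start from pure noise
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, device=f_known.device, dtype=torch.long)
        eps_pred = denoiser(f_t, t_batch)
        f_t = ddpm_step(f_t, t_batch, eps_pred)       # denoise the whole grid
        # Overwrite known voxels with the ground truth noised to the same level,
        # so unmasked regions are preserved while masked regions stay coherent.
        if t > 0:
            f_t = mask * f_t + (1 - mask) * q_sample(f_known, t_batch - 1)
        else:
            f_t = mask * f_t + (1 - mask) * f_known
    return f_t
```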
Implications and Future Directions
DiffRF demonstrates a significant advancement in generating 3D representations, with relevance to AR/VR, gaming, and other applications requiring detailed 3D assets. By proposing a methodology that inherently supports conditional generation at inference time without retraining, the research expands the applicability of diffusion models to modern 3D tasks.
Future developments might focus on overcoming current limitations, such as reducing sampling times through efficient diffusion processes or improving scalability to larger grid resolutions by leveraging adaptive grid structures or sparse representations.
In conclusion, DiffRF represents a pivotal step toward integrating advanced probabilistic modeling with volumetric rendering, setting a new benchmark for 3D synthesis in contemporary AI research.