SinMPI: Novel View Synthesis from a Single Image with Expanded Multiplane Images

Published 18 Dec 2023 in cs.CV | (2312.11037v1)

Abstract: Single-image novel view synthesis is a challenging and ongoing problem that aims to generate an infinite number of consistent views from a single input image. Although significant efforts have been made to advance the quality of generated novel views, less attention has been paid to the expansion of the underlying scene representation, which is crucial to the generation of realistic novel view images. This paper proposes SinMPI, a novel method that uses an expanded multiplane image (MPI) as the 3D scene representation to significantly expand the perspective range of MPI and generate high-quality novel views from a large multiplane space. The key idea of our method is to use Stable Diffusion to generate out-of-view contents, project all scene contents into an expanded multiplane image according to depths predicted by monocular depth estimators, and then optimize the multiplane image under the supervision of pseudo multi-view data generated by a depth-aware warping and inpainting module. Both qualitative and quantitative experiments have been conducted to validate the superiority of our method to the state of the art. Our code and data are available at https://github.com/TrickyGo/SinMPI.

Summary

The paper introduces SinMPI, a method that expands multiplane representations to synthesize unlimited 3D-consistent novel views from a single image.
It leverages outpainting with Stable Diffusion and monocular depth estimation to overcome depth discretization and texture artifacts.
Experimental results demonstrate superior realism and flexible viewpoint generation compared to state-of-the-art MPI-based methods.

Introduction to SinMPI

The paper addresses single-image novel view synthesis: generating an unlimited number of mutually consistent views from one input image. It proposes SinMPI, a method that uses an expanded multiplane image (MPI) as the 3D scene representation, which significantly widens the range of camera perspectives that can be rendered.

Scene Representation with SinMPI

At the core of SinMPI is an expanded multiplane image (MPI) representation that extends the perspective range beyond the scene visible in the input image. This expansion is crucial for creating realistic and consistent novel views. Previous MPI-based methods were limited to the original camera frustum and suffered from depth discretization and repeated-texture artifacts. SinMPI addresses these issues by representing the expanded 3D scene as learnable parameters optimized through volume rendering.
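To make the representation concrete, the following is a minimal sketch of how an MPI is commonly turned into an image: a stack of fronto-parallel RGBA planes is blended back to front with the standard "over" operator. It is an illustration under stated assumptions, not the authors' implementation. The function name `composite_mpi`, the array layout, and the far-to-near plane ordering are all hypothetical choices made for this example.

```python
import numpy as np

def composite_mpi(colors, alphas):
    """Blend MPI planes back to front with the "over" operator.

    colors: (D, H, W, 3) per-plane RGB, plane 0 is the farthest (assumed order).
    alphas: (D, H, W, 1) per-plane opacity in [0, 1].
    Returns an (H, W, 3) image as seen from the reference camera.
    """
    out = np.zeros(colors.shape[1:], dtype=np.float64)
    for rgb, a in zip(colors, alphas):
        # Each nearer plane covers what is behind it in proportion to its alpha.
        out = rgb * a + out * (1.0 - a)
    return out
```

This renders only the reference view. Rendering a novel view would additionally warp each plane into the target camera (typically with a per-plane homography) before compositing. Because every step is differentiable, the plane colors and opacities can in principle be treated as learnable parameters, which is the property the section above relies on.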

Technique and Pipeline

The methodology used in SinMPI can be described in the following stages:

Outpainting: An image generator based on Stable Diffusion is employed to create out-of-view contents, extending the visual scene information beyond the single available view.
Depth Prediction: The extended scene content and the original input are assigned depth values using monocular depth estimators.
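Projection: All scene contents, original and outpainted, are projected into an expanded multiplane image according to the predicted depths.
Pseudo Multi-view Generation: A depth-aware warping and inpainting module generates pseudo multi-view data to serve as supervision.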
Optimization: The expanded MPI is optimized under the supervision of the pseudo multi-view data. Because the MPI parameters are learnable, the representation can handle complex scenes efficiently and improves the rendering of novel viewpoints.
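The projection of scene contents into planes by depth can be sketched as follows. This is a simplified illustration, not the paper's procedure: it assumes planes spaced uniformly in disparity (inverse depth), snaps each pixel to its nearest plane with a hard one-hot opacity, and uses hypothetical names and defaults (`init_mpi_from_depth`, `num_planes`, `near`, `far`). Outpainting, warping, inpainting, and the optimization loop are omitted.

```python
import numpy as np

def init_mpi_from_depth(rgb, depth, num_planes=32, near=1.0, far=100.0):
    """Place each pixel on the MPI plane closest to its predicted depth.

    rgb:   (H, W, 3) image (original plus any outpainted content).
    depth: (H, W) monocular depth estimate in the same units as near/far.
    Returns colors (D, H, W, 3), alphas (D, H, W, 1), plane_depths (D,),
    with planes ordered far to near.
    """
    # Assumption: planes are uniform in disparity, so near depths get more planes.
    disparities = np.linspace(1.0 / far, 1.0 / near, num_planes)
    plane_depths = 1.0 / disparities

    # Index of the nearest plane (in disparity) for every pixel.
    pixel_disp = 1.0 / np.clip(depth, near, far)
    idx = np.abs(pixel_disp[..., None] - disparities).argmin(axis=-1)

    # One-hot opacity: a pixel is fully opaque on its plane, transparent elsewhere.
    plane_ids = np.arange(num_planes)[:, None, None]
    alphas = (idx[None] == plane_ids).astype(np.float64)[..., None]
    colors = alphas * rgb[None]
    return colors, alphas, plane_depths
```

A hard assignment like this is exactly where depth discretization comes from: every pixel snaps to one of a fixed set of depths. In the pipeline described above, such an initialization would only be a starting point, with the plane contents then refined against the pseudo multi-view data.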

Experimental Results

The paper reports both qualitative and quantitative experiments on multiple datasets. In these experiments SinMPI generates 3D-consistent novel views and outperforms existing state-of-the-art methods in both realism and the range of viewpoints it supports.

Conclusion and Future Work

SinMPI presents a significant step forward for single-image novel view synthesis by allowing for unrestricted observations and fast 3D-consistent processing. While it pushes the boundaries of scene expansion and render quality, the authors acknowledge limitations such as reliance on accurate depth estimates and difficulty reproducing lighting effects such as specular reflections. They suggest that future work could address these limitations and incorporate more realistic view-dependent effects.