- The paper introduces a target view frustum volume construction that boosts multi-view information aggregation for sparse input scenarios.
- It achieves state-of-the-art rendering performance with significant improvements in metrics across datasets like DTU, RealEstate10K, and LLFF.
- The CNN-based architecture removes the need for per-scene optimization, enabling robust generalization and promising zero-shot performance.
Overview of "MuRF: Multi-Baseline Radiance Fields"
The paper introduces Multi-Baseline Radiance Fields (MuRF), a feed-forward method for sparse novel view synthesis across diverse camera-baseline settings. Traditional Neural Radiance Fields (NeRFs), while effective, rely on per-scene optimization and degrade sharply when only a few input views are available. MuRF addresses this limitation with a single network that generalizes across scenes and baselines, with no scene-specific tuning required.
Key Contributions
- Target View Frustum Volume Construction: Unlike previous works that build a volume in a pre-defined reference input view, MuRF constructs a volume in the frustum of the target view, spatially aligned with the target image plane. This alignment improves information aggregation from the input views and enhances rendering quality in both small- and large-baseline settings.
- State-of-the-Art Performance: MuRF demonstrates superior performance across a variety of settings. It shows marked improvements in rendering quality in datasets featuring both simple objects (e.g., DTU) and complex indoor and outdoor scenes (e.g., RealEstate10K and LLFF). Moreover, MuRF exhibits promising zero-shot generalization capabilities when tested on the Mip-NeRF 360 dataset—a notable advancement over existing methods.
- Robust Across Baseline Variations: Previous state-of-the-art methods were typically optimized for either small or large baselines. However, MuRF excels in both conditions, handling varying input views and scenarios seamlessly.
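The first contribution above can be made concrete with a minimal NumPy sketch of target-view frustum sampling: depth planes parallel to the target image plane are discretized into a grid of 3D points, one per voxel of the volume. The function name, the pinhole-intrinsics convention, and the uniform depth spacing are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def build_target_frustum_volume(K_tgt, H, W, near, far, num_planes):
    """Sample 3D points on depth planes parallel to the target image plane.

    Returns an array of shape (num_planes, H, W, 3): one 3D point per
    voxel of the target-view frustum volume, in target-camera coordinates.
    """
    depths = np.linspace(near, far, num_planes)          # plane depths
    u, v = np.meshgrid(np.arange(W), np.arange(H))       # pixel grid (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)     # homogeneous pixel coords
    rays = pix @ np.linalg.inv(K_tgt).T                  # camera-space rays at unit depth
    # scale each unit-depth ray by every plane depth via broadcasting
    points = depths[:, None, None, None] * rays[None]    # (num_planes, H, W, 3)
    return points
```

In a full pipeline, each of these points would then be projected into every input view to sample colors and features, which is what makes the resulting volume spatially aligned with the target view rather than with any one input view.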
Technical Approach
MuRF regresses the radiance field with a convolutional network. Because the 3D volume is spatially aligned with the target view, a CNN can exploit this locality directly, capturing local scene structure and producing sharper, more precise geometry than previous methods, particularly in complex scenes.
Methodologically, MuRF discretizes 3D space into depth planes parallel to the target image plane. At each sampled point, it gathers multi-view image colors and features and computes cosine similarities between views as multi-view consistency cues. Aggregating this information into the target-aligned volume and decoding it with the CNN addresses the core challenge of sparse view synthesis.
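The cosine-similarity consistency cue described above can be sketched as follows: given feature vectors sampled from each input view at the same set of volume points, average the pairwise cosine similarities between views. This is a simplified stand-in for the paper's aggregation, assuming mean pooling over view pairs; the function name and tensor layout are illustrative.

```python
import numpy as np

def multiview_cosine_cues(features):
    """Mean pairwise cosine-similarity cue per volume point.

    features: (V, N, C) array of feature vectors sampled at N volume
    points from V input views. Returns (N,) consistency scores that are
    high where the views agree on a point's appearance.
    """
    V = features.shape[0]
    # L2-normalize so that dot products become cosine similarities
    norm = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    # pairwise similarities between all view pairs: (V, V, N)
    sim = np.einsum('ink,jnk->ijn', norm, norm)
    # average over distinct pairs only (exclude the diagonal i == j)
    mask = ~np.eye(V, dtype=bool)
    return sim[mask].reshape(V * (V - 1), -1).mean(axis=0)
```

Intuitively, points on a true surface project to consistent appearances across views and score high, while occluded or empty-space points score low, giving the CNN a geometry cue to decode.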
Results and Implications
The results show MuRF outperforming established methods across benchmarks, in both synthetic and real-world settings. Consistent gains in metrics such as PSNR and LPIPS underscore the effectiveness of its architectural choices.
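For reference, PSNR is a simple closed-form metric, computed from the mean squared error between a rendered image and its ground truth (LPIPS, by contrast, requires a learned perceptual network and is not sketched here):

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio between two images with values in [0, max_val].

    PSNR = 10 * log10(max_val^2 / MSE); higher is better.
    """
    mse = np.mean((rendered - reference) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```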
Implications
- Theoretical Impact: MuRF's introduction of a target-aligned volume paradigm and context-aware decoding may influence further research towards more generalized radiance field models, encouraging consideration of alignment and context in neural rendering tasks.
- Practical Impact: The ability of MuRF to generalize without per-scene optimization lays the groundwork for applications in AR, VR, and virtual content creation, where real-time and efficient rendering from sparse views is critical.
Future Directions
Future work could explore extending MuRF’s applicability beyond static scenes to dynamic environments. Additionally, integrating more sophisticated sampling strategies or leveraging datasets with higher baseline diversity could further enhance model adaptability and realism in novel view synthesis tasks.
In conclusion, "MuRF: Multi-Baseline Radiance Fields" marks a significant step forward in the field of computer vision, bridging the gap between single-baseline optimization and the flexible, generalized synthesis needed for diverse and complex real-world applications.