
XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold (2403.19517v1)

Published 28 Mar 2024 in cs.CV

Abstract: We propose XScale-NVS for high-fidelity cross-scale novel view synthesis of real-world large-scale scenes. Existing representations based on explicit surface suffer from discretization resolution or UV distortion, while implicit volumetric representations lack scalability for large scenes due to the dispersed weight distribution and surface ambiguity. In light of the above challenges, we introduce hash featurized manifold, a novel hash-based featurization coupled with a deferred neural rendering framework. This approach fully unlocks the expressivity of the representation by explicitly concentrating the hash entries on the 2D manifold, thus effectively representing highly detailed contents independent of the discretization resolution. We also introduce a novel dataset, namely GigaNVS, to benchmark cross-scale, high-resolution novel view synthesis of real-world large-scale scenes. Our method significantly outperforms competing baselines on various real-world scenes, yielding an average LPIPS that is 40% lower than prior state-of-the-art on the challenging GigaNVS benchmark. Please see our project page at: xscalenvs.github.io.

Authors (5)
  1. Guangyu Wang
  2. Jinzhi Zhang
  3. Fan Wang
  4. Ruqi Huang
  5. Lu Fang

Summary

XScale-NVS: Embracing Cross-Scale Novel View Synthesis through Hash Featurized Manifolds

Introduction

Recent advances in neural rendering have laid the foundation for a multitude of applications ranging from virtual reality to robotic simulation. Despite this progress, high-fidelity cross-scale Novel View Synthesis (NVS) of large-scale real-world scenes remains a significant challenge. Traditional representations suffer from inherent limitations: explicit surface-based representations grapple with discretization resolution issues or surface-parametrization distortion, while implicit volumetric representations fall short on scalability due to their dispersed weight distribution and surface ambiguity.

Addressing these challenges, this work introduces the XScale-NVS framework, underpinned by a novel scene representation called the hash featurized manifold. Coupled with a deferred neural rendering framework, this representation aims to generate detailed, scalable reconstructions of large-scale scenes beyond the limitations of existing methods. The authors also present GigaNVS, a new dataset for benchmarking cross-scale, high-resolution NVS on real-world large-scale scenes, pushing the boundaries of current neural rendering capabilities.
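To make the deferred neural rendering idea concrete, the following is a minimal, self-contained sketch of the two-pass structure: a first pass fills a per-pixel feature buffer at the visible surface, and a second pass decodes each pixel's features to a color with a small MLP. The feature values, network sizes, and weights below are placeholders for illustration, not the paper's actual architecture.

```python
# Minimal sketch of deferred neural rendering (illustrative only):
# pass 1 "rasterizes" per-pixel surface features into a buffer,
# pass 2 decodes each pixel's feature vector to RGB with a tiny MLP.
import math
import random

random.seed(0)

W, H, FDIM = 4, 4, 8  # tiny image and feature dimension for illustration

def rasterize_features(width, height, fdim):
    """Pass 1: produce a per-pixel feature buffer (a stand-in for the
    hash-featurized-manifold lookup at each pixel's surface point)."""
    return [[[random.random() for _ in range(fdim)]
             for _ in range(width)] for _ in range(height)]

def mlp_decode(feat, w1, w2):
    """Pass 2: map one pixel's feature vector to RGB via a 2-layer MLP."""
    hidden = [max(0.0, sum(f * w for f, w in zip(feat, row))) for row in w1]
    rgb = [sum(h * w for h, w in zip(hidden, row)) for row in w2]
    return [1.0 / (1.0 + math.exp(-c)) for c in rgb]  # sigmoid to [0, 1]

HID = 16  # hidden width; arbitrary for this sketch
w1 = [[random.gauss(0, 0.5) for _ in range(FDIM)] for _ in range(HID)]
w2 = [[random.gauss(0, 0.5) for _ in range(HID)] for _ in range(3)]

gbuffer = rasterize_features(W, H, FDIM)
image = [[mlp_decode(gbuffer[y][x], w1, w2) for x in range(W)]
         for y in range(H)]
```

The key property this structure buys is that the expensive neural decoding runs once per pixel rather than once per volumetric sample along each ray, which is what makes the approach tractable at gigapixel scale.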

Hash Featurized Manifold

The proposed hash featurized manifold representation steers clear of the resolution dependencies and distortion issues plaguing existing scene representations. By concentrating the hash entries on the 2D manifold, it effectively captures highly detailed content independent of the discretization resolution.
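As a rough illustration of the underlying mechanism, here is a minimal multiresolution spatial-hash lookup in the style of Instant-NGP, queried only at 3D points lying on the surface manifold. The table size, level count, and feature dimension are arbitrary illustrative choices; only the XOR-with-primes hash follows the standard Instant-NGP construction.

```python
# Illustrative multiresolution hash featurization of 3D surface points.
# Hyperparameters are placeholders, not the paper's configuration.
import math
import random

random.seed(0)

NUM_LEVELS = 4        # coarse-to-fine grid resolutions
BASE_RES = 16
GROWTH = 2.0
TABLE_SIZE = 2 ** 14  # entries per level; hash collisions resolved by learning
FEAT_DIM = 2
PRIMES = (1, 2_654_435_761, 805_459_861)  # Instant-NGP spatial-hash primes

# One learnable feature table per level (randomly initialized here).
tables = [[[random.uniform(-1e-4, 1e-4) for _ in range(FEAT_DIM)]
           for _ in range(TABLE_SIZE)] for _ in range(NUM_LEVELS)]

def spatial_hash(ix, iy, iz):
    """XOR-fold integer voxel coordinates into a table index."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % TABLE_SIZE

def featurize_surface_point(p):
    """Concatenate per-level hash features for a 3D point p on the manifold.
    Because queries only ever land on the 2D surface, every hash entry is
    spent on surface detail rather than on empty volume."""
    feats = []
    for level in range(NUM_LEVELS):
        res = int(BASE_RES * GROWTH ** level)
        ix, iy, iz = (int(math.floor(c * res)) for c in p)
        feats.extend(tables[level][spatial_hash(ix, iy, iz)])
    return feats

f = featurize_surface_point((0.25, 0.5, 0.75))
```

Restricting queries to the manifold is what decouples the representation's effective resolution from the mesh discretization: the feature capacity concentrates where the surface actually is.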

  • Surface Multisampling Enhancement: This enhancement addresses the unstructured scale variations common in large-scale scenes. By casting multiple rays per pixel, it captures a broader footprint of the scene's surface, mitigating aliasing and improving detail capture across varying view distances.
  • Manifold Deformation Mechanism: Aimed at bolstering multi-view consistency, this mechanism improves the representation's tolerance to geometric imperfections. It applies a deformation in a high-dimensional feature space, allowing more accurate and flexible modeling of intricate scene features.
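The surface multisampling idea above is, at heart, the classic anti-aliasing strategy of averaging several jittered rays per pixel. The sketch below demonstrates it on a stand-in surface signal (a high-frequency checker); the scene function and sample counts are hypothetical, chosen only to show why single-ray sampling aliases.

```python
# Illustrative per-pixel multisampling (anti-aliasing by jittered averaging).
import random

random.seed(0)

def surface_value(u, v):
    """Stand-in for querying the scene at the surface point hit by a ray:
    a high-frequency checker pattern that aliases under one sample/pixel."""
    return float((int(u * 64) + int(v * 64)) % 2)

def render_pixel(px, py, width, height, samples=8):
    """Cast several jittered rays through the pixel and average the results,
    approximating the pixel's footprint on the surface."""
    total = 0.0
    for _ in range(samples):
        u = (px + random.random()) / width
        v = (py + random.random()) / height
        total += surface_value(u, v)
    return total / samples

# With many samples the checker averages toward its mean, whereas a single
# ray per pixel can only return 0.0 or 1.0 and flickers across views.
pixel = render_pixel(3, 5, width=16, height=16, samples=64)
```

In the paper's setting the averaged quantity is the manifold feature rather than a color, but the footprint-integration principle is the same.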

GigaNVS Dataset

Recognizing the limitations of existing real-world NVS benchmarks, the GigaNVS dataset is specifically designed to address these gaps. It covers an average area of 1.4 million square meters per scene, featuring a combination of aerial and ground photography to capture the intricate detail of large-scale scenes at an unprecedented texture resolution. This dataset facilitates a comprehensive evaluation of NVS algorithms, underscoring the need for models that can balance detail retention across variable scene scales.

Performance and Implications

The proposed XScale-NVS model demonstrates significant improvements over existing approaches on the GigaNVS benchmark. Notably, it achieves an average LPIPS metric approximately 40% lower than the previous state-of-the-art models, representing a substantial leap in rendering fidelity for large-scale scenes.

  • Theoretical Implications: The introduction of hash featurized manifolds presents a new paradigm in scene representation, emphasizing the importance of prioritizing multi-view consistent regions for optimizing neural rendering quality.
  • Practical Applications: The framework's superior performance in rendering detailed, high-resolution views from novel perspectives holds immense potential for applications in virtual tourism, cinematic content creation, and simulation-based training environments.

Future Directions

This work opens several avenues for future research, particularly in enhancing the framework's robustness to incomplete or inaccurate geometry. By integrating differentiable rendering techniques, there is potential to allow for more dynamic control over scene geometry during the neural rendering process, further pushing the capabilities of NVS technologies.

Conclusion

XScale-NVS represents a significant step forward in the pursuit of high-fidelity, cross-scale NVS for real-world large-scale scenes. By innovating in scene representation and introducing a comprehensive benchmark dataset, this work paves the way for future advancements in neural rendering technologies and their applications across diverse domains.
