
XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold (2403.19517v1)

Published 28 Mar 2024 in cs.CV

Abstract: We propose XScale-NVS for high-fidelity cross-scale novel view synthesis of real-world large-scale scenes. Existing representations based on explicit surface suffer from discretization resolution or UV distortion, while implicit volumetric representations lack scalability for large scenes due to the dispersed weight distribution and surface ambiguity. In light of the above challenges, we introduce hash featurized manifold, a novel hash-based featurization coupled with a deferred neural rendering framework. This approach fully unlocks the expressivity of the representation by explicitly concentrating the hash entries on the 2D manifold, thus effectively representing highly detailed contents independent of the discretization resolution. We also introduce a novel dataset, namely GigaNVS, to benchmark cross-scale, high-resolution novel view synthesis of real-world large-scale scenes. Our method significantly outperforms competing baselines on various real-world scenes, yielding an average LPIPS that is 40% lower than prior state-of-the-art on the challenging GigaNVS benchmark. Please see our project page at: xscalenvs.github.io.

Authors (5)
  1. Guangyu Wang
  2. Jinzhi Zhang
  3. Fan Wang
  4. Ruqi Huang
  5. Lu Fang

Summary

XScale-NVS: Embracing Cross-Scale Novel View Synthesis through Hash Featurized Manifolds

Introduction

Recent advances in neural rendering have laid the foundation for a multitude of applications ranging from virtual reality to robotic simulation. Despite this progress, high-fidelity cross-scale Novel View Synthesis (NVS) of large-scale real-world scenes remains a significant challenge. Traditional representations suffer from inherent limitations: explicit surface-based representations grapple with discretization resolution issues or surface-parametrization distortion, while implicit volumetric representations fall short on scalability due to their dispersed weight distribution and surface ambiguity.

Addressing these challenges, this work introduces the XScale-NVS framework, underpinned by a novel scene representation called the hash featurized manifold. Coupled with a deferred neural rendering framework, this representation aims to generate detailed, scalable reconstructions of large-scale scenes beyond the limitations of existing methods. The authors also present GigaNVS, a new dataset for benchmarking cross-scale, high-resolution NVS on real-world large-scale scenes, pushing the boundaries of current neural rendering capabilities.
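To make the deferred neural rendering idea concrete, the following is a minimal, self-contained sketch of the two-pass structure: a first pass fills a per-pixel feature buffer at the visible surface, and a second pass decodes each pixel's features to a color with a small MLP. The feature values, network sizes, and weights below are placeholders for illustration, not the paper's actual architecture.

```python
# Minimal sketch of deferred neural rendering (illustrative only):
# pass 1 "rasterizes" per-pixel surface features into a buffer,
# pass 2 decodes each pixel's feature vector to RGB with a tiny MLP.
import math
import random

random.seed(0)

W, H, FDIM = 4, 4, 8  # tiny image and feature dimension for illustration

def rasterize_features(width, height, fdim):
    """Pass 1: produce a per-pixel feature buffer (a stand-in for the
    hash-featurized-manifold lookup at each pixel's surface point)."""
    return [[[random.random() for _ in range(fdim)]
             for _ in range(width)] for _ in range(height)]

def mlp_decode(feat, w1, w2):
    """Pass 2: map one pixel's feature vector to RGB via a 2-layer MLP."""
    hidden = [max(0.0, sum(f * w for f, w in zip(feat, row))) for row in w1]
    rgb = [sum(h * w for h, w in zip(hidden, row)) for row in w2]
    return [1.0 / (1.0 + math.exp(-c)) for c in rgb]  # sigmoid to [0, 1]

HID = 16  # hidden width; arbitrary for this sketch
w1 = [[random.gauss(0, 0.5) for _ in range(FDIM)] for _ in range(HID)]
w2 = [[random.gauss(0, 0.5) for _ in range(HID)] for _ in range(3)]

gbuffer = rasterize_features(W, H, FDIM)
image = [[mlp_decode(gbuffer[y][x], w1, w2) for x in range(W)]
         for y in range(H)]
```

The key property this structure buys is that the expensive neural decoding runs once per pixel rather than once per volumetric sample along each ray, which is what makes the approach tractable at gigapixel scale.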

Hash Featurized Manifold

The proposed hash featurized manifold representation steers clear of the resolution dependencies and distortion issues plaguing existing scene representations. By concentrating the hash entries on the 2D manifold, it effectively captures highly detailed content independent of the discretization resolution.
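As a rough illustration of the underlying mechanism, here is a minimal multiresolution spatial-hash lookup in the style of Instant-NGP, queried only at 3D points lying on the surface manifold. The table size, level count, and feature dimension are arbitrary illustrative choices; only the XOR-with-primes hash follows the standard Instant-NGP construction.

```python
# Illustrative multiresolution hash featurization of 3D surface points.
# Hyperparameters are placeholders, not the paper's configuration.
import math
import random

random.seed(0)

NUM_LEVELS = 4        # coarse-to-fine grid resolutions
BASE_RES = 16
GROWTH = 2.0
TABLE_SIZE = 2 ** 14  # entries per level; hash collisions resolved by learning
FEAT_DIM = 2
PRIMES = (1, 2_654_435_761, 805_459_861)  # Instant-NGP spatial-hash primes

# One learnable feature table per level (randomly initialized here).
tables = [[[random.uniform(-1e-4, 1e-4) for _ in range(FEAT_DIM)]
           for _ in range(TABLE_SIZE)] for _ in range(NUM_LEVELS)]

def spatial_hash(ix, iy, iz):
    """XOR-fold integer voxel coordinates into a table index."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % TABLE_SIZE

def featurize_surface_point(p):
    """Concatenate per-level hash features for a 3D point p on the manifold.
    Because queries only ever land on the 2D surface, every hash entry is
    spent on surface detail rather than on empty volume."""
    feats = []
    for level in range(NUM_LEVELS):
        res = int(BASE_RES * GROWTH ** level)
        ix, iy, iz = (int(math.floor(c * res)) for c in p)
        feats.extend(tables[level][spatial_hash(ix, iy, iz)])
    return feats

f = featurize_surface_point((0.25, 0.5, 0.75))
```

Restricting queries to the manifold is what decouples the representation's effective resolution from the mesh discretization: the feature capacity concentrates where the surface actually is.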

  • Surface Multisampling Enhancement: This enhancement addresses the unstructured scale variations common in large-scale scenes. By casting multiple rays per pixel, it captures a broader footprint of the scene's surface, mitigating aliasing and improving detail capture across varying view distances.
  • Manifold Deformation Mechanism: Aimed at bolstering multi-view consistency, this mechanism improves the representation's tolerance to geometric imperfections. It applies a deformation in a high-dimensional feature space, allowing more accurate and flexible modeling of intricate scene features.
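The surface multisampling idea above is, at heart, the classic anti-aliasing strategy of averaging several jittered rays per pixel. The sketch below demonstrates it on a stand-in surface signal (a high-frequency checker); the scene function and sample counts are hypothetical, chosen only to show why single-ray sampling aliases.

```python
# Illustrative per-pixel multisampling (anti-aliasing by jittered averaging).
import random

random.seed(0)

def surface_value(u, v):
    """Stand-in for querying the scene at the surface point hit by a ray:
    a high-frequency checker pattern that aliases under one sample/pixel."""
    return float((int(u * 64) + int(v * 64)) % 2)

def render_pixel(px, py, width, height, samples=8):
    """Cast several jittered rays through the pixel and average the results,
    approximating the pixel's footprint on the surface."""
    total = 0.0
    for _ in range(samples):
        u = (px + random.random()) / width
        v = (py + random.random()) / height
        total += surface_value(u, v)
    return total / samples

# With many samples the checker averages toward its mean, whereas a single
# ray per pixel can only return 0.0 or 1.0 and flickers across views.
pixel = render_pixel(3, 5, width=16, height=16, samples=64)
```

In the paper's setting the averaged quantity is the manifold feature rather than a color, but the footprint-integration principle is the same.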

GigaNVS Dataset

Recognizing the limitations of existing real-world NVS benchmarks, the GigaNVS dataset is specifically designed to address these gaps. It covers an average area of 1.4 million square meters per scene, featuring a combination of aerial and ground photography to capture the intricate detail of large-scale scenes at an unprecedented texture resolution. This dataset facilitates a comprehensive evaluation of NVS algorithms, underscoring the need for models that can balance detail retention across variable scene scales.

Performance and Implications

The proposed XScale-NVS model demonstrates significant improvements over existing approaches on the GigaNVS benchmark. Notably, it achieves an average LPIPS metric approximately 40% lower than the previous state-of-the-art models, representing a substantial leap in rendering fidelity for large-scale scenes.

  • Theoretical Implications: The introduction of hash featurized manifolds presents a new paradigm in scene representation, emphasizing the importance of prioritizing multi-view consistent regions for optimizing neural rendering quality.
  • Practical Applications: The framework's superior performance in rendering detailed, high-resolution views from novel perspectives holds immense potential for applications in virtual tourism, cinematic content creation, and simulation-based training environments.

Future Directions

This work opens several avenues for future research, particularly in enhancing the framework's robustness to incomplete or inaccurate geometry. By integrating differentiable rendering techniques, there is potential to allow for more dynamic control over scene geometry during the neural rendering process, further pushing the capabilities of NVS technologies.

Conclusion

XScale-NVS represents a significant step forward in the pursuit of high-fidelity, cross-scale NVS for real-world large-scale scenes. By innovating in scene representation and introducing a comprehensive benchmark dataset, this work paves the way for future advancements in neural rendering technologies and their applications across diverse domains.
