Multi-Level Neural Scene Graphs for Dynamic Urban Environments (2404.00168v1)

Published 29 Mar 2024 in cs.CV

Abstract: We estimate the radiance field of large-scale dynamic areas from multiple vehicle captures under varying environmental conditions. Previous works in this domain are either restricted to static environments, do not scale to more than a single short video, or struggle to separately represent dynamic object instances. To this end, we present a novel, decomposable radiance field approach for dynamic urban environments. We propose a multi-level neural scene graph representation that scales to thousands of images from dozens of sequences with hundreds of fast-moving objects. To enable efficient training and rendering of our representation, we develop a fast composite ray sampling and rendering scheme. To test our approach in urban driving scenarios, we introduce a new, novel view synthesis benchmark. We show that our approach outperforms prior art by a significant margin on both established and our proposed benchmark while being faster in training and rendering.


Summary

  • The paper introduces a decomposable radiance field approach using multi-level neural scene graphs to effectively represent dynamic urban scenes.
  • It implements a fast composite ray sampling strategy that significantly accelerates training and rendering compared to conventional methods.
  • Evaluations on a novel urban benchmark demonstrate enhanced view synthesis performance with improved PSNR, SSIM, and LPIPS metrics.

Multi-Level Neural Scene Graphs for Dynamic Urban Environments

Introduction

Dynamic urban environments pose a complex challenge for radiance field estimation due to their inherent variability and the presence of multiple, fast-moving objects. Traditional methods either focus on static scenes or deal inadequately with dynamic entities, limiting their applicability for realistic novel view synthesis in city-scale scenarios. Addressing these limitations, "Multi-Level Neural Scene Graphs for Dynamic Urban Environments" introduces a novel decomposable radiance field approach designed to scale efficiently to large geographic areas replete with dynamic entities under varying conditions.

Contributions

The paper's primary contributions are threefold:

  • The introduction of a multi-level neural scene graph representation that scales to thousands of images from dozens of sequences containing hundreds of fast-moving objects.
  • The development of a fast composite ray sampling and rendering scheme, specifically designed to facilitate efficient training and rendering for the proposed representation.
  • The creation of a novel view synthesis benchmark tailored for urban driving scenarios, enabling realistic and application-driven evaluations of radiance field reconstruction in dynamic environments.

Methodology

Scene Graph Representation

The foundation of the approach is a multi-level scene graph that organizes the environment into dynamic object, sequence, and camera nodes within a hierarchical structure, so that each entity can be localized and identified in 3D space. A global frame at the root node unifies these elements into a coherent scene representation. This decomposition is what allows the method to separate static and dynamic components and overcome the limitations of earlier works.
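The paper does not spell out its data structures in this summary, but the described hierarchy maps naturally onto a small set of node types. Below is a minimal, hypothetical sketch in Python; all class and field names are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical node types mirroring the described multi-level scene graph.
# Names, fields, and conventions are illustrative assumptions.

@dataclass
class CameraNode:
    frame_index: int
    intrinsics: List[List[float]]        # 3x3 K matrix
    cam_to_sequence: List[List[float]]   # 4x4 camera pose in the sequence frame

@dataclass
class ObjectNode:
    instance_id: int
    poses: Dict[int, List[List[float]]]  # frame index -> 4x4 object-to-sequence pose
    latent_code: List[float]             # per-instance code for a shared object field

@dataclass
class SequenceNode:
    sequence_id: str
    sequence_to_world: List[List[float]]  # rigid transform into the global root frame
    cameras: List[CameraNode] = field(default_factory=list)
    objects: List[ObjectNode] = field(default_factory=list)

@dataclass
class SceneGraphRoot:
    # Global frame shared by all captured sequences; static scene geometry
    # would be represented once at this level.
    sequences: List[SequenceNode] = field(default_factory=list)
```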

Efficiency through Ray Sampling and Rendering

Given the large-scale nature of urban environments, rendering efficiency is paramount. The paper addresses this with a composite ray sampling strategy that substantially accelerates both training and rendering, in contrast to prior methods that either sample rays sparsely or process scene-graph nodes separately.
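To illustrate one ingredient such a composite scheme typically relies on, the sketch below performs a batched ray/axis-aligned-bounding-box (slab) test in PyTorch, which lets a renderer restrict object samples to the intervals where a ray actually intersects a dynamic object's box before compositing them with static-scene samples. This is a minimal sketch under assumed conventions, not the paper's implementation.

```python
import torch

def ray_aabb_intersect(origins, dirs, box_min, box_max):
    """Slab test for a batch of rays against one axis-aligned box.

    origins, dirs: (N, 3) tensors; dirs is assumed non-zero in every component.
    box_min, box_max: (3,) tensors in the same coordinate frame as the rays.
    Returns a hit mask and per-ray entry/exit distances (t_near, t_far).
    """
    inv_d = 1.0 / dirs
    t0 = (box_min - origins) * inv_d
    t1 = (box_max - origins) * inv_d
    t_near = torch.minimum(t0, t1).max(dim=-1).values
    t_far = torch.maximum(t0, t1).min(dim=-1).values
    hit = (t_far > t_near) & (t_far > 0)
    return hit, t_near.clamp(min=0.0), t_far

# A composite sampler would draw static-scene samples along the full ray,
# draw object samples only inside [t_near, t_far] of intersected boxes,
# sort all samples by depth, and alpha-composite them in a single pass.
```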

Benchmark for Urban Environments

To facilitate evaluation, the paper introduces a comprehensive benchmark built on the Argoverse 2 dataset, covering varied environmental conditions across two urban areas. The benchmark stresses the method's robustness and scalability and provides a clear basis for comparison with existing methods.

Evaluation and Results

Through extensive testing, the proposed approach significantly outperformed existing methods on both the established and newly proposed benchmarks, particularly in terms of view synthesis quality under dynamic conditions. Results showed notable improvements in PSNR, SSIM, and LPIPS metrics, evidencing both the method’s precision and its practical efficiency during training and rendering phases.
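For reference, PSNR, the most commonly reported of these metrics, is straightforward to compute; the snippet below is a generic sketch, not the paper's evaluation code, and SSIM and LPIPS are typically taken from existing packages.

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# SSIM and LPIPS are usually computed with existing implementations, e.g.
# skimage.metrics.structural_similarity and the lpips package (assumed available).
```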

Implications and Future Directions

The research presents a significant step forward in radiance field reconstruction of dynamic urban environments, indicating promising applications in autonomous driving, city-scale mapping, and mixed reality. By representing dynamic objects individually within a scalable framework and demonstrating efficiency on large datasets, this work paves the way for future developments in the field.

Speculatively, the introduction of a multi-level scene graph might herald a new focus on higher-level decompositions in scene understanding, potentially leading to even more sophisticated methods for dealing with the intricacies of dynamic environments. Furthermore, the benchmark established herein offers a robust foundation for future research, highlighting the importance of real-world applicability in the development of novel view synthesis methods.

Closing Remarks

Overcoming the challenge of reconstructing radiance fields in dynamic urban environments calls for innovative approaches that can effectively handle large-scale, complex data. This paper's contributions, notably the multi-level neural scene graph, represent a significant advancement towards this goal, offering enhanced realism and efficiency. The accompanying benchmark further strengthens the methodology's value, providing a comprehensive tool for ongoing and future research in the field.