ProSGNeRF: Progressive Dynamic Neural Scene Graph with Frequency Modulated Auto-Encoder in Urban Scenes (2312.09076v2)
Abstract: Implicit neural representation has demonstrated promising results in view synthesis for large and complex scenes. However, existing approaches either fail to capture the fast-moving objects or need to build the scene graph without camera ego-motions, leading to low-quality synthesized views of the scene. We aim to jointly solve the view synthesis problem of large-scale urban scenes and fast-moving vehicles, which is more practical and challenging. To this end, we first leverage a graph structure to learn the local scene representations of dynamic objects and the background. Then, we design a progressive scheme that dynamically allocates a new local scene graph trained with frames within a temporal window, allowing us to scale up the representation to an arbitrarily large scene. Besides, the training views of urban scenes are relatively sparse, which leads to a significant decline in reconstruction accuracy for dynamic objects. Therefore, we design a frequency auto-encoder network to encode the latent code and regularize the frequency range of objects, which can enhance the representation of dynamic objects and address the issue of sparse image inputs. Additionally, we employ lidar point projection to maintain geometry consistency in large-scale urban scenes. Experimental results demonstrate that our method achieves state-of-the-art view synthesis accuracy, object manipulation, and scene roaming ability. The code will be open-sourced upon paper acceptance.
- Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
- Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5479, 2022.
- Virtual kitti 2. arXiv preprint arXiv:2001.10773, 2020.
- Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9650–9660, 2021.
- Depth-supervised nerf: Fewer views and faster training for free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12882–12891, June 2022.
- Panoptic nerf: 3d-to-2d label transfer for panoptic urban scene segmentation. In 2022 International Conference on 3D Vision (3DV), pages 1–11. IEEE, 2022.
- Dynamic view synthesis from dynamic monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5712–5721, 2021.
- Vision meets robotics: The kitti dataset. The International Journal of Robotics Research, 32(11):1231–1237, 2013.
- Panoptic neural fields: A semantic object-aware neural scene representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12871–12881, 2022.
- Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5521–5531, 2022.
- Neural scene flow fields for space-time view synthesis of dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6498–6508, 2021.
- A ray-box intersection algorithm and efficient dynamic voxel rendering. Journal of Computer Graphics Techniques Vol, 7(3):66–81, 2018.
- Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7210–7219, 2021.
- Nerf: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision, 2020.
- Autorf: Learning 3d object radiance fields from single view observations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3971–3980, 2022.
- Neural scene graphs for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2856–2865, 2021.
- Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5865–5874, 2021.
- Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. arXiv preprint arXiv:2106.13228, 2021.
- Block-nerf: Scalable large scene neural view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8248–8258, June 2022.
- Mega-nerf: Scalable construction of large-scale nerfs for virtual fly-throughs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12922–12931, June 2022.
- Suds: Scalable urban dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12375–12385, 2023.
- Mars: An instance-aware, modular and realistic simulator for autonomous driving. arXiv preprint arXiv:2307.15058, 2023.
- Space-time neural irradiance fields for free-viewpoint video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9421–9431, 2021.
- Bungeenerf: Progressive neural radiance field for extreme multi-scale scene rendering. In European conference on computer vision, pages 106–122. Springer, 2022.
- Freenerf: Improving few-shot neural rendering with free frequency regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8254–8263, 2023.
- Nerf++: Analyzing and improving neural radiance fields. arXiv preprint arXiv:2010.07492, 2020.