Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting (2401.01339v3)
Abstract: This paper tackles the problem of modeling dynamic urban streets for autonomous driving scenes. Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photo-realistic view synthesis of dynamic urban street scenes. However, a significant limitation of these methods is their slow training and rendering speed. We introduce Street Gaussians, a new explicit scene representation that tackles these limitations. Specifically, the dynamic urban scene is represented as a set of point clouds equipped with semantic logits and 3D Gaussians, each associated with either a foreground vehicle or the background. To model the dynamics of foreground vehicles, each object point cloud is optimized with an optimizable tracked pose, along with a 4D spherical harmonics model for dynamic appearance. The explicit representation allows easy composition of object vehicles and background, which in turn enables scene editing operations and rendering at 135 FPS (1066 $\times$ 1600 resolution) within half an hour of training. The proposed method is evaluated on multiple challenging benchmarks, including the KITTI and Waymo Open datasets. Experiments show that it consistently outperforms state-of-the-art methods across all datasets. The code will be released to ensure reproducibility.
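The composition step described in the abstract — static background Gaussians plus per-vehicle Gaussian sets carried into world space by their tracked poses — can be sketched as below. This is a minimal illustrative sketch, not the authors' implementation: all function names, dictionary keys, and array shapes are assumptions, and the optimizable pose refinement and 4D spherical harmonics appearance model are omitted.

```python
import numpy as np

def transform_object_gaussians(means, rotations, pose_R, pose_t):
    """Move one vehicle's Gaussians from its object-local frame to world space.

    means:     (N, 3) Gaussian centers in the object frame.
    rotations: (N, 3, 3) Gaussian orientation matrices in the object frame.
    pose_R:    (3, 3) tracked-pose rotation (object -> world).
    pose_t:    (3,)   tracked-pose translation.
    """
    world_means = means @ pose_R.T + pose_t          # rotate then translate centers
    world_rotations = pose_R[None] @ rotations       # compose pose with per-Gaussian orientation
    return world_means, world_rotations

def compose_scene(background, objects):
    """Concatenate background Gaussians with posed foreground-vehicle Gaussians.

    background: dict with 'means' (N, 3) and 'rotations' (N, 3, 3).
    objects:    list of dicts, each with 'means', 'rotations', 'pose_R', 'pose_t'.
    """
    all_means = [background["means"]]
    all_rots = [background["rotations"]]
    for obj in objects:
        m, r = transform_object_gaussians(
            obj["means"], obj["rotations"], obj["pose_R"], obj["pose_t"]
        )
        all_means.append(m)
        all_rots.append(r)
    # The composed set can then be rasterized with a standard 3D Gaussian
    # splatting renderer for the current frame.
    return np.concatenate(all_means), np.concatenate(all_rots)
```

Because the transform is a per-frame rigid motion applied to an otherwise static object point cloud, the same foreground Gaussians can be re-posed (or removed/swapped) at render time, which is what makes scene-editing operations cheap in an explicit representation.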