3D StreetUnveiler with Semantic-Aware 2DGS (arXiv:2405.18416v2)
Abstract: Unveiling an empty street from crowded observations captured by in-car cameras is crucial for autonomous driving. However, removing all temporarily static objects, such as stopped vehicles and standing pedestrians, presents a significant challenge. Object-centric 3D inpainting relies on thorough observation of a small scene. Street scenes instead involve long trajectories, which sets them apart from previous 3D inpainting tasks. The camera-centric, moving setup of the captured videos further complicates the task, because each object is observed only from a limited range of viewpoints and for a short time. To address these obstacles, we introduce StreetUnveiler, which learns a 3D representation of the empty street from crowded observations. Our representation is based on hard-label semantic 2D Gaussian Splatting (2DGS), chosen for its scalability and its ability to identify the Gaussians to be removed. After removing the unwanted Gaussians, we inpaint the rendered images to provide pseudo-labels and then re-optimize the 2DGS. Given the temporally continuous camera movement, we divide the empty street scene into observed, partially observed, and unobserved regions, which we propose to locate through a rendered alpha map. This decomposition minimizes the regions that need to be inpainted. To enhance the temporal consistency of the inpainting, we introduce a novel time-reversal framework that inpaints frames in reverse order and uses later frames as references for earlier ones, fully exploiting the long-trajectory observations. Experiments on a street scene dataset show that StreetUnveiler successfully reconstructs a 3D representation of the empty street, from which a mesh can be extracted for further applications. The project page and more visualizations are available at: https://streetunveiler.github.io
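The abstract names two procedural steps: locating observed, partially observed, and unobserved regions from a rendered alpha map, and inpainting frames in reverse temporal order with later frames as references. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The alpha map is assumed to be rendered after the object Gaussians have been deleted, so a high alpha means background Gaussians still cover that pixel. The thresholds `hi` and `lo`, the default rule that only unobserved pixels are sent to the inpainter, and the `inpaint` callable (a stand-in for any reference-guided image inpainter) are all hypothetical.

```python
from typing import Callable, List, Optional, TypeVar

F = TypeVar("F")  # frame type (e.g. an image array); left generic on purpose
M = TypeVar("M")  # per-frame mask type


def classify_regions(
    alpha: List[List[float]],
    removed: List[List[bool]],
    hi: float = 0.9,  # hypothetical threshold, not taken from the paper
    lo: float = 0.1,  # hypothetical threshold, not taken from the paper
) -> List[List[str]]:
    """Label pixels using an alpha map rendered AFTER deleting object Gaussians.

    removed[y][x] is True where a removed object was visible in this frame.
    """
    labels = []
    for a_row, r_row in zip(alpha, removed):
        row = []
        for a, was_object in zip(a_row, r_row):
            if not was_object or a >= hi:
                row.append("observed")    # background already reconstructed
            elif a > lo:
                row.append("partial")     # background only partly covered
            else:
                row.append("unobserved")  # nothing behind the object: inpaint
        labels.append(row)
    return labels


def inpaint_mask(labels: List[List[str]], include_partial: bool = False) -> List[List[bool]]:
    """Assumed rule: inpaint only unobserved pixels unless told otherwise."""
    targets = {"unobserved", "partial"} if include_partial else {"unobserved"}
    return [[lab in targets for lab in row] for row in labels]


def time_reversal_inpaint(
    frames: List[F],
    masks: List[M],
    inpaint: Callable[[F, M, Optional[F]], F],  # hypothetical reference-guided inpainter
) -> List[F]:
    """Inpaint from the last frame to the first; each result is the next reference."""
    if len(frames) != len(masks):
        raise ValueError("frames and masks must have the same length")
    results: List[F] = []
    reference: Optional[F] = None  # the last frame has no later reference
    for frame, mask in zip(reversed(frames), reversed(masks)):
        filled = inpaint(frame, mask, reference)
        results.append(filled)
        reference = filled
    results.reverse()  # restore chronological order
    return results


if __name__ == "__main__":
    labels = classify_regions([[1.0, 0.95], [0.5, 0.0]], [[False, True], [True, True]])
    print(labels)  # [['observed', 'observed'], ['partial', 'unobserved']]
    print(inpaint_mask(labels))  # [[False, False], [False, True]]

    # Toy "inpainter" that records which frame it was conditioned on.
    out = time_reversal_inpaint(["f0", "f1", "f2"], [None] * 3,
                                lambda f, m, ref: f + "<" + (ref or "none"))
    print(out)  # ['f0<f1<f2<none', 'f1<f2<none', 'f2<none']
```

Chaining the reference this way conditions every earlier frame on an already-inpainted later frame, matching the ordering the abstract describes; how the reference is actually consumed inside the inpainter is left open in this sketch.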