FreeVS: Generative View Synthesis on Free Driving Trajectory (2410.18079v1)
Abstract: Existing reconstruction-based novel view synthesis methods for driving scenes focus on synthesizing camera views along the recorded trajectory of the ego vehicle. Their rendering quality degrades severely on viewpoints outside the recorded trajectory, where camera rays are untrained. We propose FreeVS, a fully generative approach that can synthesize camera views along free new trajectories in real driving scenes. To keep the generated images 3D-consistent with the real scene and accurate in viewpoint pose, we propose a pseudo-image representation of view priors to control the generation process. Viewpoint transformation simulation is applied to the pseudo-images to simulate camera movement in each direction. Once trained, FreeVS can be applied to any validation sequence without a per-scene reconstruction process and can synthesize views on novel trajectories. Moreover, we propose two new challenging benchmarks tailored to driving scenes, novel camera synthesis and novel trajectory synthesis, which emphasize freedom of viewpoint. Since no ground-truth images are available on novel trajectories, we also propose evaluating the consistency of images synthesized on novel trajectories with 3D perception models. Experiments on the Waymo Open Dataset show that FreeVS achieves strong image synthesis performance on both recorded and novel trajectories. Project Page: https://freevs24.github.io/
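The abstract's core mechanism, a pseudo-image of view priors under a simulated viewpoint transformation, can be illustrated with a minimal sketch. The paper does not release this exact code here; the sketch below assumes the view prior is a set of colored 3D points (e.g. colored LiDAR) projected through a pinhole camera whose pose has been shifted to a hypothetical new trajectory. The function names, the nearest-point rasterization, and the lateral-shift pose edit are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def render_pseudo_image(points_xyz, colors, K, T_world_cam, hw=(128, 192)):
    """Project colored 3D points into a target camera to form a sparse
    'pseudo-image' prior. Nearer points overwrite farther ones per pixel.
    (Illustrative sketch; not the paper's released code.)"""
    H, W = hw
    # World -> camera coordinates.
    T_cam_world = np.linalg.inv(T_world_cam)
    pts_h = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera.
    front = pts_cam[:, 2] > 0.1
    pts_cam, cols = pts_cam[front], colors[front]
    # Pinhole projection with intrinsics K.
    uvz = (K @ pts_cam.T).T
    uv = uvz[:, :2] / uvz[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, z, cols = u[inside], v[inside], pts_cam[inside, 2], cols[inside]
    # Rasterize far-to-near so closer points win each pixel.
    order = np.argsort(-z)
    img = np.zeros((H, W, 3), dtype=np.float32)
    img[v[order], u[order]] = cols[order]
    return img

def shift_pose(T_world_cam, lateral_offset):
    """Simulate camera movement onto a new trajectory (e.g. a lane change)
    by translating the pose along its own x-axis."""
    T = T_world_cam.copy()
    T[:3, 3] += T[:3, 0] * lateral_offset
    return T
```

Rendering the same point cloud through `shift_pose(T, 2.0)` yields a pseudo-image for a camera displaced two meters laterally; a generative model conditioned on such pseudo-images could then be trained to produce the corresponding photorealistic view, which is the general idea the abstract describes.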
Authors: Qitai Wang, Lue Fan, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang