Self-supervised novel 2D view synthesis of large-scale scenes with efficient multi-scale voxel carving (2306.14709v1)
Abstract: The task of generating novel views of real scenes is increasingly important now that AI models are able to create realistic new worlds. In many practical applications, novel view synthesis methods must stay grounded in the physical world as much as possible, while also being able to imagine it from previously unseen views. While most current methods are developed and tested in virtual environments with small scenes and no errors in pose and depth information, we push the boundaries to the real-world domain of large-scale scenes in the new context of UAVs. Our algorithmic contributions are twofold. First, we stay anchored in the real 3D world by introducing an efficient multi-scale voxel carving method, which accommodates significant noise in pose and depth as well as illumination variations, while being able to reconstruct the view of the world from drastically different poses at test time. Second, our final high-resolution output is efficiently self-trained on data automatically generated by the voxel carving module, which gives it the flexibility to adapt efficiently to any scene. We demonstrate the effectiveness of our method on highly complex and large-scale scenes in real environments, outperforming the current state of the art. Our code is publicly available: https://github.com/onorabil/MSVC.