PointRecon: Online Point-based 3D Reconstruction via Ray-based 2D-3D Matching (arXiv:2410.23245v2)
Abstract: We propose a novel online, point-based 3D reconstruction method for posed monocular RGB videos. Our model maintains a global point cloud representation of the scene and continuously updates the features and 3D locations of its points as new images are observed. It expands the point cloud with newly detected points while carefully removing redundant ones. Both the point cloud updates and the depth predictions for new points are achieved through a novel ray-based 2D-3D feature matching technique, which is robust to errors in previously predicted point positions. In contrast to offline methods, our approach can process sequences of unbounded length and provides real-time updates. Additionally, the point cloud imposes no predefined resolution or scene-size constraints, and its unified global representation ensures consistency across views. Experiments on the ScanNet dataset show that our method achieves quality comparable to other online multi-view stereo (MVS) approaches. Project page: https://arthurhero.github.io/projects/pointrecon
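The abstract does not spell out how ray-based 2D-3D matching works. The toy sketch below illustrates one plausible reading of the idea under our own assumptions, and it is not the authors' implementation: for a single pixel, we cast a ray from the camera center, collect the existing 3D points that lie within a small radius of that ray (so a point whose predicted position is slightly off still counts as a candidate), score each candidate by the dot product between the pixel feature and the point feature, and return a softmax-weighted depth along the ray. The function name `ray_depth_estimate`, the cylindrical `radius` test, and the dot-product/softmax scoring are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def ray_depth_estimate(
    cam_center: Vec,
    ray_dir: Vec,
    pixel_feat: Vec,
    points: List[Tuple[Vec, Vec]],
    radius: float = 0.05,
    temperature: float = 1.0,
) -> Optional[float]:
    """Hypothetical sketch: estimate depth along one pixel ray by matching
    the pixel feature against (position, feature) points lying near the ray.
    Not the PointRecon method; an illustration of the general idea only."""
    norm = math.sqrt(dot(ray_dir, ray_dir))
    d = [x / norm for x in ray_dir]  # unit ray direction
    depths, scores = [], []
    for pos, feat in points:
        v = [p - c for p, c in zip(pos, cam_center)]
        t = dot(v, d)  # depth of the point's orthogonal projection onto the ray
        if t <= 0:
            continue  # point is behind (or level with) the camera
        perp = [vi - t * di for vi, di in zip(v, d)]
        if math.sqrt(dot(perp, perp)) > radius:
            continue  # too far from the ray to be a match candidate
        depths.append(t)
        scores.append(dot(pixel_feat, feat) / temperature)
    if not depths:
        return None  # no candidate near the ray; a new point would be needed
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return sum(w * t for w, t in zip(weights, depths)) / total


# Usage: two points near the +z ray, one far away from it.
pts = [
    ((0.01, 0.0, 2.0), [1.0, 0.0]),
    ((0.0, 0.02, 3.0), [0.0, 1.0]),
    ((1.0, 1.0, 1.0), [1.0, 0.0]),
]
print(ray_depth_estimate((0, 0, 0), (0, 0, 1), [10.0, 0.0], pts))  # close to 2.0
```

In this toy version, the tolerance `radius` is what lets slightly mislocated points still participate in matching, which loosely mirrors the robustness to position errors that the abstract claims; how the actual method selects candidates and predicts depth is described in the full paper, not here.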