PSDF: Prior-Driven Neural Implicit Surface Learning for Multi-view Reconstruction

Published 23 Jan 2024 in cs.CV (arXiv:2401.12751v1)

Abstract: Surface reconstruction has traditionally relied on the Multi-View Stereo (MVS) pipeline, which often suffers from noisy and incomplete geometry. Although MVS has proven effective at recovering scene geometry, especially in locally detailed areas with rich textures, it struggles in regions with low texture and large illumination variation, where photometric consistency is unreliable. Recently, Neural Implicit Surface Reconstruction (NISR), which combines surface rendering and volume rendering techniques and bypasses MVS as an intermediate step, has emerged as a promising alternative that overcomes the limitations of traditional pipelines. While NISR has shown impressive results on simple scenes, recovering delicate geometry from uncontrolled real-world scenes remains challenging because of its underconstrained optimization. To this end, the PSDF framework is proposed, which draws on external geometric priors from a pretrained MVS network and internal geometric priors inherent in the NISR model to facilitate high-quality neural implicit surface learning. Specifically, a visibility-aware feature consistency loss and depth prior-assisted sampling based on the external geometric priors are introduced. These components provide powerful geometric consistency constraints and help locate surface intersection points, significantly improving the accuracy and detail of NISR. Meanwhile, internal prior-guided importance rendering is presented to enhance the fidelity of the reconstructed surface mesh by mitigating the biased rendering issue in NISR. Extensive experiments on the Tanks and Temples dataset show that PSDF achieves state-of-the-art performance on complex uncontrolled scenes.


Summary

  • The paper proposes PSDF, a framework that integrates external and internal priors to enhance neural implicit surface reconstruction.
  • It introduces a visibility-aware feature consistency loss and depth prior-assisted sampling to accurately locate surface intersection points.
  • Empirical tests on Tanks and Temples and DTU datasets demonstrate significant improvements in reconstruction accuracy.

Introduction

The development of Neural Implicit Surface Reconstruction (NISR) techniques has been a significant advancement in addressing challenges faced by Multi-View Stereo (MVS)-based surface reconstruction pipelines. Traditional MVS approaches struggle with areas of low texture and varying illumination, leading to noisy and incomplete geometry. Recent NISR methods show promise by directly reconstructing surface geometry through a combination of differentiable surface and volume rendering techniques. However, these methods typically fall short when dealing with complex, real-world scenes due to underconstrained optimization that focuses on global structure at the expense of fine detail.

The PSDF Framework

A newly proposed framework, PSDF, seeks to refine neural implicit surface learning by incorporating both external geometric priors from a pretrained MVS network and internal geometric priors inherent within the NISR model. The framework introduces a visibility-aware feature consistency loss and a depth prior-assisted sampling methodology. These additions offer robust geometric consistency and aid in accurately locating surface intersection points.
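
The idea behind depth prior-assisted sampling can be illustrated with a minimal NumPy sketch: rather than sampling ray points uniformly, extra samples are concentrated around the depth predicted by the MVS network, so more samples land near the likely surface. The function name, the Gaussian placement, and parameters such as `sigma` are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def depth_prior_assisted_sampling(ray_o, ray_d, depth_prior, near=0.5, far=4.0,
                                  n_uniform=32, n_guided=32, sigma=0.05, rng=None):
    """Mix uniform samples over [near, far] with Gaussian samples placed
    around the MVS depth prior along a single ray (hypothetical sketch)."""
    rng = np.random.default_rng(0) if rng is None else rng
    t_uniform = np.linspace(near, far, n_uniform)            # coarse coverage
    t_guided = np.clip(rng.normal(depth_prior, sigma, n_guided), near, far)
    t = np.sort(np.concatenate([t_uniform, t_guided]))       # sorted ray depths
    points = ray_o[None, :] + t[:, None] * ray_d[None, :]    # (N, 3) sample points
    return t, points
```

In a real NISR pipeline the guided samples would also be filtered by the MVS confidence, and the spread around the prior would shrink as training converges.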

To combat the biased rendering issue inherent in volume rendering, PSDF deploys an internal prior-guided importance rendering strategy. By harnessing densely distributed near-surface points, PSDF directs the rendering process toward points that yield unbiased rendering.
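
The bias the importance rendering targets comes from how volume-rendering weights are derived from the signed distance field. A simplified NeuS-style sketch below converts SDF samples along one ray into rendering weights and then masks the sample intervals nearest the current surface estimate; the function names, `inv_s`, and `eps` are illustrative assumptions rather than PSDF's exact formulation:

```python
import numpy as np

def neus_weights(sdf, inv_s=64.0):
    """Convert SDF samples along one ray into volume-rendering weights,
    following the NeuS-style sigmoid formulation (simplified sketch)."""
    cdf = 1.0 / (1.0 + np.exp(-inv_s * sdf))                      # Phi_s(sdf)
    alpha = np.clip((cdf[:-1] - cdf[1:]) / (cdf[:-1] + 1e-8), 0.0, 1.0)
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha)))[:-1]  # transmittance
    return trans * alpha

def near_surface_mask(sdf, eps=0.1):
    """Internal prior: flag sample intervals whose midpoint SDF magnitude is
    small, i.e. points densely distributed near the current surface."""
    mid = 0.5 * (sdf[:-1] + sdf[1:])
    return np.abs(mid) < eps
```

Concentrating the rendering on the masked near-surface intervals is one way to keep the weight mass aligned with the zero-level set, which is the intuition behind mitigating the biased rendering issue.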

Empirical Evaluation

The effectiveness of PSDF is quantitatively validated on the Tanks and Temples dataset, where it achieves state-of-the-art performance. Notable improvements are reported over other NISR methods such as VolSDF, MonoSDF, Geo-Neus, and NeuS, with substantial percentage gains in reconstruction accuracy. These experiments underscore PSDF's capability to reconstruct complex scenes with high fidelity and detail. Moreover, benchmarks on the DTU dataset show that PSDF also handles detailed object-centric scenes well, second only to Geo-Neus in Chamfer distance.
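
Chamfer distance, the DTU metric cited above, measures how closely two point sets agree. A brute-force NumPy sketch (suitable only for small point sets; real evaluations use KD-trees or GPU nearest-neighbor search):

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3):
    mean nearest-neighbor distance in each direction, summed."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

Lower is better: a reconstruction identical to the ground-truth point cloud scores zero, and the metric penalizes both missing geometry and spurious surfaces.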

Contributions

PSDF makes several key contributions to surface reconstruction: it improves optimization of the geometric field, provides robust geometric consistency constraints, and enhances the fidelity of reconstructed surfaces. These advances stem from a careful integration of external priors for efficient sample generation and internal priors for importance rendering.

Conclusion

The paper posits that the prior-driven neural implicit surface learning framework, PSDF, sets a new benchmark for multi-view reconstruction, especially in complex and dynamic real-world scenes. By fully exploiting external and internal geometric priors, PSDF achieves high-quality surface reconstruction. Future work may explore expediting the training process and overcoming the current limitations associated with the learning of thin structures within scenes.
