
NoPose-NeuS: Jointly Optimizing Camera Poses with Neural Implicit Surfaces for Multi-view Reconstruction (2312.15238v1)

Published 23 Dec 2023 in cs.CV and cs.GR

Abstract: Learning neural implicit surfaces from volume rendering has become popular for multi-view reconstruction. Neural surface reconstruction approaches can recover complex 3D geometry that is difficult for classical Multi-view Stereo (MVS) approaches, such as non-Lambertian surfaces and thin structures. However, one key assumption for these methods is knowing accurate camera parameters for the input multi-view images, which are not always available. In this paper, we present NoPose-NeuS, a neural implicit surface reconstruction method that extends NeuS to jointly optimize camera poses with the geometry and color networks. We encode the camera poses as a multi-layer perceptron (MLP) and introduce two additional losses, which are multi-view feature consistency and rendered depth losses, to constrain the learned geometry for better estimated camera poses and scene surfaces. Extensive experiments on the DTU dataset show that the proposed method can estimate relatively accurate camera poses, while maintaining a high surface reconstruction quality with 0.89 mean Chamfer distance.


Summary

  • The paper presents a joint optimization framework that refines camera poses and neural implicit surfaces for robust 3D reconstruction from multi-view images.
  • It leverages a multi-layer perceptron encoding along with multi-view feature consistency and depth losses to enhance both pose accuracy and surface fidelity.
  • Evaluations on the DTU dataset confirm that the method accurately reconstructs complex scenes without relying on pre-determined camera parameters.

Introduction to Neural Implicit Surfaces

Reconstructing 3D geometry from multi-view images is challenging, particularly for complex structures such as non-Lambertian surfaces and thin objects. Classical Multi-view Stereo (MVS) methods struggle in regions that lack clear texture, while recent neural rendering approaches typically require accurate camera parameters for the input images.

Camera Pose Challenges and Innovations

Most 3D reconstruction pipelines assume precise camera poses, which are often unavailable for casually captured images. Several recent methods address this by optimizing camera parameters jointly with the rendering model. The paper under consideration extends this idea to neural implicit surfaces: building on NeuS, it optimizes camera poses and scene geometry simultaneously, and introduces additional loss functions that constrain the learned geometry, improving both the accuracy of the estimated camera poses and the fidelity of the reconstructed surfaces.
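The core idea of joint pose optimization is that camera poses become continuous, differentiable variables trained with the rest of the model. A minimal sketch of this, assuming an axis-angle rotation parameterization (the paper instead predicts poses with a small MLP, and the parameter names below are hypothetical):

```python
import numpy as np

def rodrigues(w):
    """Convert an axis-angle vector w (shape (3,)) to a 3x3 rotation matrix.

    The mapping is differentiable, which is what allows a pose network to
    output w and be optimized end-to-end with the geometry/color networks.
    """
    theta = np.linalg.norm(w)
    if theta < 1e-8:                       # near-zero rotation -> identity
        return np.eye(3)
    k = w / theta                          # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],      # cross-product (skew) matrix
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# A minimal per-image "pose head" with learnable parameters (illustrative
# only -- NoPose-NeuS encodes poses with an MLP, but the principle is the
# same: each pose is an optimizable 6-DoF variable).
pose_params = {0: {"w": np.array([0.0, 0.0, np.pi / 2]),   # rotation, axis-angle
                   "t": np.array([0.1, 0.0, 2.0])}}        # translation

R = rodrigues(pose_params[0]["w"])
cam_center = -R.T @ pose_params[0]["t"]    # camera center in world coordinates
```

Because `rodrigues` is smooth away from the identity, gradients from the rendering loss can flow back into `w` and `t`, nudging each camera toward a pose consistent with the rendered images.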

Methodology and Contributions

The proposed method encodes camera poses with a multi-layer perceptron (MLP) and refines them through multi-view feature consistency and rendered depth losses. Evaluated qualitatively and quantitatively on the DTU dataset, it achieves accurate camera pose estimation while maintaining high surface reconstruction quality (0.89 mean Chamfer distance), without relying on pre-determined camera parameters.
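The training objective can be sketched as a weighted sum of a photometric term and the two additional terms. This is a simplified illustration, not the paper's exact formulation: the loss forms (L1) and the weights `w_feat` and `w_depth` are assumptions for the sketch.

```python
import numpy as np

def total_loss(rendered_rgb, gt_rgb,
               feat_ref, feat_src,         # features at matched pixels across views
               rendered_depth, prior_depth,
               w_feat=0.5, w_depth=0.1):   # hypothetical weights
    """Sketch of a NoPose-NeuS-style objective (weights are illustrative).

    color: standard volume-rendering photometric loss (as in NeuS)
    feat:  multi-view feature consistency -- features sampled at the
           reprojections of a surface point should agree across views
    depth: rendered depth should match a depth estimate for the view
    """
    l_color = np.mean(np.abs(rendered_rgb - gt_rgb))          # L1 photometric
    l_feat = np.mean(np.abs(feat_ref - feat_src))             # feature consistency
    l_depth = np.mean(np.abs(rendered_depth - prior_depth))   # depth term
    return l_color + w_feat * l_feat + w_depth * l_depth
```

The two extra terms matter because the photometric loss alone is weakly constrained when poses are also free: feature consistency ties the geometry to cross-view correspondences, and the depth term anchors the scale and shape of the recovered surface.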

Findings and Further Potential

The reported results confirm the method's accuracy in both pose estimation and surface reconstruction. Because outcomes still depend significantly on the camera pose initialization, future work could reduce this dependency or additionally optimize intrinsic camera parameters for a more complete self-calibration. The approach opens possibilities for diverse applications, particularly in scenarios where precise camera parameters are impractical to obtain.
