Learning Neural Implicit through Volume Rendering with Attentive Depth Fusion Priors (2310.11598v2)

Published 17 Oct 2023 in cs.CV

Abstract: Learning neural implicit representations has achieved remarkable performance in 3D reconstruction from multi-view images. Current methods use volume rendering to render implicit representations into either RGB or depth images that are supervised by multi-view ground truth. However, rendering one view at a time suffers from incomplete depth at holes and leaves the depth supervision unaware of occluded structures, which severely affects the accuracy of geometry inferred via volume rendering. To resolve this issue, we propose to learn neural implicit representations from multi-view RGBD images through volume rendering with an attentive depth fusion prior. Our prior allows neural networks to perceive coarse 3D structures from the Truncated Signed Distance Function (TSDF) fused from all depth images available for rendering. The TSDF provides access to the missing depth at holes in one depth image and to the occluded parts that are invisible from the current view. By introducing a novel attention mechanism, we allow neural networks to directly combine the depth fusion prior with the inferred occupancy as the learned implicit function. Our attention mechanism works with either a one-time fused TSDF that represents a whole scene or an incrementally fused TSDF that represents a partial scene in the context of Simultaneous Localization and Mapping (SLAM). Our evaluations on widely used benchmarks, including synthetic and real-world scans, show that our method outperforms the latest neural implicit methods. Project page: https://machineperceptionlab.github.io/Attentive_DF_Prior/
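The abstract combines three pieces: TSDF fusion of depth images, an attention step that mixes a TSDF-derived prior with the network's inferred occupancy, and volume rendering of depth. The Python sketch below walks through all three along a single camera ray. It is not the authors' implementation: the linear TSDF-to-occupancy mapping, the fixed attention scores, and the constant "learned" occupancy are hypothetical stand-ins chosen for illustration. The TSDF update is the weighted running average of Curless and Levoy, and the rendering weights w_i = o_i * prod_{j<i}(1 - o_j) follow the occupancy-based formulation used by methods such as UNISURF and NICE-SLAM.

```python
import math
from typing import List


def fuse_tsdf(tsdf: List[float], weight: List[float], samples: List[float],
              observed_depth: float, trunc: float, obs_weight: float = 1.0) -> None:
    """Fuse one depth observation into per-sample TSDF values along a single ray,
    in place, using a weighted running average."""
    for k, t in enumerate(samples):
        sdf = observed_depth - t          # > 0 in front of the surface, < 0 behind it
        if sdf < -trunc:
            continue                      # too far behind the surface: leave unobserved
        d = min(1.0, sdf / trunc)         # truncate and normalise to [-1, 1]
        w_old = weight[k]
        tsdf[k] = (tsdf[k] * w_old + d * obs_weight) / (w_old + obs_weight)
        weight[k] = w_old + obs_weight


def prior_occupancy(tsdf: List[float], weight: List[float]) -> List[float]:
    """Hypothetical TSDF-to-occupancy mapping: free (+1) -> 0, behind (-1) -> 1,
    and 0.5 for samples that no depth image has observed."""
    return [0.5 - 0.5 * d if w > 0 else 0.5 for d, w in zip(tsdf, weight)]


def attentive_blend(learned: List[float], prior: List[float],
                    score_learned: float, score_prior: float) -> List[float]:
    """Blend learned and prior occupancy with softmax attention over two scores.
    The scores are hypothetical constants here; the paper learns its attention."""
    m = max(score_learned, score_prior)   # subtract the max for numerical stability
    e_l, e_p = math.exp(score_learned - m), math.exp(score_prior - m)
    a_l, a_p = e_l / (e_l + e_p), e_p / (e_l + e_p)
    return [a_l * o_l + a_p * o_p for o_l, o_p in zip(learned, prior)]


def render_depth(occupancy: List[float], samples: List[float]) -> float:
    """Occupancy-based volume rendering of depth:
    w_i = o_i * prod_{j<i} (1 - o_j), depth = sum_i w_i * t_i."""
    transmittance, depth = 1.0, 0.0
    for o, t in zip(occupancy, samples):
        depth += o * transmittance * t
        transmittance *= 1.0 - o
    return depth


if __name__ == "__main__":
    samples = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # sample depths along one ray
    tsdf, weight = [1.0] * len(samples), [0.0] * len(samples)
    # Depth from *other* views is fused into the TSDF; the current view is a hole.
    fuse_tsdf(tsdf, weight, samples, observed_depth=2.0, trunc=0.5)
    prior = prior_occupancy(tsdf, weight)
    learned = [0.1] * len(samples)             # stand-in for an untrained network
    blended = attentive_blend(learned, prior, score_learned=0.0, score_prior=2.0)
    print("learned-only depth:", round(render_depth(learned, samples), 3))
    print("blended depth:     ", round(render_depth(blended, samples), 3))
```

In the demo the current view contributes no depth, yet depth fused from other views pulls the rendered depth toward the surface at 2.0 once the attention weights favor the prior, whereas the constant stand-in occupancy alone renders a depth far short of it. Because `fuse_tsdf` can be called again as each new frame arrives, the same sketch also covers the incrementally fused TSDF used in the SLAM setting.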
