Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction (2403.19314v2)

Published 28 Mar 2024 in cs.CV

Abstract: Scene reconstruction from multi-view images is a fundamental problem in computer vision and graphics. Recent neural implicit surface reconstruction methods have achieved high-quality results; however, editing and manipulating the 3D geometry of reconstructed scenes remains challenging due to the absence of naturally decomposed object entities and complex object/background compositions. In this paper, we present Total-Decom, a novel method for decomposed 3D reconstruction with minimal human interaction. Our approach seamlessly integrates the Segment Anything Model (SAM) with hybrid implicit-explicit neural surface representations and a mesh-based region-growing technique for accurate 3D object decomposition. Total-Decom requires minimal human annotations while providing users with real-time control over the granularity and quality of decomposition. We extensively evaluate our method on benchmark datasets and demonstrate its potential for downstream applications, such as animation and scene editing. The code is available at https://github.com/CVMI-Lab/Total-Decom.git.
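The central mechanism named in the abstract is a mesh-based region-growing step that groups mesh faces into object-level regions using cues distilled from SAM. The sketch below is only an illustration of that general idea under stated assumptions, not the authors' implementation: it presumes per-face feature vectors (e.g. pooled from 2D SAM/DINO masks projected onto the extracted mesh) are already available, and the names face_adjacency and grow_region as well as the cosine-similarity threshold are hypothetical.

    # Minimal sketch of mesh-based region growing for object decomposition.
    # Assumption: face_feats[i] is a feature vector for mesh face i, e.g.
    # distilled from 2D segmentation masks; this is NOT the paper's code.
    from collections import defaultdict, deque
    import numpy as np

    def face_adjacency(faces):
        """Map each face index to the faces that share an edge with it."""
        edge_to_faces = defaultdict(list)
        for fi, (a, b, c) in enumerate(faces):
            for edge in ((a, b), (b, c), (c, a)):
                edge_to_faces[tuple(sorted(edge))].append(fi)
        adj = defaultdict(set)
        for shared in edge_to_faces.values():
            for fi in shared:
                adj[fi].update(f for f in shared if f != fi)
        return adj

    def grow_region(seed_face, faces, face_feats, sim_thresh=0.9):
        """Breadth-first growth from a user-selected seed face: a neighboring
        face joins the region if its feature is similar enough to the seed's."""
        adj = face_adjacency(faces)
        seed_feat = face_feats[seed_face] / np.linalg.norm(face_feats[seed_face])
        region, queue = {seed_face}, deque([seed_face])
        while queue:
            for nb in adj[queue.popleft()]:
                if nb in region:
                    continue
                feat = face_feats[nb] / np.linalg.norm(face_feats[nb])
                if float(feat @ seed_feat) >= sim_thresh:  # cosine-similarity gate
                    region.add(nb)
                    queue.append(nb)
        return region  # set of face indices forming one decomposed object

    # Toy usage: two triangles sharing an edge, with similar per-face features.
    faces = [(0, 1, 2), (1, 2, 3)]
    face_feats = np.array([[1.0, 0.0], [0.98, 0.05]])
    print(grow_region(seed_face=0, faces=faces, face_feats=face_feats))

Exposing a tunable threshold such as sim_thresh at query time is one plausible way to provide the real-time control over decomposition granularity that the abstract describes.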
