
OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation (2404.15891v4)

Published 24 Apr 2024 in cs.CV

Abstract: Recent advancements in 3D reconstruction technologies have paved the way for high-quality and real-time rendering of complex 3D scenes. Despite these achievements, a notable challenge persists: it is difficult to precisely reconstruct specific objects from large scenes. Current scene reconstruction techniques frequently result in the loss of object detail textures and are unable to reconstruct object portions that are occluded or unseen in views. To address this challenge, we delve into the meticulous 3D reconstruction of specific objects within large scenes and propose a framework termed OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation. Specifically, we propose a novel 3D target segmentation technique based on 2D Gaussian Splatting, which segments 3D consistent target masks in multi-view scene images and generates a preliminary target model. Moreover, to reconstruct the unseen portions of the target, we propose a novel target replenishment technique driven by large-scale generative diffusion priors. We demonstrate that our method can accurately reconstruct specific targets from large scenes, both quantitatively and qualitatively. Our experiments show that OMEGAS significantly outperforms existing reconstruction methods across various scenarios. Our project page is at: https://github.com/CrystalWlz/OMEGAS


Summary

  • The paper introduces a novel framework that combines SAM, 3D Gaussian Splatting, and diffusion models for accurate object mesh extraction.
  • It employs consistent segmentation with identity vectors to build an initial 3D model and refines textures using large-scale diffusion priors.
  • Experimental results on challenging datasets demonstrate improved mesh quality and occlusion robustness compared to existing reconstruction methods.

OMEGAS: Enhanced Object Mesh Extraction Guided by Gaussian Segmentation

Overview of Research

OMEGAS, the framework proposed by Wang, Zhou, and Yin of Beijing University of Posts and Telecommunications, addresses a persistent difficulty in 3D reconstruction: recovering detailed meshes of specific objects within large scenes. As fields such as virtual reality and robotics demand ever more complex scene reconstruction, traditional methods often fail to preserve high-fidelity textures and cannot recover object parts that are occluded or unseen in the input views. OMEGAS integrates several advanced components, including the Segment Anything Model (SAM), 3D Gaussian Splatting (3DGS), large-scale diffusion models, and the SuGaR model, to segment scenes, refine object details, and extract precise object meshes.
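How these components fit together can be summarized in pseudocode. The following is a minimal orchestration sketch of the three stages described in this summary; every helper it accepts (`segment_views_with_sam`, `train_target_gaussians`, and so on) is a hypothetical placeholder, not the authors' actual API.

```python
# Hypothetical orchestration of the OMEGAS pipeline as described in this
# summary. All helpers are passed in as placeholders; none of these names
# come from the paper's released code.

def omegas_pipeline(scene_images, cameras,
                    segment_views_with_sam, train_target_gaussians,
                    refine_with_diffusion_prior, rerender_and_segment,
                    sugar_extract_mesh):
    # Stage 1: SAM-guided, view-consistent segmentation yields a
    # preliminary 3DGS model of the target object.
    masks = segment_views_with_sam(scene_images)
    gaussians = train_target_gaussians(scene_images, cameras, masks)

    # Stage 2: a large-scale diffusion prior refines textures and
    # completes occluded or unseen parts of the target.
    gaussians = refine_with_diffusion_prior(gaussians, scene_images, cameras)

    # Stage 3: re-render and re-segment the refined model, then run
    # SuGaR-style surface-aligned mesh extraction.
    renders, clean_masks = rerender_and_segment(gaussians, cameras)
    return sugar_extract_mesh(gaussians, renders, clean_masks)
```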

Methodology

Segmentation and Initial Model Construction

The initial phase uses SAM in combination with 3D Gaussian Splatting for preliminary segmentation. To keep masks consistent across views, the method attaches identity vectors to the Gaussian representation; an added classification layer and accompanying loss functions use these vectors to group segments and enforce cross-view consistency. The Gaussians assigned to the target identity then form a rudimentary 3D model of the object.
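To make the identity-vector idea concrete, here is a hedged PyTorch sketch of how rendered per-pixel identity features might be classified and supervised with SAM's 2D masks. The head architecture, dimensions, and names are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

class IdentityClassifier(torch.nn.Module):
    """Maps alpha-blended per-pixel identity features to segment logits."""
    def __init__(self, id_dim: int = 16, num_segments: int = 256):
        super().__init__()
        self.head = torch.nn.Linear(id_dim, num_segments)

    def forward(self, rendered_ids: torch.Tensor) -> torch.Tensor:
        # rendered_ids: (H, W, id_dim), identity vectors splatted to pixels
        return self.head(rendered_ids)  # (H, W, num_segments)

def identity_loss(classifier, rendered_ids, mask_labels):
    """Cross-entropy between classified identity renders and the 2D segment
    labels produced by SAM for one view. Minimizing it across many views
    pushes all Gaussians of one object toward a shared identity."""
    logits = classifier(rendered_ids)                    # (H, W, K)
    return F.cross_entropy(logits.permute(2, 0, 1).unsqueeze(0),
                           mask_labels.long().unsqueeze(0))
```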

Detail Refinement via Diffusion Models

Once the base model is established, OMEGAS employs large-scale diffusion priors to enhance texture detail and to reconstruct parts of the object that are only partially visible or entirely unseen. The model is rendered from randomly sampled camera poses, and a Stable Diffusion prior guides the optimization of these renderings: regions observed in the scene images remain anchored to the actual photographs, while unseen regions are completed with plausible, photorealistic detail.
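As an illustration of how such diffusion-guided optimization commonly works, below is a hedged sketch of one score-distillation-style step with a latent diffusion prior. Here `render_fn`, `encode_fn`, and `unet_fn` are hypothetical handles to a Gaussian renderer, a VAE encoder, and a pretrained denoising UNet; the exact objective OMEGAS uses may differ.

```python
import torch

def sds_step(params, camera, render_fn, encode_fn, unet_fn,
             text_emb, alphas_cumprod, optimizer):
    """One score-distillation-style update of the Gaussian model."""
    image = render_fn(params, camera)              # (1, 3, H, W), values in [0, 1]
    latents = encode_fn(image)                     # (1, 4, h, w) latent code

    # Sample a diffusion timestep and perturb the latents with noise.
    t = torch.randint(50, 950, (1,), device=latents.device)
    noise = torch.randn_like(latents)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noisy = a_t.sqrt() * latents + (1.0 - a_t).sqrt() * noise

    with torch.no_grad():
        eps_pred = unet_fn(noisy, t, text_emb)     # prior's noise prediction

    # Surrogate loss whose gradient w.r.t. the latents is
    # w(t) * (eps_pred - noise), steering renders toward the prior.
    grad = (1.0 - a_t) * (eps_pred - noise)
    loss = (grad.detach() * latents).sum()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```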

Mesh Extraction

In the final step, the refined 3DGS model is re-rendered and segmented to obtain precise target masks and a clean background. These re-renderings, combined with the original scene views, drive the SuGaR model's surface-aligned mesh extraction, yielding a detailed, high-quality 3D object mesh.
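A hedged sketch of this stage follows, assuming a hypothetical `render_rgba` helper that returns per-view color and accumulated opacity, and a `sugar_extract` handle standing in for SuGaR-style surface extraction.

```python
def extract_target_mesh(target_gaussians, cameras, render_rgba, sugar_extract):
    """Illustrative final stage: mask re-renders, then extract the mesh."""
    masked_views = []
    for cam in cameras:
        rgb, alpha = render_rgba(target_gaussians, cam)
        # Low accumulated opacity marks background pixels; clearing them
        # yields the clean target masks used for mesh optimization.
        mask = alpha > 0.5
        masked_views.append((rgb * mask, mask, cam))
    # SuGaR aligns Gaussians to the implied surface and extracts a mesh
    # (e.g., Poisson reconstruction followed by refinement).
    return sugar_extract(target_gaussians, masked_views)
```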

Experiments and Results

The framework was evaluated across multiple datasets and scenes and shows clear gains over existing methods. On the Tanks and Temples dataset, for example, OMEGAS produces meshes with finer texture detail and greater robustness to occlusion than SuGaR alone or SuGaR combined with NeRF-based models.

Implications and Future Work

OMEGAS offers a promising answer to the longstanding challenge of high-fidelity, object-specific reconstruction in complex 3D scenes. By unifying segmentation, detail refinement, and mesh extraction in a single framework, it has direct practical value for augmented reality, gaming, and large-scale 3D data generation. Future work could examine the model's efficiency across different scene complexities and explore tighter integration with real-time processing systems for dynamic applications.

Conclusion

The paper successfully demonstrates a methodological and practical advancement in the niche of 3D reconstruction, specifically in extracting detailed and accurate meshes of specific objects within large scenes. By innovatively combining existing tools and introducing new segmentation and optimization techniques, OMEGAS sets a new standard for mesh reconstruction that could significantly impact various technology sectors.
