OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation (2404.15891v4)
Abstract: Recent advances in 3D reconstruction have enabled high-quality, real-time rendering of complex 3D scenes. Despite these achievements, a notable challenge persists: it is difficult to precisely reconstruct specific objects from large scenes. Current scene reconstruction techniques frequently lose detailed object textures and cannot reconstruct object portions that are occluded or unseen in the input views. To address this challenge, we study the detailed 3D reconstruction of specific objects within large scenes and propose a framework termed OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation. Specifically, we propose a novel 3D target segmentation technique based on 2D Gaussian Splatting, which segments 3D-consistent target masks in multi-view scene images and generates a preliminary target model. Moreover, to reconstruct the unseen portions of the target, we propose a novel target replenishment technique driven by large-scale generative diffusion priors. We demonstrate, both quantitatively and qualitatively, that our method can accurately reconstruct specific targets from large scenes. Our experiments show that OMEGAS significantly outperforms existing reconstruction methods across various scenarios. Our project page is at: https://github.com/CrystalWlz/OMEGAS