GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting (2402.10259v4)
Abstract: Reconstructing and rendering 3D objects from highly sparse views is critical for broadening the applications of 3D vision techniques and improving user experience. However, images from sparse views contain only very limited 3D information, which leads to two significant challenges: 1) multi-view consistency is difficult to establish because too few images are available for matching; 2) object information is partially omitted or highly compressed because view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework that represents and renders 3D objects with Gaussian splatting and achieves high rendering quality from only 4 input images. We first introduce visual hull and floater elimination techniques, which explicitly inject structure priors into the initial optimization process to help build multi-view consistency, yielding a coarse 3D Gaussian representation. We then construct a Gaussian repair model based on diffusion models to supplement the omitted object information, with which the Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. We further design a COLMAP-free variant that does not require accurate camera poses to be given in advance; it achieves competitive quality and facilitates wider applications. GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, OpenIllumination, and unposed images that we collected, achieving superior performance from only four views and significantly outperforming previous state-of-the-art (SOTA) methods. Our demo is available at https://gaussianobject.github.io/, and the code has been released at https://github.com/GaussianObject/GaussianObject.
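The first stage described in the abstract (visual hull plus floater elimination, yielding a coarse set of Gaussians) can be sketched as follows. This is a minimal, stdlib-only Python sketch under stated assumptions, not the GaussianObject implementation: we assume cameras are given as 3x4 projection matrices and object masks as boolean silhouettes, that the visual hull is approximated by rejection-sampling points in a user-supplied bounding box, and that floater elimination is approximated by statistical outlier removal on mean k-nearest-neighbor distances (the paper's exact criterion may differ). The function names (`project`, `inside_visual_hull`, `sample_visual_hull`, `remove_floaters`) and the `k`/`std_ratio` parameters are hypothetical.

```python
# Hypothetical sketch of visual-hull initialization and floater removal;
# not taken from the GaussianObject codebase.
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Camera = Sequence[Sequence[float]]  # assumed 3x4 projection matrix, e.g. K [R | t]
Mask = Sequence[Sequence[bool]]     # assumed H x W silhouette, indexed mask[row][col]


def project(P: Camera, X: Point) -> Optional[Tuple[float, float]]:
    """Project a 3D point to pixel coordinates (u, v); None if behind the camera."""
    x, y, z = X
    hx, hy, hw = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    if hw <= 0:
        return None
    return hx / hw, hy / hw


def inside_visual_hull(X: Point, cameras: Sequence[Camera], masks: Sequence[Mask]) -> bool:
    """A point lies in the visual hull iff it projects onto foreground in every view."""
    for P, mask in zip(cameras, masks):
        uv = project(P, X)
        if uv is None:
            return False
        col, row = math.floor(uv[0]), math.floor(uv[1])
        if not (0 <= row < len(mask) and 0 <= col < len(mask[0])):
            return False  # projects outside the image: treated as background
        if not mask[row][col]:
            return False
    return True


def sample_visual_hull(cameras: Sequence[Camera], masks: Sequence[Mask],
                       bbox_min: Point, bbox_max: Point,
                       n_samples: int, seed: int = 0) -> List[Point]:
    """Rejection-sample points uniformly in a bounding box; keep those in the hull."""
    rng = random.Random(seed)
    kept: List[Point] = []
    for _ in range(n_samples):
        X = tuple(rng.uniform(lo, hi) for lo, hi in zip(bbox_min, bbox_max))
        if inside_visual_hull(X, cameras, masks):
            kept.append(X)
    return kept


def remove_floaters(points: Sequence[Point], k: int = 4,
                    std_ratio: float = 1.0) -> List[Point]:
    """Drop points whose mean k-NN distance exceeds mean + std_ratio * std (O(n^2))."""
    n = len(points)
    if n <= k:
        return list(points)
    mean_knn = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(dists[:k]) / k)
    mu = sum(mean_knn) / n
    sigma = math.sqrt(sum((m - mu) ** 2 for m in mean_knn) / n)
    return [p for p, m in zip(points, mean_knn) if m <= mu + std_ratio * sigma]
```

In a full pipeline, the surviving points would presumably seed the Gaussian centers before optimization. The brute-force neighbor search is only for readability; for realistic point counts a spatial index such as a k-d tree would replace it.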
- Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022), 5460–5469. https://api.semanticscholar.org/CorpusID:244488448
- ZoeDepth: Zero-shot transfer by combining relative and metric depth. arXiv preprint arXiv:2302.12288 (2023).
- Segment Anything in 3D with NeRFs. In NeurIPS.
- GeNVS: Generative novel view synthesis with 3D-aware diffusion models.
- Jonathan Chang. 2023. minLoRA. https://github.com/cccntu/minLoRA.
- Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 22246–22256.
- Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images. arXiv:2311.13398 (Jan. 2024). https://doi.org/10.48550/arXiv.2311.13398
- Objaverse-XL: A Universe of 10M+ 3D Objects. arXiv preprint arXiv:2307.05663 (2023).
- Depth-supervised NeRF: Fewer Views and Faster Training for Free. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12872–12881. https://doi.org/10.1109/CVPR52688.2022.01254
- Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion Models Beat GANs on Image Synthesis. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 8780–8794. https://proceedings.neurips.cc/paper_files/paper/2021/file/49ad23d1ec9fa4bd8d77d02681df5cfa-Paper.pdf
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In International Conference on Learning Representations. https://openreview.net/forum?id=YicbFdNTTy
- Vector Quantized Diffusion Model for Text-to-Image Synthesis. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10686–10696. https://doi.org/10.1109/CVPR52688.2022.01043
- threestudio: A unified framework for 3D content generation. https://github.com/threestudio-project/threestudio.
- Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions. In Proceedings of the IEEE/CVF International Conference on Computer Vision.
- Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020), 6840–6851.
- LRM: Large reconstruction model for single image to 3D. arXiv preprint arXiv:2311.04400 (2023).
- LoRA: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685 (2021).
- Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 5865–5874. https://doi.org/10.1109/ICCV48922.2021.00583
- Wonbong Jang and Lourdes Agapito. 2023. NViST: In the Wild New View Synthesis from a Single Image with Transformers. (2023). arXiv:2312.08568 [cs.CV]
- LEAP: Liberate Sparse-view 3D Modeling from Camera Poses. arXiv preprint arXiv:2310.01410 (2023).
- Elucidating the Design Space of Diffusion-Based Generative Models. In Proc. NeurIPS.
- 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics 42, 4 (2023).
- InfoNeRF: Ray entropy minimization for few-shot neural volume rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12912–12921.
- Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http://arxiv.org/abs/1312.6114
- Segment anything. arXiv preprint arXiv:2304.02643 (2023).
- A. Laurentini. 1994. The visual hull concept for silhouette-based image understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 2 (1994), 150–162. https://doi.org/10.1109/34.273735
- Magic3D: High-Resolution Text-to-3D Content Creation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- OpenIllumination: A Multi-Illumination Dataset for Inverse Rendering Evaluation on Real Objects. NeurIPS 2023.
- Zero-1-to-3: Zero-shot One Image to 3D Object. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 9298–9309.
- Deceptive-NeRF: Enhancing NeRF Reconstruction using Pseudo-Observations from Diffusion Models. arXiv preprint arXiv:2305.15171 (2023).
- Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In International Conference on Learning Representations. https://openreview.net/forum?id=Bkg6RiCqY7
- DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35. Curran Associates, Inc., 5775–5787. https://proceedings.neurips.cc/paper_files/paper/2022/file/260a14acce2a89dad36adc8eefe7c59e-Paper-Conference.pdf
- SDEdit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073 (2021).
- Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12663–12673. https://doi.org/10.1109/CVPR52729.2023.01218
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV (Lecture Notes in Computer Science, Vol. 12346). Springer, 405–421. https://doi.org/10.1007/978-3-030-58452-8_24
- RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 5470–5480. https://doi.org/10.1109/CVPR52688.2022.00540
- HyperNeRF: A higher-dimensional representation for topologically varying neural radiance fields. arXiv preprint arXiv:2106.13228 (2021).
- DreamFusion: Text-to-3D using 2D diffusion. arXiv preprint arXiv:2209.14988 (2022).
- Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning. PMLR, 8748–8763. https://proceedings.mlr.press/v139/radford21a.html
- Vision Transformers for Dense Prediction. ICCV (2021).
- Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 3 (2022), 1623–1637.
- Dense depth priors for neural radiance fields from sparse input views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12892–12901.
- High-Resolution Image Synthesis with Latent Diffusion Models. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10674–10685. https://doi.org/10.1109/CVPR52688.2022.01042
- DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 22500–22510.
- Johannes Lutz Schönberger and Jan-Michael Frahm. 2016. Structure-from-Motion Revisited. In Conference on Computer Vision and Pattern Recognition (CVPR).
- FlipNeRF: Flipped reflection rays for few-shot novel view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 22883–22893.
- Control4D: Efficient 4D Portrait Editing with Text. (2023).
- ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining. CoRR abs/2312.09249 (2023). https://doi.org/10.48550/arXiv.2312.09249
- MVDream: Multi-view Diffusion for 3D Generation. arXiv:2308.16512 (2023).
- Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 37), Francis Bach and David Blei (Eds.). PMLR, Lille, France, 2256–2265. https://proceedings.mlr.press/v37/sohl-dickstein15.html
- SimpleNeRF: Regularizing Sparse Input Neural Radiance Fields with Simpler Solutions. In SIGGRAPH Asia 2023 Conference Papers (SA ’23). Association for Computing Machinery, New York, NY, USA, 1–11. https://doi.org/10.1145/3610548.3618188
- Nagabhushan Somraj and Rajiv Soundararajan. 2023. ViP-NeRF: Visibility Prior for Sparse Input Neural Radiance Fields. (August 2023). https://doi.org/10.1145/3588432.3591539
- Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020).
- DaRF: Boosting Radiance Fields from Sparse Inputs with Monocular Depth Adaptation. arXiv:2305.19201 (Sept. 2023). https://doi.org/10.48550/arXiv.2305.19201
- Yang Song and Stefano Ermon. 2019. Generative Modeling by Estimating Gradients of the Data Distribution. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2019/file/3001ef257407d5a371a96dcd947c7d93-Paper.pdf
- Score-Based Generative Modeling through Stochastic Differential Equations. In International Conference on Learning Representations. https://openreview.net/forum?id=PxTIG12RRHS
- Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction. In CVPR.
- DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation. arXiv preprint arXiv:2309.16653 (2023).
- SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis. arXiv:2303.16196 (Aug. 2023). https://doi.org/10.48550/arXiv.2303.16196
- Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12619–12629. https://doi.org/10.1109/CVPR52729.2023.01214
- PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction. arXiv preprint arXiv:2311.12024 (2023).
- ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation. In Advances in Neural Information Processing Systems (NeurIPS).
- SynSin: End-to-end View Synthesis from a Single Image. In CVPR.
- ReconFusion: 3D Reconstruction with Diffusion Priors. arXiv preprint arXiv:2312.02981 (2023).
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2023), 803–814. https://api.semanticscholar.org/CorpusID:255998491
- Jamie Wynn and Daniyar Turmukhambetov. 2023. DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 4180–4189. https://doi.org/10.1109/CVPR52729.2023.00407
- SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting. arXiv:2312.00206 (Nov. 2023). https://doi.org/10.48550/arXiv.2312.00206
- SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image. In Computer Vision – ECCV 2022 (Lecture Notes in Computer Science), Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, and Tal Hassner (Eds.). Springer Nature Switzerland, Cham, 736–753. https://doi.org/10.1007/978-3-031-20047-2_42
- Point-NeRF: Point-based neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5438–5448.
- DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model. arXiv:2311.09217 [cs.CV]
- FreeNeRF: Improving Few-Shot Neural Rendering with Free Frequency Regularization. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 8254–8263. https://doi.org/10.1109/CVPR52729.2023.00798
- GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models. arXiv preprint arXiv:2310.08529 (2023).
- Differentiable Surface Splatting for Point-based Geometry Processing. ACM Transactions on Graphics (proceedings of ACM SIGGRAPH ASIA) 38, 6 (2019).
- Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3836–3847.
- ControlNet-v1-1-nightly. https://github.com/lllyasviel/ControlNet-v1-1-nightly.
- Differentiable Point-Based Radiance Fields for Efficient View Synthesis. In SIGGRAPH Asia 2022 Conference Papers (SA ’22). Association for Computing Machinery, New York, NY, USA, Article 7, 12 pages.
- The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition. 586–595.
- Zhizhuo Zhou and Shubham Tulsiani. 2023. SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 12588–12597. https://doi.org/10.1109/CVPR52729.2023.01211
- Junzhe Zhu and Peiye Zhuang. 2023. HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance. arXiv:2305.18766 [cs.CV]
- FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting. arXiv preprint arXiv:2312.00451 (2023).
- Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers. arXiv:2312.09147 [cs.CV]