GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting (2402.10259v4)

Published 15 Feb 2024 in cs.CV and cs.GR

Abstract: Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or highly compressed object information as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination, which explicitly inject structure priors into the initial optimization process to help build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. We further design a COLMAP-free variant, where pre-given accurate camera poses are not required, which achieves competitive quality and facilitates wider applications. GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, OpenIllumination, and our-collected unposed images, achieving superior performance from only four views and significantly outperforming previous SOTA methods. Our demo is available at https://gaussianobject.github.io/, and the code has been released at https://github.com/GaussianObject/GaussianObject.


Summary

  • The paper introduces the GaussianObject framework, which uses Gaussian splatting to enable high-quality 3D reconstructions from only four views.
  • It employs a visual hull-based Gaussian initialization paired with a novel diffusive repair model to refine sparse-data reconstructions.
  • The framework advances state-of-the-art sparse-view 3D reconstruction, simplifying data capture for applications in AR, VR, and gaming.

GaussianObject: Achieving High-Quality 3D Reconstruction with Minimal Views

Introduction to GaussianObject Framework

3D reconstruction from sparse views is challenging because the input images carry limited 3D information and provide too few correspondences to establish multi-view consistency. The GaussianObject framework addresses these challenges directly: it uses Gaussian splatting to reconstruct and render 3D objects from only four input images. By combining a visual hull-based Gaussian initialization with a diffusion-based Gaussian repair model, the approach significantly advances sparse-view 3D object reconstruction.

Advancements in Sparse-View 3D Reconstruction

Foundational Techniques

GaussianObject builds on 3D Gaussian Splatting (3DGS), an explicit scene representation that supports fast, high-quality rendering. To counter the scarcity of 3D information in sparse inputs, the framework injects structure priors during initialization through visual hull construction and floater elimination, yielding a coarse but multi-view-consistent 3D Gaussian representation. This initialization is crucial for coping with the inherent information scarcity of sparse viewpoints.
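As an illustrative sketch only (not the paper's implementation), visual hull initialization can be thought of as carving a voxel grid with the object silhouettes from the four input views and keeping the surviving voxel centers as candidate Gaussian positions. The grid resolution, camera convention, and function names below are all assumptions for illustration:

```python
import numpy as np

def visual_hull_init(masks, projections, grid_res=64, bounds=1.0):
    """Carve a voxel grid with silhouettes and return surviving voxel
    centers as candidate Gaussian positions.

    masks: list of HxW boolean silhouettes, one per input view.
    projections: list of 3x4 camera projection matrices (world -> pixel).
    This is a toy sketch; the actual initialization in the paper may
    differ in resolution, sampling, and camera handling.
    """
    # Regular grid of candidate points in a cube around the object.
    axis = np.linspace(-bounds, bounds, grid_res)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    points = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)
    homog = np.concatenate([points, np.ones((len(points), 1))], axis=1)

    inside = np.ones(len(points), dtype=bool)
    for mask, P in zip(masks, projections):
        h, w = mask.shape
        proj = homog @ P.T                          # (N, 3) homogeneous pixels
        uv = proj[:, :2] / np.clip(proj[:, 2:3], 1e-8, None)
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(points), dtype=bool)
        hit[valid] = mask[v[valid], u[valid]]
        inside &= hit                               # must lie inside every silhouette
    return points[inside]
```

With only four silhouettes the hull is loose, which is why the coarse Gaussians it seeds still need floater elimination and the subsequent repair stage.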

Gaussian Repair Model

A distinctive component of GaussianObject is its Gaussian repair model, which refines the coarse reconstruction by restoring object details that the sparse inputs omit or distort. The repair mechanism adapts a large pre-trained 2D diffusion model to the 3D reconstruction setting: rendered views are corrected by the diffusion model, and the corrected images in turn supervise further refinement of the Gaussians. Because only four reference images are available, the framework uses a self-generating strategy to produce the corrupted-and-clean image pairs needed to train the repair model, a notably pragmatic way of handling sparse data.
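The self-generating idea can be sketched in miniature: pair each clean reference view with deliberately corrupted versions of itself, so a repair model can learn the corrupted-to-clean mapping. In the paper the corruption comes from rendering degraded Gaussian models; the additive-noise stand-in and all names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_repair_pairs(reference_views, noise_levels=(0.1, 0.3, 0.5)):
    """Toy sketch of the self-generating strategy: build (corrupted, clean)
    training pairs from the few available reference views.

    reference_views: list of HxWx3 float images in [0, 1].
    In GaussianObject the corrupted inputs are renderings from degraded
    Gaussian models; here additive Gaussian noise approximates that
    degradation purely for illustration.
    """
    pairs = []
    for view in reference_views:
        for sigma in noise_levels:
            noise = rng.normal(0.0, sigma, view.shape)
            corrupted = np.clip(view + noise, 0.0, 1.0)  # keep a valid image
            pairs.append((corrupted, view))              # learn corrupted -> clean
    return pairs
```

A diffusion model fine-tuned on such pairs (e.g. with low-rank adaptation) can then repair renderings from novel viewpoints, which are fed back as supervision for the Gaussians.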

Theoretical and Practical Implications

Relevance to Current Research

GaussianObject aligns with and contributes to ongoing research in differentiable point-based rendering and neural rendering for sparse-view reconstruction. By offering a robust framework for extremely sparse setups, the work addresses a significant gap: achieving high-quality 3D reconstruction from minimal image inputs. Comparisons with related methods such as DVGO, 3DGS, and various NeRF adaptations position GaussianObject as a leading approach in rendering quality and efficiency.

Implications for 3D Vision Applications

From a practical standpoint, GaussianObject's ability to operate with limited inputs vastly simplifies the process of capturing 3D datasets, making high-quality 3D reconstruction more accessible and efficient. This has far-reaching implications for fields relying on 3D content, including AR/VR, game development, and beyond, potentially lowering the barriers to entry for creators and enhancing the user experience with more immersive content.

Future Directions and Conclusion

While GaussianObject is a significant step forward, the authors identify areas for further improvement, including reliance on precise camera parameters, handling of extreme viewpoints, and color accuracy in reconstructions. Future work might jointly optimize camera parameters alongside the 3D reconstruction, or adopt advanced anti-aliasing techniques for better visual output.

In summary, GaussianObject introduces a compelling framework for sparse-view 3D object reconstruction, combining structure-prior optimization with an innovative repair model to produce high-fidelity 3D representations from a minimal set of images. The framework not only advances the technical capabilities in the field of 3D vision but also opens new avenues for practical applications, promising to make high-quality 3D reconstruction more accessible and applicable across a variety of domains.