DreamGaussian4D: Generative 4D Gaussian Splatting (2312.17142v3)

Published 28 Dec 2023 in cs.CV and cs.GR

Abstract: 4D content generation has achieved remarkable progress recently. However, existing methods suffer from long optimization times, a lack of motion controllability, and a low quality of details. In this paper, we introduce DreamGaussian4D (DG4D), an efficient 4D generation framework that builds on Gaussian Splatting (GS). Our key insight is that combining explicit modeling of spatial transformations with static GS makes an efficient and powerful representation for 4D generation. Moreover, video generation methods have the potential to offer valuable spatial-temporal priors, enhancing the high-quality 4D generation. Specifically, we propose an integral framework with two major modules: 1) Image-to-4D GS - we initially generate static GS with DreamGaussianHD, followed by HexPlane-based dynamic generation with Gaussian deformation; and 2) Video-to-Video Texture Refinement - we refine the generated UV-space texture maps and meanwhile enhance their temporal consistency by utilizing a pre-trained image-to-video diffusion model. Notably, DG4D reduces the optimization time from several hours to just a few minutes, allows the generated 3D motion to be visually controlled, and produces animated meshes that can be realistically rendered in 3D engines.

Introduction to 4D Content Generation

The generation of digital content has advanced tremendously, with 2D images, 3D scenes, and even dynamic 4D (3D plus time) models now being created by generative models. Historically, methods for creating 4D content have been plagued by long optimization times and limited control over motion. DreamGaussian4D introduces an efficient framework for generating dynamic 4D scenes with a technique called 4D Gaussian Splatting, reducing optimization time from hours to minutes while allowing for more controllable and detailed animated content.

DreamGaussian4D Framework

In the DreamGaussian4D framework, the process of 4D content generation is broken down into three stages:

Static Generation

The first stage uses DreamGaussianHD, a set of improved training practices building on DreamGaussian, to create a static 3D Gaussian Splatting (GS) model from an input image. Multi-view optimization and a fixed background color during rendering significantly improve the quality of regions not visible in the input view.
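
To make this stage concrete, here is a minimal, illustrative optimization loop in the spirit of DreamGaussianHD. The renderer and diffusion guidance are stubbed out, and the names `render_gaussians` and `sds_guidance` are assumptions for the sketch: a real pipeline would use a differentiable Gaussian rasterizer and score-distillation guidance from a pre-trained diffusion model, rendering against a fixed background color.

```python
import torch

# Hypothetical sketch of static Gaussian Splatting optimization (DreamGaussianHD-style).
# The rasterizer and diffusion guidance are stubs so the loop runs end to end;
# a real implementation would replace them with a differentiable GS renderer
# and a pre-trained diffusion model used for score distillation.

def render_gaussians(positions, colors, camera_pose, bg_color):
    # Stub renderer: returns a constant-background image tied to the parameters
    # so gradients flow; a real renderer rasterizes the 3D Gaussians.
    img = bg_color.view(3, 1, 1).expand(3, 64, 64).clone()
    return img + 1e-3 * (positions.mean() + colors.mean())

def sds_guidance(image, camera_pose):
    # Stub for diffusion-based guidance (e.g. score distillation sampling).
    return torch.nn.functional.mse_loss(image, torch.full_like(image, 0.5))

positions = torch.randn(4096, 3, requires_grad=True)   # Gaussian centers
colors = torch.rand(4096, 3, requires_grad=True)       # per-Gaussian colors
bg_color = torch.ones(3)                                # fixed white background
camera_poses = [torch.eye(4) for _ in range(8)]         # fixed multi-view cameras

optimizer = torch.optim.Adam([positions, colors], lr=1e-2)
for step in range(200):
    pose = camera_poses[step % len(camera_poses)]       # cycle over viewpoints
    image = render_gaussians(positions, colors, pose, bg_color)
    loss = sds_guidance(image, pose)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```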

Dynamic Generation

The second stage involves generating a driving video from the input image using an image-to-video diffusion model. This driving video then guides the optimization of a time-dependent deformation field that acts on the static 3D GS model. The innovation here is the use of an explicit video representation to drive motion, rather than just relying on still images, which yields better motion control and diversity.
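
As a rough illustration of this stage, the sketch below optimizes a time-conditioned deformation field that displaces the frozen static Gaussians so that renders match the driving-video frames. The paper parameterizes the deformation with a HexPlane; a small MLP stands in here, and the renderer and driving video are placeholders (names such as `render_gaussians` and `driving_video` are assumptions for the sketch).

```python
import torch
import torch.nn as nn

# Illustrative dynamic stage: a time-conditioned deformation field displaces
# the frozen static Gaussians, supervised frame-by-frame against a driving
# video produced by an image-to-video diffusion model. The renderer is a stub
# so the sketch executes end to end.

class DeformationField(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),          # per-Gaussian xyz displacement
        )

    def forward(self, xyz: torch.Tensor, t: float) -> torch.Tensor:
        time = torch.full_like(xyz[:, :1], t)
        return self.net(torch.cat([xyz, time], dim=-1))

def render_gaussians(xyz):
    # Stub renderer: a real one rasterizes the deformed Gaussians to an image.
    return xyz.mean() * torch.ones(3, 64, 64)

static_xyz = torch.randn(4096, 3)            # frozen static Gaussian centers
driving_video = torch.rand(14, 3, 64, 64)    # frames from the image-to-video model
deform = DeformationField()
optimizer = torch.optim.Adam(deform.parameters(), lr=1e-3)

for epoch in range(10):
    for i, frame in enumerate(driving_video):
        t = i / (len(driving_video) - 1)                  # normalized timestamp
        deformed_xyz = static_xyz + deform(static_xyz, t)
        rendered = render_gaussians(deformed_xyz)
        loss = torch.nn.functional.mse_loss(rendered, frame)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```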

Texture Refinement

In the final stage, the 4D GS representation is converted into an animated mesh sequence. The per-frame texture maps are then refined using a video-to-video pipeline to ensure temporal coherence, preventing issues such as flickering between frames. This refinement improves the visual quality of the animated meshes and facilitates their use in real-world applications.
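
The sketch below conveys the intuition only: the per-frame UV texture maps are stacked into a short "video" and refined jointly, so the refinement sees temporal context and neighbouring frames stay consistent. `video_diffusion_refine` is a stand-in for a pre-trained image-to-video diffusion model run in a noise-then-denoise mode; the function, its blending behaviour, and the strength value are illustrative assumptions, not the paper's implementation.

```python
import torch

def video_diffusion_refine(textures: torch.Tensor, strength: float) -> torch.Tensor:
    # Stub: add a little noise and "denoise" by blending toward the temporal
    # mean, mimicking the smoothing effect of joint video refinement. A real
    # pipeline would run a pre-trained video diffusion model here.
    noisy = textures + strength * torch.randn_like(textures)
    temporal_mean = noisy.mean(dim=0, keepdim=True)
    return (1 - strength) * noisy + strength * temporal_mean

# One UV texture map per exported mesh frame: (frames, channels, H, W).
frame_textures = torch.rand(14, 3, 512, 512)

refined = video_diffusion_refine(frame_textures, strength=0.3)
assert refined.shape == frame_textures.shape  # one refined texture per frame
```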

Performance and Contributions

DreamGaussian4D substantially speeds up the generation process, creating 4D content within minutes as opposed to the hours required by previous methods. Additionally, it allows for more flexible manipulation of the generated motion and produces detailed and refined meshes that can be rendered efficiently. It also adopts deformable Gaussian Splatting for its speed and quality benefits in dynamic representations.

The paper's contributions include the employment of deformable Gaussian Splatting for representation in 4D content generation, a framework designed for image-to-4D that enhances control and diversity of motion, and a strategy for refining video textures to improve quality and facilitate deployment in practical settings.

Conclusion

DreamGaussian4D represents a significant step forward in 4D content generation. It delivers substantial improvements in speed and detail and opens up new possibilities for controlling and animating digital models in three dimensions over time, with promising applications in animation, gaming, and virtual reality.

References (57)
  1. Particlenerf: A particle-based encoding for online neural radiance fields. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp.  5975–5984, 2024.
  2. 4d-fy: Text-to-4d generation using hybrid score distillation sampling. arXiv preprint arXiv:2311.17984, 2023.
  3. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127, 2023.
  4. Hexplane: A fast representation for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  130–141, 2023.
  5. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015.
  6. Fantasia3d: Disentangling geometry and appearance for high-quality text-to-3d content creation. arXiv preprint arXiv:2303.13873, 2023.
  7. Learning implicit fields for generative shape modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  5939–5948, 2019.
  8. Bsp-net: Generating compact meshes via binary space partitioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  45–54, 2020.
  9. Objaverse-xl: A universe of 10m+ 3d objects. arXiv preprint arXiv:2307.05663, 2023a.
  10. Objaverse: A universe of annotated 3d objects. In CVPR, pp.  13142–13153, 2023b.
  11. Neural radiance flow for 4d view synthesis and video processing. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp.  14304–14314. IEEE Computer Society, 2021.
  12. Topologically-aware deformation fields for single-view 3d reconstruction. In CVPR, pp.  1536–1546, 2022.
  13. Fast dynamic radiance fields with time-aware neural voxels. In SIGGRAPH Asia 2022 Conference Papers, pp.  1–9, 2022.
  14. K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  12479–12488, 2023.
  15. Dynamic view synthesis from dynamic monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  5712–5721, 2021.
  16. Neurofluid: Fluid dynamics grounding with particle-driven neural radiance fields. In International Conference on Machine Learning, pp.  7919–7929. PMLR, 2022.
  17. Denoising diffusion probabilistic models. NeurIPS, 33:6840–6851, 2020.
  18. Lrm: Large reconstruction model for single image to 3d. arXiv preprint arXiv:2311.04400, 2023.
  19. Consistent4d: Consistent 360° dynamic object generation from monocular video. arXiv preprint arXiv:2311.02848, 2023.
  20. Shap-e: Generating conditional 3d implicit functions. arXiv preprint arXiv:2305.02463, 2023.
  21. 3d gaussian splatting for real-time radiance field rendering. ToG, 42(4):1–14, 2023.
  22. Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  5521–5531, 2022.
  23. Neural scene flow fields for space-time view synthesis of dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  6498–6508, 2021.
  24. Dynibar: Neural dynamic image-based rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  4273–4284, 2023.
  25. Align your gaussians: Text-to-4d with dynamic 3d gaussians and composed diffusion models. arXiv preprint arXiv:2312.13763, 2023.
  26. One-2-3-45++: Fast single image to 3d objects with consistent multi-view generation and 3d diffusion. arXiv preprint arXiv:2311.07885, 2023a.
  27. One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. arXiv preprint arXiv:2306.16928, 2023b.
  28. Zero-1-to-3: Zero-shot one image to 3d object. arXiv preprint arXiv:2303.11328, 2023c.
  29. Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. arXiv preprint arXiv:2308.09713, 2023.
  30. Realfusion: 360deg reconstruction of any object from a single image. In CVPR, pp.  8446–8455, 2023.
  31. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
  32. Point-e: A system for generating 3d point clouds from complex prompts. arXiv preprint arXiv:2212.08751, 2022.
  33. Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  5865–5874, 2021a.
  34. Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. arXiv preprint arXiv:2106.13228, 2021b.
  35. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
  36. D-nerf: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  10318–10327, 2021.
  37. Magic123: One image to high-quality 3d object generation using both 2d and 3d diffusion priors. arXiv preprint arXiv:2306.17843, 2023.
  38. High-resolution image synthesis with latent diffusion models. In CVPR, pp.  10684–10695, 2022.
  39. Tensor4d: Efficient neural 4d decomposition for high-fidelity dynamic reconstruction and rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  16632–16642, 2023.
  40. Knn-diffusion: Image generation via large-scale retrieval. arXiv preprint arXiv:2204.02849, 2022.
  41. Zero123++: a single image to consistent multi-view diffusion base model, 2023.
  42. Text-to-4d dynamic scene generation. arXiv preprint arXiv:2301.11280, 2023.
  43. Splatter image: Ultra-fast single-view 3d reconstruction. arXiv preprint arXiv:2312.13150, 2023.
  44. Dreamgaussian: Generative gaussian splatting for efficient 3d content creation. arXiv preprint arXiv:2309.16653, 2023a.
  45. Make-it-3d: High-fidelity 3d creation from a single image with diffusion prior. arXiv preprint arXiv:2303.14184, 2023b.
  46. Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  12959–12970, 2021.
  47. Grf: Learning a general radiance field for 3d representation and rendering. In ICCV, pp.  15182–15192, 2021.
  48. Suds: Scalable urban dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  12375–12385, 2023.
  49. Lavie: High-quality video generation with cascaded latent diffusion models. arXiv preprint arXiv:2309.15103, 2023.
  50. 4d gaussian splatting for real-time dynamic scene rendering. arXiv preprint arXiv:2310.08528, 2023.
  51. Space-time neural irradiance fields for free-viewpoint video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  9421–9431, 2021.
  52. Disn: Deep implicit surface network for high-quality single-view 3d reconstruction. Advances in neural information processing systems, 32, 2019.
  53. Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. arXiv preprint arXiv:2309.13101, 2023.
  54. Mvimgnet: A large-scale dataset of multi-view images. In CVPR, 2023.
  55. Star: Self-supervised tracking and reconstruction of rigid objects in motion with neural rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  13144–13152, 2021.
  56. Animate124: Animating one image to 4d dynamic scene. arXiv preprint arXiv:2311.14603, 2023.
  57. A unified approach for text-and image-guided 4d scene generation. arXiv preprint arXiv:2311.16854, 2023.
Authors (7)
  1. Jiawei Ren (33 papers)
  2. Liang Pan (93 papers)
  3. Jiaxiang Tang (23 papers)
  4. Chi Zhang (567 papers)
  5. Ang Cao (15 papers)
  6. Gang Zeng (40 papers)
  7. Ziwei Liu (368 papers)
Citations (72)