3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos (2403.01444v4)

Published 3 Mar 2024 in cs.CV

Abstract: Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naïve approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods.


Summary

  • The paper introduces a two-stage, on-the-fly training scheme for 3D Gaussians that enables efficient streaming of photo-realistic free-viewpoint videos of dynamic scenes.
  • It achieves real-time rendering at 200 FPS and per-frame reconstruction within 12 seconds while keeping per-frame storage low.
  • A compact Neural Transformation Cache models per-Gaussian translations and rotations, and an adaptive addition strategy spawns new 3D Gaussians to handle objects that emerge as the scene changes.

Efficient Free-Viewpoint Video Streaming with 3DGStream

Introduction

The advent of photo-realistic Free-Viewpoint Videos (FVVs) has significantly impacted computer vision and graphics, particularly within VR/AR/XR applications. Although a range of traditional and neural rendering methods can construct FVVs, real-world dynamic scenes remain challenging because of their complex geometry and the demand for real-time rendering. This paper introduces 3DGStream, an approach built on 3D Gaussians for efficient FVV streaming that supports on-the-fly per-frame reconstruction and real-time rendering at 200 FPS.

Background and Related Work

Existing methods struggle with the heavy computational and time requirements of FVV construction: most require complete video sequences for offline training, which rules out streaming use. Recent work has shown that 3D Gaussians enable rapid training and high-quality, real-time rendering for static scenes. Building on this foundation, 3DGStream handles dynamic scenes by training an initial set of 3D Gaussians and then, frame by frame, transforming them and adding new ones to accommodate scene changes, enabling efficient, real-time FVV streaming without offline training on the full sequence.

Methodology

3DGStream processes each incoming frame in two stages. The first stage trains a compact Neural Transformation Cache (NTC) that models the translations and rotations of the 3DGs, capturing object motion with minimal storage. The second stage applies an adaptive strategy for adding 3DGs to the scene, specifically targeting objects that newly appear. Because each frame reuses and transforms the previous frame's Gaussians rather than training a new set from scratch, this design keeps both training time and per-frame storage low. Rendering with the transformed and newly added 3DGs together maintains high-fidelity scene representation across frames.
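To make the first stage concrete, the sketch below shows one plausible form of a Neural Transformation Cache in PyTorch: a small MLP over frequency-encoded Gaussian centers that predicts a per-Gaussian translation and a unit-quaternion rotation offset. This is a minimal illustration under stated assumptions, not the paper's implementation: the encoding, layer widths, identity-rotation bias, and the usage snippet are all our own choices, and the per-frame optimization against a photometric rendering loss is not reproduced here.

```python
# Minimal sketch (illustrative, not the paper's implementation) of a Neural
# Transformation Cache: given the centers of the previous frame's 3D Gaussians,
# predict a translation and a rotation (unit quaternion) offset for each one.
import torch
import torch.nn as nn


def positional_encoding(x: torch.Tensor, num_freqs: int = 6) -> torch.Tensor:
    """Sin/cos features of 3D points (a stand-in for a faster grid/hash encoding)."""
    freqs = (2.0 ** torch.arange(num_freqs, device=x.device, dtype=x.dtype)) * torch.pi
    angles = x[..., None] * freqs                      # (N, 3, F)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(start_dim=1)


class NeuralTransformationCache(nn.Module):
    """Small MLP mapping a Gaussian's center to its per-frame motion."""

    def __init__(self, num_freqs: int = 6, hidden: int = 64):
        super().__init__()
        self.num_freqs = num_freqs
        self.mlp = nn.Sequential(
            nn.Linear(3 * 2 * num_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 7),                      # 3 translation + 4 quaternion
        )

    def forward(self, centers: torch.Tensor):
        out = self.mlp(positional_encoding(centers, self.num_freqs))
        d_xyz = out[:, :3]
        # Bias the rotation toward identity and normalize to a unit quaternion.
        d_rot = out[:, 3:] + out.new_tensor([1.0, 0.0, 0.0, 0.0])
        d_rot = d_rot / d_rot.norm(dim=-1, keepdim=True)
        return d_xyz, d_rot


# Usage: apply predicted offsets to (random, stand-in) Gaussian centers.
ntc = NeuralTransformationCache()
centers = torch.rand(1024, 3)                          # placeholder 3DG centers
d_xyz, d_rot = ntc(centers)
moved_centers = centers + d_xyz                        # rotations would likewise
                                                       # update each Gaussian's orientation
```

In the full method, the cache's parameters would be optimized per frame against the rendering loss on the training views. The second stage, not sketched here, adaptively spawns new 3D Gaussians to cover emerging objects; one natural trigger (an assumption on our part) is persistent photometric error or large view-space gradients in the affected regions.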

Experiments and Results

Extensive experiments show that 3DGStream is competitive with state-of-the-art methods in rendering speed, image quality, training time, and model storage. Rendering at 200 FPS and completing per-frame reconstruction within 12 seconds, the method marks a substantial improvement for FVV streaming, and its much smaller per-frame storage makes it practical for real-world deployment.

Implications and Future Directions

3DGStream's framework for efficient FVV streaming has wide-ranging implications for virtual reality, telepresence, and interactive media. By removing the need for offline training and by rendering in real time, the method advances real-time, high-quality 3D content creation and streaming. Looking forward, it opens further research into dynamic scene representation and real-time streaming, and its ideas may extend beyond FVV to other computer vision and graphics tasks that require efficient, high-fidelity 3D reconstruction and rendering.

Conclusion

In summary, 3DGStream advances the construction and streaming of Free-Viewpoint Videos through a novel, efficient approach leveraging 3D Gaussians. Its ability to handle dynamic scene alterations on-the-fly without compromising rendering quality or efficiency marks a significant step forward in real-time neural rendering. As this field continues to evolve, 3DGStream's contributions offer a solid foundation for future explorations into more sophisticated and scalable solutions for photo-realistic FVV streaming and beyond.