3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos (2403.01444v4)
Abstract: Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the na\"ive approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods.
- Hyperreel: High-fidelity 6-dof video with ray-conditioned sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16610–16620, 2023.
- Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
- Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5470–5479, 2022.
- Zip-nerf: Anti-aliased grid-based neural radiance fields. ICCV, 2023.
- The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. Computer vision and image understanding, 63(1):75–104, 1996.
- sibr: A system for image based rendering, 2020.
- Immersive light field video with a layered mesh representation. ACM Transactions on Graphics (TOG), 39(4):86–1, 2020.
- Unstructured lumigraph rendering. In SIGGRAPH, pages 425–432, 2001.
- Hexplane: A fast representation for dynamic scenes. CVPR, 2023.
- Plenoptic sampling. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 307–318, 2000.
- Mvsnerf: Fast generalizable radiance field reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 14124–14133, 2021.
- Tensorf: Tensorial radiance fields. In European Conference on Computer Vision (ECCV), 2022.
- Dictionary fields: Learning a neural basis decomposition. ACM Trans. Graph., 2023a.
- Mobilenerf: Exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures. In The Conference on Computer Vision and Pattern Recognition (CVPR), 2023b.
- High-quality streamable free-viewpoint video. ACM Transactions on Graphics (TOG), 34(4):69, 2015.
- Unstructured light fields. Comput. Graph. Forum, 31(2pt1):305–314, 2012.
- Motion2fusion: Real-time volumetric performance capture. ACM Trans. Graph., 36(6):246:1–246:16, 2017.
- 4d gaussian splatting: Towards efficient novel view synthesis for dynamic scenes. arXiv preprint arXiv:2402.03307, 2024.
- Fast dynamic radiance fields with time-aware neural voxels. In SIGGRAPH Asia 2022 Conference Papers, 2022.
- Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022.
- Fastnerf: High-fidelity neural rendering at 200fps. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 14346–14355, 2021.
- The lumigraph. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, page 43–54, New York, NY, USA, 1996. Association for Computing Machinery.
- Baking neural radiance fields for real-time view synthesis. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 5855–5864, 2021.
- Determining optical flow. Artificial intelligence, 17(1-3):185–203, 1981.
- Tri-miprf: Tri-mip representation for efficient anti-aliasing neural radiance fields. In ICCV, 2023.
- 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.
- Nersemble: Multi-view radiance field reconstruction of human heads. arXiv preprint arXiv:2305.03027, 2023.
- Light field rendering. In SIGGRAPH, pages 31–42, 1996.
- Streaming radiance fields for 3d video synthesis. In NeurIPS, 2022a.
- Tava: Template-free animatable volumetric actors. In European Conference on Computer Vision, pages 419–436. Springer, 2022b.
- Neural 3d video synthesis from multi-view video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5521–5531, 2022c.
- Neural scene flow fields for space-time view synthesis of dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 6498–6508, 2021.
- Dynibar: Neural dynamic image-based rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
- Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis. In 3DV, 2024.
- NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections. In CVPR, 2021.
- Nerf: Representing scenes as neural radiance fields for view synthesis. In European conference on computer vision, pages 405–421. Springer, 2020.
- Nerf in the dark: High dynamic range view synthesis from noisy raw images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16190–16199, 2022.
- Thomas Müller. tiny-cuda-nn, 2021.
- Real-time neural radiance caching for path tracing. ACM Transactions on Graphics (TOG), 40(4):1–16, 2021.
- Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, 2022.
- Regnerf: Regularizing neural radiance fields for view synthesis from sparse inputs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5480–5490, 2022.
- Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 5865–5874, 2021a.
- Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5865–5874, 2021b.
- Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. ACM Trans. Graph., 40(6), 2021c.
- Temporal interpolation is all you need for dynamic neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4212–4221, 2023.
- D-nerf: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10318–10327, 2021.
- Merf: Memory-efficient radiance fields for real-time view synthesis in unbounded scenes. ACM Transactions on Graphics (TOG), 42(4):1–12, 2023.
- K-planes: Explicit radiance fields in space, time, and appearance. In CVPR, 2023.
- Structure-from-motion revisited. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- Rendering with concentric mosaics. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 299–306, 1999.
- Nerfplayer: A streamable dynamic scene representation with decomposed neural radiance fields. IEEE Transactions on Visualization and Computer Graphics, 29(5):2732–2742, 2023.
- Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5449–5459, 2022.
- Vgos: Voxel grid optimization for view synthesis from sparse inputs. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI-23, pages 1414–1422. International Joint Conferences on Artificial Intelligence Organization, 2023. Main Track.
- Shape and motion from image streams under orthography: a factorization method. International journal of computer vision, 9:137–154, 1992.
- Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 12959–12970, 2021.
- Ref-NeRF: Structured view-dependent appearance for neural radiance fields. CVPR, 2022.
- Mixed neural voxels for fast multi-view video synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19706–19716, 2023a.
- Fourier plenoctrees for dynamic radiance field rendering in real-time. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13524–13534, 2022.
- Neural residual radiance fields for streamably free-viewpoint videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 76–87, 2023b.
- Neural residual radiance fields for streamably free-viewpoint videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 76–87, 2023c.
- Tracking everything everywhere all at once. In International Conference on Computer Vision, 2023d.
- Humannerf: Free-viewpoint rendering of moving people from monocular video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16210–16220, 2022.
- Behind the scenes: Density fields for single view reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9076–9086, 2023.
- Nex: Real-time view synthesis with neural basis expansion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8534–8543, 2021.
- 4d gaussian splatting for real-time dynamic scene rendering. arXiv preprint arXiv:2310.08528, 2023.
- Diffusionerf: Regularizing neural radiance fields with denoising diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4180–4189, 2023.
- Space-time neural irradiance fields for free-viewpoint video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9421–9431, 2021.
- Banmo: Building animatable 3d neural models from many casual videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2863–2873, 2022.
- Freenerf: Improving few-shot neural rendering with free frequency regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8254–8263, 2023a.
- Deformable 3d gaussians for high-fidelity monocular dynamic scene reconstruction. arXiv preprint arXiv:2309.13101, 2023b.
- Real-time photorealistic dynamic scene representation and rendering with 4d gaussian splatting. 2024.
- Plenoctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5752–5761, 2021a.
- pixelNeRF: Neural radiance fields from one or few images. In CVPR, 2021b.
- Humannerf: Efficiently generated human radiance field from sparse inputs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7743–7753, 2022.
- High-quality video view interpolation using a layered representation. ACM transactions on graphics (TOG), 23(3):600–608, 2004.
- Ewa volume splatting. In Proceedings Visualization, 2001. VIS’01., pages 29–538. IEEE, 2001.