
SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting (2312.13308v2)

Published 20 Dec 2023 in cs.CV

Abstract: Novel view synthesis has shown rapid progress recently, with methods capable of producing increasingly photorealistic results. 3D Gaussian Splatting has emerged as a promising method, producing high-quality renderings of scenes and enabling interactive viewing at real-time frame rates. However, it is limited to static scenes. In this work, we extend 3D Gaussian Splatting to reconstruct dynamic scenes. We model a scene's dynamics using dynamic MLPs, learning deformations from temporally-local canonical representations to per-frame 3D Gaussians. To disentangle static and dynamic regions, tuneable parameters weigh each Gaussian's respective MLP parameters, improving the dynamics modelling of imbalanced scenes. We introduce a sliding window training strategy that partitions the sequence into smaller manageable windows to handle arbitrary length scenes while maintaining high rendering quality. We propose an adaptive sampling strategy to determine appropriate window size hyperparameters based on the scene's motion, balancing training overhead with visual quality. Training a separate dynamic 3D Gaussian model for each sliding window allows the canonical representation to change, enabling the reconstruction of scenes with significant geometric changes. Temporal consistency is enforced using a fine-tuning step with self-supervising consistency loss on randomly sampled novel views. As a result, our method produces high-quality renderings of general dynamic scenes with competitive quantitative performance, which can be viewed in real-time in our dynamic interactive viewer.

Authors (7)
  1. Richard Shaw (25 papers)
  2. Jifei Song (18 papers)
  3. Arthur Moreau (11 papers)
  4. Michal Nazarczuk (9 papers)
  5. Sibi Catley-Chandar (10 papers)
  6. Helisa Dhamo (14 papers)
  7. Eduardo Perez-Pellitero (4 papers)
Citations (6)

Summary

  • The paper introduces SWinGS, an adaptive sliding-window technique that extends 3D Gaussian Splatting to dynamic scene reconstruction.
  • It employs per-window dynamic modelling with tuneable MLPs to separate static and dynamic scene elements and improve rendering quality.
  • The method renders in real time at over 71 FPS while enforcing temporal consistency through a self-supervised fine-tuning step.

Overview

The paper introduces SWinGS (Sliding Windows for Dynamic 3D Gaussian Splatting), a method for novel view synthesis of dynamic scenes. Whereas previous 3D Gaussian Splatting pipelines are restricted to static scenes, SWinGS extends the technique, already noted for its high-quality renderings and interactive frame rates, to reconstruct dynamic scenes for real-time interactive viewing.

Motivation and Challenges

View synthesis techniques built on 3D Gaussian Splatting struggle with dynamic scenes: changing topology, significant geometric modifications, and extended sequences can all lead to blurred results. Contemporary works that do offer dynamic reconstruction often either lack temporal consistency or carry heavy computational loads, making them impractical for real-time applications.

Innovative Approach

The introduced method tackles these challenges with several key innovations:

  1. Adaptive Window Sampling: The sequence is divided into windows of varying length based on motion intensity, enabling arbitrary-length scenes to be handled while maintaining high rendering quality (see the partitioning sketch after this list).
  2. Per-Window Dynamic 3D Gaussian Splatting: Each window uses its own set of dynamic 3D Gaussians deformed by a tuneable MLP (Multilayer Perceptron), whose per-Gaussian weighting concentrates capacity on dynamic regions and so disentangles static from dynamic scene parts (see the deformation sketch below).
  3. Temporal Consistency Enforcement: A fine-tuning step with a self-supervised consistency loss on randomly sampled novel views enforces consistency between adjacent windows, significantly reducing flickering and ensuring smooth transitions (see the loss sketch below).
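
To make the adaptive window sampling concrete, here is a minimal sketch of how a sequence could be partitioned so that windows shrink where motion is strong. It assumes motion has already been summarized as one scalar per frame (e.g., mean optical-flow magnitude between consecutive frames); the function name, the motion_budget threshold, and the length bounds are illustrative assumptions, not values from the paper.

```python
def adaptive_windows(motion_per_frame, motion_budget=5.0, min_len=4, max_len=60):
    """Split frame indices into windows whose accumulated motion stays under a budget."""
    windows, start, accumulated = [], 0, 0.0
    for i, motion in enumerate(motion_per_frame):
        accumulated += motion
        length = i - start + 1
        # Close the window once it has absorbed enough motion (or grown too long),
        # but never before it reaches a minimum length.
        if (accumulated >= motion_budget or length >= max_len) and length >= min_len:
            windows.append((start, i + 1))        # half-open frame range [start, i+1)
            start, accumulated = i + 1, 0.0
    if start < len(motion_per_frame):             # remaining frames form the last window
        windows.append((start, len(motion_per_frame)))
    return windows

# Example: fast motion in the middle of the clip yields shorter windows there.
motion = [0.5] * 20 + [3.0] * 10 + [0.5] * 20
print(adaptive_windows(motion))
```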
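
The per-window dynamics can be pictured with a small PyTorch module that maps a canonical Gaussian centre and a normalized time within the window to a deformed position. This is a simplified sketch under stated assumptions: the network width and depth are guesses, there is no positional encoding, and only positions are deformed, whereas the paper's dynamic MLPs also handle other Gaussian attributes and per-Gaussian tuneable weights.

```python
import torch
import torch.nn as nn

class DeformationMLP(nn.Module):
    """Maps (canonical Gaussian centre, time within window) -> deformed centre."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, canonical_xyz, t):
        # canonical_xyz: (N, 3) Gaussian centres in the window's canonical space
        # t:             scalar in [0, 1], normalized time within the window
        t_col = torch.full_like(canonical_xyz[:, :1], float(t))
        offset = self.net(torch.cat([canonical_xyz, t_col], dim=-1))
        return canonical_xyz + offset  # per-frame deformed centres

# Usage: one model per sliding window, queried at each frame's normalized time.
mlp = DeformationMLP()
xyz_canonical = torch.randn(10_000, 3)
xyz_at_t = mlp(xyz_canonical, t=0.25)
```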
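
The temporal-consistency fine-tuning can likewise be sketched as a loss that compares renders of adjacent window models at their shared frame from randomly sampled novel views, with the already-trained previous window acting as fixed pseudo-ground-truth. render and sample_random_pose are placeholder callables, and the plain L1 photometric term is an assumption standing in for the paper's actual consistency objective.

```python
import torch

def consistency_loss(model_prev, model_next, render, sample_random_pose,
                     t_overlap, num_views=4):
    """Penalize disagreement between adjacent window models at their shared frame."""
    loss = 0.0
    for _ in range(num_views):
        pose = sample_random_pose()                 # random novel viewpoint
        with torch.no_grad():                       # previous window stays frozen
            target = render(model_prev, pose, t_overlap)
        pred = render(model_next, pose, t_overlap)  # next window remains trainable
        loss = loss + (pred - target).abs().mean()  # simple L1 photometric term
    return loss / num_views
```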

Training and Implementation

For training, the authors use a PyTorch-based implementation of 3D Gaussian Splatting, starting from point cloud data produced by COLMAP. Training each window's model in parallel expedites the process, and each model is subsequently fine-tuned to enforce temporal consistency. This adaptation yields real-time frame-rate rendering with strong reconstruction quality on dynamic scenes, including those with significant motion such as flames.
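
Putting these steps together, the overall procedure described above might be organized as the following driver: independent per-window training in parallel, followed by a sequential consistency pass over adjacent pairs. train_window and finetune_with_consistency are hypothetical callables standing in for the authors' PyTorch Gaussian Splatting training and fine-tuning code.

```python
from concurrent.futures import ProcessPoolExecutor

def train_all_windows(windows, train_window, finetune_with_consistency):
    """Train one dynamic 3D Gaussian model per window, then fine-tune for consistency.
    Both callables are hypothetical stand-ins, not part of any released codebase."""
    # Windows are independent, so their models can be trained in parallel.
    with ProcessPoolExecutor() as pool:
        models = list(pool.map(train_window, windows))

    # Sequential pass: fine-tune each model against its predecessor's renders
    # at the shared frame to smooth transitions across window boundaries.
    for prev, nxt in zip(models, models[1:]):
        finetune_with_consistency(prev, nxt)
    return models
```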

Results and Contributions

The authors conducted extensive comparative studies, demonstrating that their method surpasses the current state-of-the-art techniques in rendering quality as assessed by PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index). Furthermore, it exhibits real-time performance at 71.51 FPS (Frames Per Second), a significant improvement over competing methods.
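
For reference, the PSNR figure quoted above is derived from the mean squared error between a rendered frame and the ground-truth image; a minimal implementation for images normalized to [0, 1] is shown below, while SSIM is typically taken from a library such as torchmetrics or scikit-image.

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```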

Conclusion

SWinGS advances novel view synthesis by delivering high-fidelity, real-time interactive rendering of dynamic scenes that were previously out of reach for methods dependent on static models. Through its adaptive window sampling, per-window dynamic 3D Gaussian splatting, and consistency-driven fine-tuning, it sets a strong benchmark for future developments in the domain.