- The paper introduces a novel dynamic 3D Gaussian representation to achieve persistent dynamic view synthesis and accurate 6-DOF tracking.
- It fits the Gaussians' attributes with gradient-based optimization through differentiable rendering, reconstructing dense dynamic scenes and rendering novel views at 850 FPS with a PSNR of 28.7.
- This approach benefits applications in AR, robotics, and video editing while setting a new benchmark for dynamic scene reconstruction.
Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis
The paper presents a method for tackling two intertwined tasks: dynamic-scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. The approach is rooted in an analysis-by-synthesis framework, rendering and tracking dynamic 3D scenes in a temporally persistent manner. Complex scenes are represented with Dynamic 3D Gaussians, whose per-timestep positions and rotations, together with persistent color, opacity, and size, are optimized to reconstruct the input images via differentiable rendering.
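To make the representation concrete, the following is a minimal sketch in PyTorch of the attribute layout this implies; the class, field names, and shapes are assumptions made for illustration rather than the authors' code.

```python
import torch


class DynamicGaussians(torch.nn.Module):
    """Minimal sketch (assumed names/shapes): centers and rotations vary per
    timestep, while color, opacity, and size persist across the whole sequence."""

    def __init__(self, num_gaussians: int, num_timesteps: int):
        super().__init__()
        # Time-varying attributes: one 3D center and one unit quaternion per timestep.
        self.centers = torch.nn.Parameter(torch.zeros(num_timesteps, num_gaussians, 3))
        self.rotations = torch.nn.Parameter(
            torch.tensor([1.0, 0.0, 0.0, 0.0]).repeat(num_timesteps, num_gaussians, 1)
        )
        # Persistent attributes, shared by every timestep.
        self.colors = torch.nn.Parameter(torch.rand(num_gaussians, 3))
        self.log_scales = torch.nn.Parameter(torch.zeros(num_gaussians, 3))
        self.opacity_logits = torch.nn.Parameter(torch.zeros(num_gaussians))

    def at_time(self, t: int) -> dict:
        """Attributes a differentiable rasterizer would need for timestep t."""
        return {
            "means": self.centers[t],
            "quats": torch.nn.functional.normalize(self.rotations[t], dim=-1),
            "colors": self.colors,
            "scales": self.log_scales.exp(),
            "opacities": torch.sigmoid(self.opacity_logits),
        }
```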
Methodology
The core methodology introduces Dynamic 3D Gaussians as a representation of dynamic scenes. The Gaussians are allowed to move and rotate over time while keeping persistent color, opacity, and size, and their motion is regularized by physically motivated local-rigidity constraints so that nearby Gaussians deform together (a sketch of such a regularizer follows this paragraph). The approach requires no explicit correspondence or flow supervision: dense tracking emerges naturally from persistently modeling the same Gaussians across all timesteps. By optimizing the Gaussians' attributes with gradient-based methods against the input views, the researchers obtain accurate dense scene reconstructions that enable downstream applications such as first-person view synthesis, dynamic compositional scene synthesis, and 4D video editing.
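As an illustration of how such a constraint can be implemented, here is a hedged sketch of a local-rigidity regularizer in the spirit of the paper; the exact formulation, weighting, and neighbour selection are assumptions, and `quat_to_matrix`/`knn_indices` are small helpers written for this example rather than part of any published codebase.

```python
import torch


def quat_to_matrix(q: torch.Tensor) -> torch.Tensor:
    # Convert unit quaternions (w, x, y, z) of shape (..., 4) to rotation matrices (..., 3, 3).
    w, x, y, z = q.unbind(-1)
    return torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
        2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y),
    ], dim=-1).reshape(*q.shape[:-1], 3, 3)


def knn_indices(points: torch.Tensor, k: int) -> torch.Tensor:
    # Indices of the k nearest neighbours of each point, excluding the point itself.
    dists = torch.cdist(points, points)
    return dists.topk(k + 1, largest=False).indices[:, 1:]


def local_rigidity_loss(centers_t, centers_prev, quats_t, quats_prev, nbr_idx):
    """Nearby Gaussians should move as if rigidly attached: the offset from each
    Gaussian to its neighbours, carried forward by that Gaussian's own rotation
    change, should match the observed offset at the current timestep."""
    offset_prev = centers_prev[nbr_idx] - centers_prev[:, None, :]   # (N, k, 3)
    offset_t = centers_t[nbr_idx] - centers_t[:, None, :]            # (N, k, 3)
    # Relative rotation of each Gaussian from the previous to the current timestep.
    rel_rot = quat_to_matrix(quats_t) @ quat_to_matrix(quats_prev).transpose(-1, -2)
    # Rigidly advect the old offsets and penalize deviation from the observed ones.
    predicted = torch.einsum("nij,nkj->nki", rel_rot, offset_prev)
    return (predicted - offset_t).norm(dim=-1).mean()
```

In a per-timestep optimization loop, a term like this would be added to the photometric rendering loss, with only the current timestep's centers and rotations receiving gradients while the previous timestep's estimates are held fixed.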
Numerical Results
The effectiveness of the method is substantiated by strong numerical results. On multi-camera sequences from the CMU Panoptic Studio dataset, the approach reaches a PSNR of 28.7 for dynamic novel-view rendering while rendering at 850 FPS. Dense 3D scene tracking attains a low average 3D error of 2.21 cm over 150 timesteps, together with a 2D tracking normalized-pixel error of 1.57, roughly a tenfold improvement over previous methods.
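For orientation, metrics of this kind can be computed roughly as follows; this is a generic sketch rather than the paper's evaluation code, and details such as which points are tracked, the image value range, and any normalization follow the paper's protocol.

```python
import torch


def psnr(rendered: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio (dB) between a rendered image and its ground truth."""
    mse = torch.mean((rendered - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)


def mean_3d_tracking_error(pred_traj: torch.Tensor, gt_traj: torch.Tensor) -> torch.Tensor:
    """Average Euclidean distance between predicted and ground-truth 3D trajectories.
    Both tensors have shape (num_timesteps, num_points, 3); the result is in the same
    units as the inputs (e.g. centimetres)."""
    return (pred_traj - gt_traj).norm(dim=-1).mean()
```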
Implications and Future Directions
The implications of this research are twofold: practical and theoretical. Practically, it enhances capabilities in areas such as robotics, augmented reality, and cinematic content generation by providing fast, accurate, and visually convincing dynamic scene reconstructions. Theoretically, it narrows a gap in dynamic scene understanding, offering new insight into persistent modeling techniques. Moreover, the work argues that persistent, bidirectional tracking of dynamic scenes can drive advances in AI across both discriminative and generative domains.
Future research may explore extending this method's capabilities to handle scenes with newly entering or occluded objects, or applying the framework to monocular video settings. Additionally, further exploration of integrating this approach with existing large-scale synthetic datasets could provide a richer understanding of dynamic environments.
In conclusion, this paper exemplifies a promising step forward in the domain of dynamic scene modeling and tracking, offering a robust framework with broad implications across various technological fields and applications. As the method matures, it is likely to inspire further research and practical applications, setting a new standard in dynamic 3D scene tracking and synthesis.