LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field (2404.08966v2)
Abstract: A cinemagraph is a unique form of visual media that combines elements of still photography and subtle motion to create a captivating experience. However, the majority of videos generated by recent works lack depth information and are confined to the constraints of 2D image space. In this paper, inspired by significant progress in the field of novel view synthesis (NVS) achieved by 3D Gaussian Splatting (3D-GS), we propose LoopGaussian to elevate cinemagraphs from 2D image space to 3D space using 3D Gaussian modeling. To achieve this, we first employ the 3D-GS method to reconstruct 3D Gaussian point clouds from multi-view images of static scenes, incorporating shape regularization terms to prevent blurring or artifacts caused by object deformation. We then adopt an autoencoder tailored for 3D Gaussians to project them into feature space. To maintain the local continuity of the scene, we devise SuperGaussian for clustering based on the acquired features. By calculating the similarity between clusters and employing a two-stage estimation method, we derive an Eulerian motion field to describe velocities across the entire scene. The 3D Gaussian points then move within the estimated Eulerian motion field. Through bidirectional animation techniques, we ultimately generate a 3D cinemagraph that exhibits natural and seamlessly loopable dynamics. Experimental results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation. The project is available at https://pokerlishao.github.io/LoopGaussian/.
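The last two steps of the abstract (points moving within a static Eulerian motion field, then bidirectional animation to close the loop) can be illustrated with a minimal sketch. This is not the authors' implementation. The rotational `velocity` field, the forward Euler integrator, the step size `dt`, and the linear weight blending are all assumptions chosen for illustration; the abstract does not specify any of them. Each point is advected both forward from the start of the loop and backward from its end, and the two copies are cross-faded so that the last frame matches the first.

```python
def velocity(p):
    """Hypothetical static (Eulerian) velocity field: rotation about the z-axis.

    The field depends only on position, not on time.
    """
    x, y, z = p
    return (-y, x, 0.0)


def advect(p, steps, dt):
    """Move point p through the field with forward Euler steps.

    A negative step count integrates against the field (backward in time).
    """
    sign = 1.0 if steps >= 0 else -1.0
    for _ in range(abs(steps)):
        v = velocity(p)
        p = tuple(c + sign * dt * vc for c, vc in zip(p, v))
    return p


def loop_frame(points, t, n_frames, dt=0.05):
    """Return (position, weight) pairs for frame t of an n_frames loop.

    Every input point yields two copies: one advected t steps forward,
    which fades out, and one advected (t - n_frames) steps, i.e. backward,
    which fades in. The weights are an assumed linear cross-fade.
    """
    alpha = t / n_frames
    frame = []
    for p in points:
        frame.append((advect(p, t, dt), 1.0 - alpha))
        frame.append((advect(p, t - n_frames, dt), alpha))
    return frame


if __name__ == "__main__":
    pts = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
    for t in (0, 5, 10):
        print(t, loop_frame(pts, t, 10))
```

At `t = 0` only the forward copies carry weight and they sit at the original positions; at `t = n_frames` only the backward copies carry weight and they also sit at the original positions, which is what makes the sequence loop without a visible jump. In a Gaussian-splatting renderer the weight would plausibly scale each Gaussian's opacity, though that mapping is also an assumption here.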