Cinematic Behavior Transfer via NeRF-based Differentiable Filming (2311.17754v1)
Abstract: In digital media and video production, precise manipulation and reproduction of visual elements such as camera movements and character actions are highly desired. Existing SLAM methods face limitations in dynamic scenes, and human pose estimation often focuses on 2D projections while neglecting 3D state. To address these issues, we first introduce a reverse filming behavior estimation technique. It optimizes camera trajectories by leveraging NeRF as a differentiable renderer and refining SMPL tracks. We then introduce a cinematic transfer pipeline that can transfer various shot types to a new 2D video or a 3D virtual environment. Incorporating a 3D engine workflow enables superior rendering and control, and it also achieves a higher rating in the user study.
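To make the idea of estimating a camera through a differentiable renderer concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy stand-in for NeRF: a pinhole projection of a few known 3D points, with the camera restricted to an in-plane position. All names (`render`, `loss_and_grad`, `estimate_camera`, `POINTS`, `FOCAL`) and the learning rate are hypothetical. The loop only illustrates the render, compare, and update-by-gradient pattern.

```python
# Toy analysis-by-synthesis camera estimation.
# ASSUMPTION: a pinhole projection of known 3D points stands in for NeRF;
# a real system would render images and obtain gradients via autodiff.

FOCAL = 1.0  # hypothetical focal length

# Hypothetical static scene: (x, y, z) points in front of the camera.
POINTS = [(-1.0, -0.5, 4.0), (1.0, -0.5, 5.0), (0.0, 1.0, 6.0), (0.5, 0.5, 4.5)]


def render(cam, points=POINTS):
    """Project points into a camera at in-plane position cam = (cx, cy)."""
    cx, cy = cam
    return [(FOCAL * (x - cx) / z, FOCAL * (y - cy) / z) for x, y, z in points]


def loss_and_grad(cam, target, points=POINTS):
    """Squared reprojection error and its analytic gradient w.r.t. (cx, cy)."""
    loss, gx, gy = 0.0, 0.0, 0.0
    for (u, v), (tu, tv), (_, _, z) in zip(render(cam, points), target, points):
        du, dv = u - tu, v - tv
        loss += du * du + dv * dv
        # du/dcx = dv/dcy = -FOCAL / z for this toy renderer.
        gx += 2.0 * du * (-FOCAL / z)
        gy += 2.0 * dv * (-FOCAL / z)
    return loss, (gx, gy)


def estimate_camera(target, init=(0.0, 0.0), lr=2.0, steps=200):
    """Plain gradient descent on the camera position."""
    cam = init
    for _ in range(steps):
        _, (gx, gy) = loss_and_grad(cam, target)
        cam = (cam[0] - lr * gx, cam[1] - lr * gy)
    return cam


true_cam = (0.3, -0.2)          # ground truth used to synthesize observations
observed = render(true_cam)     # plays the role of the reference footage
estimated = estimate_camera(observed)
final_loss, _ = loss_and_grad(estimated, observed)
```

In this toy setting the loss is quadratic in the camera position, so a fixed step size converges. With a full 6-DoF pose and a learned radiance field the objective is non-convex, and initialization and regularization matter far more.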