MotionMaster: Training-free Camera Motion Transfer For Video Generation (2404.15789v2)

Published 24 Apr 2024 in cs.CV

Abstract: The emergence of diffusion models has greatly propelled the progress in image and video generation. Recently, some efforts have been made in controllable video generation, including text-to-video generation and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module, and necessitate substantial computational resources due to the large amount of parameters in video generation models. Moreover, existing methods pre-define camera motion types during training, which limits their flexibility in camera control. Therefore, to reduce training costs and achieve flexible camera control, we propose COMD, a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos and transfers the extracted camera motions to new videos. We first propose a one-shot camera motion disentanglement method to extract camera motion from a single source video, which separates the moving objects from the background and estimates the camera motion in the moving-object regions from the motion in the background by solving a Poisson equation. Furthermore, we propose a few-shot camera motion disentanglement method to extract the common camera motion from multiple videos with similar camera motions, which employs a window-based clustering technique to extract the common features in the temporal attention maps of multiple videos. Finally, we propose a motion combination method to combine different types of camera motions, enabling more controllable and flexible camera control. Extensive experiments demonstrate that our training-free approach can effectively decouple camera and object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks, achieving flexible and diverse camera motion control.

MotionMaster: A Training-free Approach for Flexible Camera Motion Transfer in Video Generation

Introduction

MotionMaster introduces a training-free method for transferring camera motion from a source video to newly generated videos, with no fine-tuning of the underlying video generation model. Its central mechanism disentangles camera motion from object motion, enabling precise camera control. This addresses the limitations of prior approaches, which train dedicated temporal camera modules at substantial computational cost and support only camera motion types pre-defined during training, lacking the flexibility needed for complex maneuvers such as those used in professional film production.

Methodology Overview

MotionMaster extracts and manipulates camera motion through three training-free components:

  1. One-shot Camera Motion Disentanglement:
    • Uses a single source video, separating the moving objects from the background.
    • Estimates camera motion in the background regions and completes it inside the moving-object regions by solving a Poisson equation (a toy sketch of this completion step follows the list).
  2. Few-shot Camera Motion Disentanglement:
    • Extracts the common camera motion from multiple videos that share similar camera movements.
    • Employs window-based clustering of the videos' temporal attention maps to isolate camera motion from object motion (see the clustering sketch after this list).
  3. Camera Motion Combination:
    • The disentangled camera motions can be combined and applied region by region, substantially increasing the flexibility of camera control during generation (a region-masked blending sketch appears after the experiments paragraph below).
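As a rough illustration of the completion step in item 1, the sketch below fills in a camera-motion field under a moving-object mask by solving a discrete Laplace/Poisson problem whose boundary values come from the surrounding background motion. The dense flow field, the object mask, and the simple Jacobi solver are illustrative assumptions; the paper performs this disentanglement on temporal attention maps inside the video diffusion model rather than on raw optical flow.

```python
import numpy as np

def complete_camera_motion(motion, object_mask, n_iters=500):
    """Fill the camera-motion field inside `object_mask` by iteratively solving a
    discrete Laplace equation, using the surrounding background motion as fixed
    boundary values (Jacobi relaxation).

    motion      : (H, W, 2) float array of per-pixel motion; background pixels are
                  assumed to reflect pure camera motion.
    object_mask : (H, W) bool array, True where moving objects hide the camera
                  motion and the field must be reconstructed.
    """
    h, w = object_mask.shape
    filled = motion.astype(float).copy()
    ys, xs = np.nonzero(object_mask)
    for _ in range(n_iters):
        # Jacobi step: every masked pixel becomes the mean of its four neighbours.
        up    = filled[np.maximum(ys - 1, 0), xs]
        down  = filled[np.minimum(ys + 1, h - 1), xs]
        left  = filled[ys, np.maximum(xs - 1, 0)]
        right = filled[ys, np.minimum(xs + 1, w - 1)]
        filled[ys, xs] = 0.25 * (up + down + left + right)
    return filled

# Toy usage: a uniform rightward pan with a square "object" hole in the middle.
H, W = 64, 64
motion = np.tile(np.array([1.0, 0.0]), (H, W, 1))   # camera pans right everywhere
mask = np.zeros((H, W), dtype=bool)
mask[24:40, 24:40] = True                            # region occluded by a moving object
motion[mask] = 0.0                                   # unknown inside the object region
reconstructed = complete_camera_motion(motion, mask)
print(np.allclose(reconstructed[mask], [1.0, 0.0], atol=1e-2))  # True: pan propagated into the hole
```

With a uniform pan as the background motion, the filled-in region converges to that same pan; the camera motion is propagated into the area the object occluded.

For the few-shot case in item 2, a hedged sketch of the window-based clustering idea: for each spatial window, collect one motion feature per video (e.g. a flattened patch of the temporal attention map), cluster the features, and keep the centroid of the largest cluster as the shared camera motion, discarding videos whose window is dominated by object motion. The feature shape, the choice of DBSCAN, and its parameters are assumptions made for this example, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def common_window_motion(window_feats, eps=0.5, min_samples=2):
    """Estimate the camera-motion feature shared by several videos for one window.

    window_feats : (n_videos, d) array with one motion feature per video for this
                   spatial window (e.g. a flattened temporal-attention patch).
    Returns the centroid of the largest DBSCAN cluster; videos whose window is
    dominated by object motion tend to land outside it and are ignored.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(window_feats)
    valid = labels[labels != -1]                    # -1 marks outliers
    if valid.size == 0:                             # no consensus: fall back to the mean
        return window_feats.mean(axis=0)
    largest = np.bincount(valid).argmax()           # most populated cluster
    return window_feats[labels == largest].mean(axis=0)

# Toy usage: 5 videos, 4 share a (noisy) camera motion, 1 is dominated by an object.
rng = np.random.default_rng(0)
shared = rng.normal(size=8)
feats = np.stack([shared + 0.05 * rng.normal(size=8) for _ in range(4)]
                 + [shared + 5.0 * rng.normal(size=8)])
estimate = common_window_motion(feats)
print(np.linalg.norm(estimate - shared) < 0.2)      # True: the outlier video is rejected
```

Clustering is done independently per window, so each window can reject a different subset of videos as outliers.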

Experimentation and Results

Extensive experiments demonstrate MotionMaster's ability to apply extracted camera motions effectively across various scenarios. Notably, the model supports advanced camera maneuvers like Dolly Zoom and variable-speed zooming, which are directly transferable to new video content.
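To make the combination step (item 3 above) concrete, the sketch below blends two camera-motion representations either globally with a scalar weight or region by region with a mask; composing opposing motions in different regions is one plausible way to picture Dolly-Zoom-like effects. The array shapes, weights, and mask are illustrative assumptions; in the paper, the quantities being combined are temporal attention maps.

```python
import numpy as np

def combine_camera_motions(motion_a, motion_b, weight_a=0.5, region_mask=None):
    """Blend two camera-motion representations of identical shape.

    Without a mask this is a global linear combination. With a boolean
    `region_mask` (broadcastable to the motion shape), motion_a drives the
    masked region and motion_b the rest (e.g. opposing zooms inside and
    outside a subject region for a Dolly-Zoom-like effect).
    """
    if region_mask is None:
        return weight_a * motion_a + (1.0 - weight_a) * motion_b
    return np.where(region_mask, motion_a, motion_b)

# Toy usage on (frames, H, W) motion maps.
pan  = np.ones((8, 32, 32))                                             # constant pan strength
zoom = np.linspace(0.0, 1.0, 8)[:, None, None] * np.ones((8, 32, 32))  # zoom ramps up over time
blended = combine_camera_motions(pan, zoom, weight_a=0.7)               # 70% pan, 30% zoom everywhere
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True                                                 # subject region at frame centre
regional = combine_camera_motions(zoom, pan, region_mask=mask)          # zoom on the subject, pan elsewhere
print(blended.shape, regional.shape)                                    # (8, 32, 32) (8, 32, 32)
```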

  • Comparative Analysis:
    • MotionMaster outperforms existing methods like AnimateDiff and MotionCtrl in terms of video quality and fidelity to the camera motion patterns of the source material.
    • It achieves superior results on complex camera-motion scenarios with significantly lower computational overhead, owing to its training-free design.
  • Quantitative Metrics:
    • Performance is reported using standard video-generation metrics such as FVD and FID-V, indicating high-quality output and accurate replication of the source camera motion (a sketch of the underlying Fréchet-distance computation follows).
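To make the metric concrete: FVD is the Fréchet distance between Gaussians fitted to features of real and of generated videos, with features taken from a pretrained video network (typically I3D). The sketch below computes that distance from precomputed feature matrices; the feature extractor, feature dimensionality, and sample counts are assumptions and not part of the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two sets of video features.

    feats_real, feats_gen : (n_samples, d) arrays, e.g. embeddings from a
    pretrained video network (I3D for FVD). Lower means the generated
    distribution is closer to the real one.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):        # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

# Toy usage: identical feature sets give (numerically) zero distance.
rng = np.random.default_rng(0)
a = rng.normal(size=(256, 16))
b = rng.normal(size=(256, 16))
print(frechet_distance(a, a) < 1e-6, frechet_distance(a, b) > 0.0)  # True True
```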

Future Research Directions

Directions for further research include more granular disentanglement techniques that allow even finer control over the interaction between object and camera motions. Integrating the approach with larger-scale video generation frameworks could also pave the way for real-time video production tools in virtual reality and interactive media.

Conclusion

MotionMaster sets a new benchmark for flexible, efficient camera motion control in video generation. By eliminating the need for retraining and effectively decoupling camera and object motions, it offers a scalable solution adaptable to various professional video production needs. This approach could significantly impact how camera motions are managed in automatic video generation, leading to more creative and dynamic visual content.

References (49)
  1. J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” Advances in neural information processing systems, vol. 33, pp. 6840–6851, 2020.
  2. R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10684–10695.
  3. Y. Guo, C. Yang, A. Rao, Y. Wang, Y. Qiao, D. Lin, and B. Dai, “Animatediff: Animate your personalized text-to-image diffusion models without specific tuning,” arXiv preprint arXiv:2307.04725, 2023.
  4. A. Blattmann, T. Dockhorn, S. Kulal, D. Mendelevitch, M. Kilian, D. Lorenz, Y. Levi, Z. English, V. Voleti, A. Letts et al., “Stable video diffusion: Scaling latent video diffusion models to large datasets,” arXiv preprint arXiv:2311.15127, 2023.
  5. U. Singer, A. Polyak, T. Hayes, X. Yin, J. An, S. Zhang, Q. Hu, H. Yang, O. Ashual, O. Gafni et al., “Make-a-video: Text-to-video generation without text-video data,” arXiv preprint arXiv:2209.14792, 2022.
  6. H. Chen, Y. Zhang, X. Cun, M. Xia, X. Wang, C. Weng, and Y. Shan, “Videocrafter2: Overcoming data limitations for high-quality video diffusion models,” arXiv preprint arXiv:2401.09047, 2024.
  7. H. Chen, M. Xia, Y. He, Y. Zhang, X. Cun, S. Yang, J. Xing, Y. Liu, Q. Chen, X. Wang et al., “Videocrafter1: Open diffusion models for high-quality video generation,” arXiv preprint arXiv:2310.19512, 2023.
  8. T.-S. Chen, C. H. Lin, H.-Y. Tseng, T.-Y. Lin, and M.-H. Yang, “Motion-conditioned diffusion model for controllable video synthesis,” arXiv preprint arXiv:2304.14404, 2023.
  9. S. Tu, Q. Dai, Z.-Q. Cheng, H. Hu, X. Han, Z. Wu, and Y.-G. Jiang, “Motioneditor: Editing video motion via content-aware diffusion,” arXiv preprint arXiv:2311.18830, 2023.
  10. C. Chen, J. Shu, L. Chen, G. He, C. Wang, and Y. Li, “Motion-zero: Zero-shot moving object control framework for diffusion-based video generation,” arXiv preprint arXiv:2401.10150, 2024.
  11. S. Yang, L. Hou, H. Huang, C. Ma, P. Wan, D. Zhang, X. Chen, and J. Liao, “Direct-a-video: Customized video generation with user-directed camera movement and object motion,” arXiv preprint arXiv:2402.03162, 2024.
  12. J. Bai, T. He, Y. Wang, J. Guo, H. Hu, Z. Liu, and J. Bian, “Uniedit: A unified tuning-free framework for video motion and appearance editing,” arXiv preprint arXiv:2402.13185, 2024.
  13. C. Qi, X. Cun, Y. Zhang, C. Lei, X. Wang, Y. Shan, and Q. Chen, “Fatezero: Fusing attentions for zero-shot text-based video editing,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 15932–15942.
  14. J. Z. Wu, Y. Ge, X. Wang, S. W. Lei, Y. Gu, Y. Shi, W. Hsu, Y. Shan, X. Qie, and M. Z. Shou, “Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 7623–7633.
  15. E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “Lora: Low-rank adaptation of large language models,” arXiv preprint arXiv:2106.09685, 2021.
  16. Z. Wang, Z. Yuan, X. Wang, T. Chen, M. Xia, P. Luo, and Y. Shan, “Motionctrl: A unified and flexible motion controller for video generation,” arXiv preprint arXiv:2312.03641, 2023.
  17. C. Vondrick, H. Pirsiavash, and A. Torralba, “Generating videos with scene dynamics,” Advances in neural information processing systems, vol. 29, 2016.
  18. T.-C. Wang, M.-Y. Liu, A. Tao, G. Liu, J. Kautz, and B. Catanzaro, “Few-shot video-to-video synthesis,” arXiv preprint arXiv:1910.12713, 2019.
  19. M. Saito, E. Matsumoto, and S. Saito, “Temporal generative adversarial nets with singular value clipping,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 2830–2839.
  20. J. Zhang, C. Xu, L. Liu, M. Wang, X. Wu, Y. Liu, and Y. Jiang, “Dtvnet: Dynamic time-lapse video generation via single still image,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V. Springer, 2020, pp. 300–315.
  21. R. Girdhar, M. Singh, A. Brown, Q. Duval, S. Azadi, S. S. Rambhatla, A. Shah, X. Yin, D. Parikh, and I. Misra, “Emu video: Factorizing text-to-video generation by explicit image conditioning,” arXiv preprint arXiv:2311.10709, 2023.
  22. A. Blattmann, R. Rombach, H. Ling, T. Dockhorn, S. W. Kim, S. Fidler, and K. Kreis, “Align your latents: High-resolution video synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22563–22575.
  23. J. Ho, T. Salimans, A. Gritsenko, W. Chan, M. Norouzi, and D. J. Fleet, “Video diffusion models,” Advances in Neural Information Processing Systems, vol. 35, pp. 8633–8646, 2022.
  24. J. Ho, W. Chan, C. Saharia, J. Whang, R. Gao, A. Gritsenko, D. P. Kingma, B. Poole, M. Norouzi, D. J. Fleet et al., “Imagen video: High definition video generation with diffusion models,” arXiv preprint arXiv:2210.02303, 2022.
  25. D. Zhou, W. Wang, H. Yan, W. Lv, Y. Zhu, and J. Feng, “Magicvideo: Efficient video generation with latent diffusion models,” arXiv preprint arXiv:2211.11018, 2022.
  26. J. Wang, H. Yuan, D. Chen, Y. Zhang, X. Wang, and S. Zhang, “Modelscope text-to-video technical report,” arXiv preprint arXiv:2308.06571, 2023.
  27. W. Chen, J. Wu, P. Xie, H. Wu, J. Li, X. Xia, X. Xiao, and L. Lin, “Control-a-video: Controllable text-to-video generation with diffusion models,” arXiv preprint arXiv:2305.13840, 2023.
  28. P. Esser, J. Chiu, P. Atighehchian, J. Granskog, and A. Germanidis, “Structure and content-guided video synthesis with diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 7346–7356.
  29. Y. Guo, C. Yang, A. Rao, M. Agrawala, D. Lin, and B. Dai, “Sparsectrl: Adding sparse controls to text-to-video diffusion models,” arXiv preprint arXiv:2311.16933, 2023.
  30. Y. Wei, S. Zhang, Z. Qing, H. Yuan, Z. Liu, Y. Liu, Y. Zhang, J. Zhou, and H. Shan, “Dreamvideo: Composing your dream videos with customized subject and motion,” arXiv preprint arXiv:2312.04433, 2023.
  31. H. Jeong, G. Y. Park, and J. C. Ye, “Vmc: Video motion customization using temporal attention adaption for text-to-video diffusion models,” arXiv preprint arXiv:2312.00845, 2023.
  32. Y. Jain, A. Nasery, V. Vineet, and H. Behl, “Peekaboo: Interactive video generation via masked-diffusion,” arXiv preprint arXiv:2312.07509, 2023.
  33. Y. Teng, E. Xie, Y. Wu, H. Han, Z. Li, and X. Liu, “Drag-a-video: Non-rigid video editing with point-based interaction,” arXiv preprint arXiv:2312.02936, 2023.
  34. R. Wu, L. Chen, T. Yang, C. Guo, C. Li, and X. Zhang, “Lamp: Learn a motion pattern for few-shot-based video generation,” arXiv preprint arXiv:2310.10769, 2023.
  35. R. Zhao, Y. Gu, J. Z. Wu, D. J. Zhang, J. Liu, W. Wu, J. Keppo, and M. Z. Shou, “Motiondirector: Motion customization of text-to-video diffusion models,” arXiv preprint arXiv:2310.08465, 2023.
  36. X. Wang, H. Yuan, S. Zhang, D. Chen, J. Wang, Y. Zhang, Y. Shen, D. Zhao, and J. Zhou, “Videocomposer: Compositional video synthesis with motion controllability,” Advances in Neural Information Processing Systems, vol. 36, 2024.
  37. Y. Deng, R. Wang, Y. Zhang, Y.-W. Tai, and C.-K. Tang, “Dragvideo: Interactive drag-style video editing,” arXiv preprint arXiv:2312.02216, 2023.
  38. A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.
  39. S. Zhou, X. Jiang, W. Tan, R. He, and B. Yan, “Mvflow: Deep optical flow estimation of compressed videos with motion vector prior,” in Proceedings of the 31st ACM International Conference on Multimedia, 2023, pp. 1964–1974.
  40. D. Fleet and Y. Weiss, “Optical flow estimation,” in Handbook of mathematical models in computer vision. Springer, 2006, pp. 237–257.
  41. H. Zhang, D. Liu, Q. Zheng, and B. Su, “Modeling video as stochastic processes for fine-grained video representation learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2225–2234.
  42. Z. Gharibi and S. Faramarzi, “Multi-frame spatio-temporal super-resolution,” Signal, Image and Video Processing, vol. 17, no. 8, pp. 4415–4424, 2023.
  43. D. Young, “Iterative methods for solving partial difference equations of elliptic type,” Transactions of the American Mathematical Society, vol. 76, no. 1, pp. 92–111, 1954.
  44. L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, 2008.
  45. M. Ester, H.-P. Kriegel, J. Sander, X. Xu et al., “A density-based algorithm for discovering clusters in large spatial databases with noise,” in KDD, vol. 96, no. 34, 1996, pp. 226–231.
  46. J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” arXiv preprint arXiv:2010.02502, 2020.
  47. T. Unterthiner, S. Van Steenkiste, K. Kurach, R. Marinier, M. Michalski, and S. Gelly, “Towards accurate generative models of video: A new metric & challenges,” arXiv preprint arXiv:1812.01717, 2018.
  48. Y. Balaji, M. R. Min, B. Bai, R. Chellappa, and H. P. Graf, “Conditional gan with discriminative filter generation for text-to-video synthesis.” in IJCAI, vol. 1, no. 2019, 2019, p. 2.
  49. G. Farnebäck, “Two-frame motion estimation based on polynomial expansion,” in Image Analysis: 13th Scandinavian Conference, SCIA 2003, Halmstad, Sweden, June 29–July 2, 2003, Proceedings. Springer, 2003, pp. 363–370.
Authors (8)
  1. Teng Hu
  2. Jiangning Zhang
  3. Ran Yi
  4. Yating Wang
  5. Hongrui Huang
  6. Jieyu Weng
  7. Yabiao Wang
  8. Lizhuang Ma