ReCamMaster: Camera-Controlled Generative Rendering from A Single Video (2503.11647v1)

Published 14 Mar 2025 in cs.CV

Abstract: Camera control has been actively studied in text or image conditioned video generation tasks. However, altering camera trajectories of a given video remains under-explored, despite its importance in the field of video creation. It is non-trivial due to the extra constraints of maintaining multiple-frame appearance and dynamic synchronization. To address this, we present ReCamMaster, a camera-controlled generative video re-rendering framework that reproduces the dynamic scene of an input video at novel camera trajectories. The core innovation lies in harnessing the generative capabilities of pre-trained text-to-video models through a simple yet powerful video conditioning mechanism -- its capability often overlooked in current research. To overcome the scarcity of qualified training data, we construct a comprehensive multi-camera synchronized video dataset using Unreal Engine 5, which is carefully curated to follow real-world filming characteristics, covering diverse scenes and camera movements. It helps the model generalize to in-the-wild videos. Lastly, we further improve the robustness to diverse inputs through a meticulously designed training strategy. Extensive experiments tell that our method substantially outperforms existing state-of-the-art approaches and strong baselines. Our method also finds promising applications in video stabilization, super-resolution, and outpainting. Project page: https://jianhongbai.github.io/ReCamMaster/

Summary

The paper introduces ReCamMaster, a framework for camera-controlled generative video re-rendering that allows arbitrary camera path modifications from a single input video.
It uses pre-trained text-to-video models and a novel video conditioning mechanism for improved spatio-temporal consistency across re-rendered frames.
Quantitative results show better visual quality and camera control accuracy than state-of-the-art methods, and the framework also supports applications such as video stabilization and outpainting.

Overview

ReCamMaster is a framework for camera-controlled generative video re-rendering: given a single input video, it reproduces the same dynamic scene along a new, user-specified camera trajectory. The method builds on pre-trained text-to-video (T2V) models and adds a video conditioning mechanism designed to keep the re-rendered frames spatio-temporally consistent with the source under novel viewpoints.

Methodology

The framework exploits the generative capabilities of off-the-shelf T2V models and conditions on the input video by concatenating source and target tokens along the frame dimension. This design lets conditional and target frames interact more closely, improving synchronization and dynamic consistency compared with alternative conditioning strategies (e.g., concatenation along the channel or view dimension). A further core contribution of the paper is a multi-camera synchronized video dataset built with Unreal Engine 5 and designed to emulate real-world filming characteristics. The dataset spans diverse scenes and camera movements, which helps the model generalize to in-the-wild videos.
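The difference between frame-dimension and channel-dimension conditioning can be illustrated with a minimal NumPy sketch. The tensor layout `(batch, frames, tokens_per_frame, dim)`, the sizes, and the function names are assumptions chosen for illustration; they are not taken from the paper's implementation.

```python
import numpy as np

def frame_concat(source_tokens, target_tokens):
    """Frame-dimension conditioning: (B, F, N, D) x 2 -> (B, 2F, N, D)."""
    return np.concatenate([source_tokens, target_tokens], axis=1)

def channel_concat(source_tokens, target_tokens):
    """Channel-dimension conditioning: (B, F, N, D) x 2 -> (B, F, N, 2D)."""
    return np.concatenate([source_tokens, target_tokens], axis=-1)

def flatten_for_attention(tokens):
    """Merge the frame and token axes into one sequence axis."""
    b, f, n, d = tokens.shape
    return tokens.reshape(b, f * n, d)

rng = np.random.default_rng(0)
B, F, N, D = 1, 4, 6, 8  # illustrative sizes only
src = rng.standard_normal((B, F, N, D))  # tokens of the input (condition) video
tgt = rng.standard_normal((B, F, N, D))  # tokens of the video being generated

seq_frame = flatten_for_attention(frame_concat(src, tgt))
seq_channel = flatten_for_attention(channel_concat(src, tgt))

print(seq_frame.shape)    # (1, 48, 8): source and target tokens share one sequence
print(seq_channel.shape)  # (1, 24, 16): source is fused into each target token
```

With frame-dimension concatenation the sequence doubles in length, so self-attention can relate every target token to every source token. With channel concatenation the sequence length is unchanged and each target token is paired only with the source token at the same position.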

A meticulously designed training strategy is presented to enhance robustness when handling heterogeneous input videos. The strategy accounts for maintaining appearance consistency across multiple frames and aligning camera control parameters with dynamic scene attributes.

Quantitative and Qualitative Results

ReCamMaster outperforms current state-of-the-art methods on several metrics, including FVD, FID, and CLIP-based text and frame consistency scores. The experimental results support the following claims:

Superior Visual Quality: Generated frames maintain high fidelity to source details while accurately adapting to novel camera trajectories.
Camera Control Accuracy: Quantitative evaluations demonstrate a significant improvement over baseline methods in terms of alignment and dynamic synchronization.
Enhanced Spatio-temporal Consistency: The adopted video conditioning mechanism shows robust performance, yielding more coherent frame-to-frame transitions.

In experimental comparisons, the framework shows marked improvement in metrics such as FVD and FID, supporting the claim of a robust model capable of handling diverse video inputs with complex camera paths.
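This summary does not define how the CLIP frame consistency score is computed. A common convention, assumed here, is the mean cosine similarity between embeddings of consecutive frames. The sketch below takes precomputed per-frame embeddings as input; the function name and the toy inputs are illustrative, and extracting the embeddings with a CLIP image encoder is left out.

```python
import numpy as np

def frame_consistency(embeddings):
    """Mean cosine similarity between consecutive frame embeddings of shape (T, D)."""
    e = np.asarray(embeddings, dtype=np.float64)
    if e.ndim != 2 or e.shape[0] < 2:
        raise ValueError("expected at least two frame embeddings of shape (T, D)")
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # unit-normalize each frame
    sims = np.sum(e[:-1] * e[1:], axis=1)             # cosine of adjacent pairs
    return float(sims.mean())

# Toy embeddings standing in for real CLIP image features (assumption).
identical = np.tile(np.array([[1.0, 2.0, 3.0]]), (5, 1))
alternating = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

print(frame_consistency(identical))    # close to 1.0: every frame is the same
print(frame_consistency(alternating))  # 0.0: adjacent frames are orthogonal
```

Under this convention, higher scores indicate smoother frame-to-frame appearance.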

Applications

ReCamMaster opens up several applications in video post-processing and content creation:

Video Stabilization: By enabling controlled camera trajectories, the framework can effectively stabilize shaky video sequences.
Super-resolution: The method allows for the reconstruction of high-quality frames under novel viewpoints, making it particularly suitable for super-resolution tasks.
Outpainting: The approach demonstrates potential in expanding video content beyond the original frame boundaries while maintaining coherence in dynamic scenes.
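For stabilization, one plausible workflow is to re-render the shaky input along a smoothed version of its camera path. The sketch below covers only that preprocessing step, under the assumption that per-frame camera positions are already available as a `(T, 3)` array. The function name, the moving-average filter, and the synthetic data are illustrative choices, not the paper's procedure.

```python
import numpy as np

def smooth_trajectory(positions, window=5):
    """Moving-average smoothing of camera positions with shape (T, 3)."""
    p = np.asarray(positions, dtype=np.float64)
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    pad = window // 2
    # Repeat the first and last positions so the output keeps T frames.
    padded = np.pad(p, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    cols = [np.convolve(padded[:, k], kernel, mode="valid") for k in range(p.shape[1])]
    return np.stack(cols, axis=1)

def roughness(positions):
    """Sum of squared second differences; lower means a smoother path."""
    return float(np.sum(np.diff(positions, n=2, axis=0) ** 2))

# Synthetic shaky path: a straight dolly move plus random jitter (illustrative).
rng = np.random.default_rng(0)
T = 30
base = np.stack([np.linspace(0.0, 1.0, T), np.zeros(T), np.zeros(T)], axis=1)
shaky = base + 0.05 * rng.standard_normal((T, 3))
stable = smooth_trajectory(shaky, window=5)

print(stable.shape)
print(roughness(shaky) > roughness(stable))
```

The smoothed positions would then serve as the target trajectory for re-rendering. Camera rotations would need analogous smoothing, which is omitted here.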

Conclusion

ReCamMaster advances generative video rendering with camera control by addressing the challenge of changing the camera trajectory when only a single input video is available. By combining a video conditioning mechanism with an Unreal Engine 5-based multi-camera dataset and a robust training strategy, the framework achieves improved visual quality, accurate camera control, and strong spatio-temporal consistency. These improvements are supported by better FVD, FID, and CLIP scores than competing methods, and they make the approach promising for applications such as video stabilization, super-resolution, and outpainting.

In summary, ReCamMaster combines advanced conditioning techniques with a robust training regimen to achieve state-of-the-art outcomes in dynamic video re-rendering, providing notable utility for subsequent applications in video post-processing and content expansion.