Video Motion Transfer with Diffusion Transformers (2412.07776v2)

Published 10 Dec 2024 in cs.CV, cs.AI, and cs.LG

Abstract: We propose DiTFlow, a method for transferring the motion of a reference video to a newly synthesized one, designed specifically for Diffusion Transformers (DiT). We first process the reference video with a pre-trained DiT to analyze cross-frame attention maps and extract a patch-wise motion signal called the Attention Motion Flow (AMF). We then guide the latent denoising process in an optimization-based, training-free manner, optimizing the latents with our AMF loss so that the generated video reproduces the motion of the reference. We also apply our optimization strategy to the transformer's positional embeddings, which boosts zero-shot motion transfer capabilities. We evaluate DiTFlow against recently published methods and outperform all of them across multiple metrics and in human evaluation.
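
To make the idea of a patch-wise motion signal derived from cross-frame attention more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the AMF is computed from the attention maps or how the AMF loss is defined, so the argmax/soft-argmax matching, the squared-distance loss, and every function and parameter name below (`cross_frame_attention`, `patch_displacements`, `amf_style_loss`, `temperature`, `hard`) are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_frame_attention(queries, keys, temperature=1.0):
    """Attention from every patch of one frame (queries) to every patch of
    another frame (keys). Returns one probability row per query patch.
    The temperature parameter is a hypothetical knob, not from the paper."""
    attention = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        attention.append(softmax(scores))
    return attention


def patch_displacements(attention, coords, hard=True):
    """Turn an attention map into one 2D displacement per source patch.
    hard=True uses the best-matching patch (argmax); hard=False uses the
    attention-weighted average position (soft-argmax). Both are assumptions."""
    displacements = []
    for p, row in enumerate(attention):
        if hard:
            best = max(range(len(row)), key=row.__getitem__)
            tx, ty = coords[best]
        else:
            tx = sum(w * c[0] for w, c in zip(row, coords))
            ty = sum(w * c[1] for w, c in zip(row, coords))
        displacements.append((tx - coords[p][0], ty - coords[p][1]))
    return displacements


def amf_style_loss(reference, generated):
    """Mean squared distance between two displacement fields
    (an assumed stand-in for the AMF loss named in the abstract)."""
    total = sum((rx - gx) ** 2 + (ry - gy) ** 2
                for (rx, ry), (gx, gy) in zip(reference, generated))
    return total / len(reference)
```

In a guidance loop of the kind the abstract describes, a displacement field like this would be extracted once from the reference video, and the latents of the generated video would then be updated to reduce the loss between the two fields. That update needs a differentiable framework and the model's real attention maps, both of which are outside this sketch.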