
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models (2311.18837v1)

Published 30 Nov 2023 in cs.CV, cs.AI, cs.LG, and cs.MM

Abstract: Diffusion models have achieved significant success in image and video generation. This motivates a growing interest in video editing tasks, where videos are edited according to provided text descriptions. However, most existing approaches only focus on video editing for short clips and rely on time-consuming tuning or inference. We are the first to propose Video Instruction Diffusion (VIDiff), a unified foundation model designed for a wide range of video tasks. These tasks encompass both understanding tasks (such as language-guided video object segmentation) and generative tasks (video editing and enhancement). Our model can edit and translate the desired results within seconds based on user instructions. Moreover, we design an iterative auto-regressive method to ensure consistency in editing and enhancing long videos. We provide convincing generative results for diverse input videos and written instructions, both qualitatively and quantitatively. More examples can be found at our website https://ChenHsing.github.io/VIDiff.

Authors (7)
  1. Zhen Xing (25 papers)
  2. Qi Dai (58 papers)
  3. Zihao Zhang (75 papers)
  4. Hui Zhang (405 papers)
  5. Han Hu (196 papers)
  6. Zuxuan Wu (144 papers)
  7. Yu-Gang Jiang (223 papers)
Citations (13)