• Concurrent work uses large-scale text-to-image generation models to edit videos.
  • These methods include Video-P2P, Fate-Zero, Tune-A-Video, and Gen-1.

Key terms:

  • Large-scale text-to-image generation models: Models that generate images from text prompts and are also used for image editing and, in the work listed here, video editing
  • Video-P2P: A method that extends null-text inversion to video clips and adapts a cross-attention control mechanism
  • Fate-Zero: A training-free strategy for editing videos that uses cross-attention maps to compute blending masks
  • Tune-A-Video: A method that fine-tunes the image generation model on an input video and uses cross-frame attention to keep edits consistent across frames
  • Gen-1: A large-scale video generation model that uses depth as a structural cue and is trained on a mixed dataset of images and videos
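
To make the cross-frame attention idea in the Tune-A-Video entry more concrete, here is a minimal NumPy sketch. It is not the Tune-A-Video implementation. It only illustrates the general mechanism under a simplifying assumption: every frame computes its own queries, while keys and values are taken from a single reference frame (the first frame by default), so all frames draw their features from the same source. The names cross_frame_attention, w_q, w_k, w_v, and ref_index are hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_frame_attention(frames, w_q, w_k, w_v, ref_index=0):
    """Attend from every frame to one reference frame.

    frames: array of shape (T, N, D), T frames with N tokens of width D.
    w_q, w_k, w_v: projection matrices of shape (D, d).
    ref_index: which frame supplies keys and values (an assumption of this
        sketch; real methods may use other frames or several frames).
    Returns an array of shape (T, N, d).
    """
    q = frames @ w_q                 # (T, N, d): per-frame queries
    k = frames[ref_index] @ w_k      # (N, d): keys from the reference frame
    v = frames[ref_index] @ w_v      # (N, d): values from the reference frame
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (T, N, N)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v               # (T, N, d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(4, 6, 8))   # 4 frames, 6 tokens, width 8
    w_q, w_k, w_v = (rng.normal(size=(8, 5)) for _ in range(3))
    out = cross_frame_attention(frames, w_q, w_k, w_v)
    print(out.shape)
```

For the reference frame itself, this reduces to ordinary self-attention. For every other frame, the output is a weighted mix of the reference frame's values, which is one simple way to encourage a consistent appearance across frames.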


