Framer: Interactive Frame Interpolation (2410.18978v2)
Abstract: We propose Framer for interactive frame interpolation, which aims to produce smoothly transitioning frames between two images according to user creativity. Concretely, besides taking the start and end frames as inputs, our approach supports customizing the transition process by tailoring the trajectories of selected keypoints. Such a design offers two clear benefits. First, incorporating human interaction mitigates the ambiguity arising from the many possible ways of transforming one image into another, and in turn enables finer control of local motions. Second, as the most basic form of interaction, keypoints help establish correspondence across frames, enhancing the model's ability to handle challenging cases (e.g., when the objects in the start and end frames differ in shape and style). Notably, our system also offers an "autopilot" mode, in which a module estimates the keypoints and refines the trajectories automatically, simplifying usage in practice. Extensive experimental results demonstrate the appealing performance of Framer on various applications, such as image morphing, time-lapse video generation, and cartoon interpolation. The code, the model, and the interface will be released to facilitate further research.
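To make the keypoint-trajectory interface concrete, the sketch below turns a sparse set of user-specified keypoint correspondences into dense per-frame trajectories by simple linear interpolation. This is only an assumed default behavior for illustration, not the paper's actual trajectory construction (in Framer the trajectories can be tailored by the user or estimated and refined by the autopilot module); the function name and NumPy-based representation are hypothetical.

```python
import numpy as np

def keypoint_trajectories(start_pts, end_pts, num_frames):
    """Build per-frame keypoint trajectories between two frames.

    Minimal sketch: each keypoint moves along a straight line from its
    position in the start frame to its position in the end frame.
    Linear interpolation is only an assumed default here.

    start_pts, end_pts: (K, 2) arrays of (x, y) keypoint coordinates.
    Returns: (num_frames, K, 2) array of interpolated keypoint positions.
    """
    start_pts = np.asarray(start_pts, dtype=np.float32)
    end_pts = np.asarray(end_pts, dtype=np.float32)
    # Interpolation weights in [0, 1], one per generated frame.
    t = np.linspace(0.0, 1.0, num_frames)[:, None, None]
    return (1.0 - t) * start_pts[None] + t * end_pts[None]


# Example: two keypoints tracked across 14 generated frames.
traj = keypoint_trajectories([[10, 20], [200, 180]],
                             [[60, 40], [150, 220]],
                             num_frames=14)
print(traj.shape)  # (14, 2, 2)
```

Such trajectories would then serve as a conditioning signal for the video generation backbone; how the conditioning is injected is beyond this sketch.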