I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models (2312.16693v4)
Abstract: Text-guided image-to-video (I2V) generation aims to produce a coherent video that preserves the identity of the input image and semantically aligns with the input prompt. Existing methods typically augment pretrained text-to-video (T2V) models either by concatenating the image with noised video frames channel-wise before feeding them into the model, or by injecting the image embedding produced by a pretrained image encoder into the cross-attention modules. However, the former approach often requires altering the fundamental weights of the pretrained T2V model, which restricts compatibility with open-source communities and disrupts the model's prior knowledge, while the latter typically fails to preserve the identity of the input image. We present I2V-Adapter to overcome these limitations. I2V-Adapter propagates the unnoised input image to the subsequent noised frames through a cross-frame attention mechanism, preserving the identity of the input image without any changes to the pretrained T2V model. Notably, I2V-Adapter introduces only a few trainable parameters, which significantly reduces the training cost and ensures compatibility with existing community-driven personalized models and control tools. Moreover, we propose a novel Frame Similarity Prior that balances motion amplitude against the stability of generated videos through two adjustable control coefficients. Our experimental results demonstrate that I2V-Adapter produces high-quality videos. This performance, coupled with its agility and adaptability, represents a substantial advancement in the field of I2V, particularly for personalized and controllable applications.
Authors: Xun Guo, Mingwu Zheng, Liang Hou, Yuan Gao, Yufan Deng, Chongyang Ma, Weiming Hu, Zhengjun Zha, Haibin Huang, Pengfei Wan, Di Zhang, Yufan Liu