Title: Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

Dec 22, 2022

Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou

Abstract: To reproduce the success of text-to-image (T2I) generation, recent works in text-to-video (T2V) generation employ large-scale text-video datasets for fine-tuning. However, such a paradigm is computationally expensive. Humans have the amazing ability to learn new visual concepts from just a single exemplar. We hereby study a new T2V generation problem, One-Shot Video Generation, in which only a single text-video pair is presented for training an open-domain T2V generator. Intuitively, we propose to adapt a T2I diffusion model pretrained on massive image data for T2V generation. We make two key observations: 1) T2I models are able to generate images that align well with the verb terms; 2) extending T2I models to generate multiple images concurrently exhibits surprisingly good content consistency. To further learn continuous motion, we propose Tune-A-Video with a tailored Sparse-Causal Attention, which generates videos from text prompts via an efficient one-shot tuning of pretrained T2I diffusion models. Tune-A-Video is capable of producing temporally coherent videos across various applications, such as change of subject or background, attribute editing, and style transfer, demonstrating the versatility and effectiveness of our method.

* Preprint
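
The abstract names Sparse-Causal Attention without spelling it out. The NumPy sketch below assumes the formulation as we understand it from the paper body: queries come from the current frame, while keys and values are drawn only from the first frame and the immediately preceding frame, instead of from every frame. The single-head setup, the function name `sparse_causal_attention`, and the untrained projection matrices `w_q`, `w_k`, `w_v` are hypothetical illustrations, not the authors' implementation, which operates inside the pretrained T2I diffusion model.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def sparse_causal_attention(frames, w_q, w_k, w_v):
    """Single-head attention where frame i attends to frame 0 and frame i-1.

    Sketch under assumptions; not the paper's code.
    frames:   (num_frames, tokens, dim) latent tokens, one set per frame.
    w_q, w_k: (dim, d_k) hypothetical query/key projections.
    w_v:      (dim, d_v) hypothetical value projection.
    Returns an array of shape (num_frames, tokens, d_v).
    """
    num_frames, tokens, _ = frames.shape
    d_k = w_k.shape[1]
    out = np.empty((num_frames, tokens, w_v.shape[1]))
    for i in range(num_frames):
        # Queries come only from the current frame.
        q = frames[i] @ w_q
        # Keys/values come from the first frame and the previous frame;
        # frame 0 has no predecessor, so it attends to itself.
        prev = max(i - 1, 0)
        context = np.concatenate([frames[0], frames[prev]], axis=0)
        k = context @ w_k
        v = context @ w_v
        scores = q @ k.T / np.sqrt(d_k)
        out[i] = softmax(scores, axis=-1) @ v
    return out


rng = np.random.default_rng(0)
video = rng.normal(size=(4, 5, 8))  # 4 frames, 5 tokens each, dim 8
out = sparse_causal_attention(
    video, rng.normal(size=(8, 6)), rng.normal(size=(8, 6)), rng.normal(size=(8, 3))
)
print(out.shape)  # (4, 5, 3)
```

Under this assumed design each frame's key/value set has a fixed size of 2 × tokens, so total cost grows linearly with the number of frames, whereas full attention over all frames grows quadratically. Each frame also depends only on earlier frames, which is what makes the attention causal.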
