Title: MagicVideo: Efficient Video Generation With Latent Diffusion Models

Nov 20, 2022

Daquan Zhou, Weimin Wang, Hanshu Yan, Weiwei Lv, Yizhe Zhu, Jiashi Feng

Abstract: We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo. Given a text description, MagicVideo can generate photo-realistic video clips that are highly relevant to the text content. With the proposed efficient latent 3D U-Net design, MagicVideo can generate video clips with 256x256 spatial resolution on a single GPU card, which is 64x faster than the recent video diffusion model (VDM). Unlike previous works that train video generation from scratch in the RGB space, we propose to generate video clips in a low-dimensional latent space. We further utilize all the convolution operator weights of pre-trained text-to-image generative U-Net models for faster training. To achieve this, we introduce two new designs to adapt the U-Net decoder to video data: a framewise lightweight adaptor for the image-to-video distribution adjustment and a directed temporal attention module to capture temporal dependencies across frames. The whole generation process takes place within the low-dimensional latent space of a pre-trained variational auto-encoder. We demonstrate that MagicVideo can generate both realistic video content and imaginary content in a photo-realistic style, with a trade-off between quality and computational cost. Refer to https://magicvideo.github.io/# for more examples.
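The abstract names a directed temporal attention module but does not spell out how it works. The sketch below illustrates one plausible reading, offered as an assumption rather than as the authors' implementation: "directed" is taken to mean that each frame attends only to itself and to earlier frames, enforced with a lower-triangular mask over the time axis. The function name `directed_temporal_attention`, the use of raw per-frame feature vectors as queries, keys, and values (no learned projections), and the toy inputs are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats (may contain -inf)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def directed_temporal_attention(frames):
    """Attend across the time axis with a lower-triangular (causal) mask.

    frames: list of T feature vectors, one per frame, all of length D.
    Returns (outputs, weights), where weights[t][s] is how strongly
    frame t attends to frame s.

    Assumption: queries, keys, and values are the raw frame features.
    A real module would apply learned projections first.
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for t, query in enumerate(frames):
        scores = []
        for s, key in enumerate(frames):
            if s > t:
                # Future frame: masked out so information only flows forward in time.
                scores.append(float("-inf"))
            else:
                scores.append(scale * sum(q * k for q, k in zip(query, key)))
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of the (unmasked) frame features.
        outputs.append(
            [sum(w * frame[d] for w, frame in zip(row, frames)) for d in range(dim)]
        )
    return outputs, weights


# Toy example: three frames with two-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = directed_temporal_attention(frames)
for t, row in enumerate(weights):
    print(t, [round(w, 3) for w in row])
```

Under this reading, the first frame can only attend to itself, and every entry above the diagonal of the attention matrix is zero, so later frames depend on earlier ones but not the reverse.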
