Title: ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation

Oct 11, 2023

Bo Peng, Xinyuan Chen, Yaohui Wang, Chaochao Lu, Yu Qiao

Abstract: Recent works have successfully extended large-scale text-to-image models to the video domain, producing promising results, but at a high computational cost and with a need for large amounts of video data. In this work, we introduce ConditionVideo, a training-free approach to text-to-video generation based on the provided condition, video, and input text, which leverages off-the-shelf text-to-image generation methods (e.g., Stable Diffusion). ConditionVideo generates realistic dynamic videos from random noise or given scene videos. Our method explicitly disentangles the motion representation into condition-guided and scenery motion components. To this end, the ConditionVideo model is designed with a UNet branch and a control branch. To improve temporal coherence, we introduce sparse bi-directional spatial-temporal attention (sBiST-Attn). The 3D control network extends the conventional 2D ControlNet model, aiming to strengthen conditional generation accuracy by additionally leveraging the bi-directional frames in the temporal domain. Our method exhibits superior performance in terms of frame consistency, CLIP score, and conditional accuracy, outperforming the compared methods.
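The abstract names sparse bi-directional spatial-temporal attention without giving its details, so the following is only a rough NumPy sketch of the general idea, not the authors' implementation. It assumes that each frame's queries attend to the tokens of its own frame plus a strided subset of frames taken from the whole clip, so that context comes from both earlier and later frames. The function name, the `stride` parameter, and the stride-based sampling rule are all hypothetical choices made for illustration.

```python
import numpy as np


def sparse_bidirectional_attention(q, k, v, stride=3):
    """Illustrative sparse bi-directional attention over video frames.

    q, k, v: float arrays of shape (frames, tokens, dim).
    Assumption (not taken from the paper): the reference frames are every
    `stride`-th frame of the clip plus the current frame itself.
    """
    num_frames, num_tokens, dim = q.shape
    out = np.empty(v.shape, dtype=np.float64)
    for t in range(num_frames):
        # Sparse reference set drawn from both past and future frames.
        ref = sorted(set(range(0, num_frames, stride)) | {t})
        keys = k[ref].reshape(-1, dim)    # (len(ref) * tokens, dim)
        values = v[ref].reshape(-1, dim)  # (len(ref) * tokens, dim)
        # Scaled dot-product attention with a numerically stable softmax.
        scores = q[t] @ keys.T / np.sqrt(dim)
        scores = scores - scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=-1, keepdims=True)
        out[t] = weights @ values
    return out


rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))
k = rng.standard_normal((8, 4, 16))
v = rng.standard_normal((8, 4, 16))
frames_out = sparse_bidirectional_attention(q, k, v, stride=3)
```

In this sketch the cost per frame grows with the number of sampled reference frames rather than with the full clip length, which is the usual motivation for a sparse attention pattern; a real model would apply this inside the UNet's attention layers with learned projections.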
