Title: APLA: Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency

Aug 24, 2023

Yupu Yao, Shangqi Deng, Zihan Cao, Harry Zhang, Liang-Jian Deng

Figures 1 to 4 for APLA: Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency

Abstract: Diffusion models have exhibited promising progress in video generation. However, they often struggle to retain consistent details within local regions across frames. One underlying cause is that traditional diffusion models approximate Gaussian noise distribution by utilizing predictive noise, without fully accounting for the impact of inherent information within the input itself. Additionally, these models emphasize the distinction between predictions and references, neglecting information intrinsic to the videos. To address this limitation, inspired by the self-attention mechanism, we propose a novel text-to-video (T2V) generation network structure based on diffusion models, dubbed Additional Perturbation for Latent noise with Adversarial training (APLA). Our approach only necessitates a single video as input and builds upon pre-trained stable diffusion networks. Notably, we introduce an additional compact network, known as the Video Generation Transformer (VGT). This auxiliary component is designed to extract perturbations from the inherent information contained within the input, thereby refining inconsistent pixels during temporal predictions. We leverage a hybrid architecture of transformers and convolutions to compensate for temporal intricacies, enhancing consistency between different frames within the video. Experiments demonstrate a noticeable improvement in the consistency of the generated videos both qualitatively and quantitatively.
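
As a rough illustration of the idea named in the title (adding a perturbation, derived from the input itself, to the predicted latent noise), here is a minimal toy sketch. It is not the authors' implementation: the abstract does not specify the VGT architecture, so `toy_perturbation` is a hypothetical stand-in that uses plain softmax attention across frames, and the names `toy_perturbation`, `perturbed_noise`, and the `scale` parameter are assumptions made for this example. The adversarial training component is not modeled at all.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def toy_perturbation(latents, scale=0.1):
    """Hypothetical stand-in for the VGT (assumption, not from the paper).

    Each frame latent attends to every frame latent. The perturbation for
    a frame is the scaled gap between the attended mixture and the frame's
    own latent, so frames that already agree receive no perturbation.
    """
    dim = len(latents[0])
    perturbations = []
    for query in latents:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in latents])
        mixed = [
            sum(w * key[i] for w, key in zip(weights, latents))
            for i in range(dim)
        ]
        perturbations.append([scale * (m - x) for m, x in zip(mixed, query)])
    return perturbations


def perturbed_noise(predicted_noise, latents, scale=0.1):
    """Add the input-derived perturbation to the per-frame predicted noise."""
    delta = toy_perturbation(latents, scale)
    return [
        [eps + d for eps, d in zip(frame_eps, frame_delta)]
        for frame_eps, frame_delta in zip(predicted_noise, delta)
    ]
```

In this toy version, frames with identical latents get a zero perturbation, while frames that differ are nudged toward an attention-weighted mixture of all frames. That is only meant to convey the intuition of using information already present in the input to encourage cross-frame consistency.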
