Title: SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces

Mar 12, 2024

Yuta Oshima, Shohei Taniguchi, Masahiro Suzuki, Yutaka Matsuo

Abstract: Given the remarkable achievements in image generation through diffusion models, the research community has shown increasing interest in extending these models to video generation. Recent diffusion models for video generation have predominantly used attention layers to extract temporal features. However, attention layers are limited by their memory consumption, which grows quadratically with sequence length. This limitation presents significant challenges when attempting to generate longer video sequences with diffusion models. To overcome this challenge, we propose leveraging state-space models (SSMs). SSMs have recently gained attention as viable alternatives because their memory consumption is linear in sequence length. In the experiments, we first evaluate our SSM-based model on UCF101, a standard benchmark for video generation. In addition, to investigate the potential of SSMs for longer video generation, we perform an experiment on the MineRL Navigate dataset, varying the number of frames between 64 and 150. In these settings, our SSM-based model considerably reduces memory consumption for longer sequences while maintaining FVD scores competitive with those of the attention-based models. Our code is available at https://github.com/shim0114/SSM-Meets-Video-Diffusion-Models.
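The scaling argument in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function names, the state size of 16, and the toy diagonal recurrence are all assumptions made for illustration. It only counts the entries a temporal attention score matrix would need against the entries a per-frame SSM hidden state would need, at the two frame counts the abstract mentions.

```python
def attention_score_entries(num_frames: int) -> int:
    """Entries in one temporal attention score matrix (frames x frames)."""
    return num_frames * num_frames


def ssm_state_entries(num_frames: int, state_dim: int) -> int:
    """Entries held if a hidden state of size state_dim is kept per frame."""
    return num_frames * state_dim


def diagonal_ssm_scan(inputs, a, b, c):
    """Toy single-channel diagonal SSM (hypothetical, for illustration only).

    Recurrence: h_t = a * h_{t-1} + b * x_t, output y_t = sum(c * h_t).
    The working state has fixed size len(a), independent of sequence length.
    """
    state = [0.0] * len(a)
    outputs = []
    for x in inputs:
        state = [a_i * h_i + b_i * x for a_i, b_i, h_i in zip(a, b, state)]
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return outputs


if __name__ == "__main__":
    # state_dim=16 is an assumed value, not one taken from the paper.
    for frames in (64, 150):
        print(
            frames,
            attention_score_entries(frames),
            ssm_state_entries(frames, state_dim=16),
        )
```

Under these assumptions, going from 64 to 150 frames multiplies the attention count by roughly 5.5, while the SSM count grows by the same factor as the frame count (about 2.3). Actual memory use in a real model also depends on batch size, channels, heads, and implementation details that this sketch ignores.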

* Accepted as workshop paper at ICLR 2024
