Title: Temporally Consistent Video Transformer for Long-Term Video Prediction

Oct 05, 2022

Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel

Abstract: Generating long, temporally consistent video remains an open challenge in video generation. Primarily due to computational limitations, most prior methods train on only a short subset of frames and then generate longer videos in a sliding-window fashion. Although these techniques may produce sharp videos, they have difficulty retaining long-term temporal consistency because of their limited context length. In this work, we present Temporally Consistent Video Transformer (TECO), a vector-quantized latent dynamics video prediction model that learns compressed representations to efficiently condition on long videos of hundreds of frames during both training and generation. We use a MaskGIT prior for dynamics prediction, which enables both sharper and faster generation compared to prior work. Our experiments show that TECO outperforms state-of-the-art baselines on a variety of video prediction benchmarks, ranging from simple mazes in DMLab to large 3D worlds in Minecraft and complex real-world videos from Kinetics-600. In addition, to better understand how well video prediction models capture temporal consistency, we introduce several challenging video prediction tasks in which agents randomly traverse 3D scenes of varying difficulty. This provides a challenging benchmark for video prediction in partially observable environments, where a model must decide which parts of a scene to re-create and which to invent, depending on its past observations or generations. Generated videos are available at https://wilson1yan.github.io/teco
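
The sliding-window generation that the abstract attributes to prior methods can be illustrated with a minimal Python sketch. This is not TECO and not code from the paper: `sliding_window_rollout`, `toy_predictor`, and the use of integers as stand-in frames are all hypothetical names and simplifications chosen for illustration. The sketch only shows why a fixed window limits long-term consistency: once a frame leaves the window, it can no longer influence what is generated.

```python
from typing import Callable, List, Sequence

Frame = int  # stand-in for a video frame; a real model would use image tensors


def sliding_window_rollout(
    context: Sequence[Frame],
    predict_next: Callable[[Sequence[Frame]], Frame],
    window: int,
    num_new_frames: int,
) -> List[Frame]:
    """Extend `context`, conditioning each step on only the last `window` frames."""
    if window < 1:
        raise ValueError("window must be at least 1")
    frames = list(context)
    for _ in range(num_new_frames):
        visible = frames[-window:]  # older frames are invisible to the predictor
        frames.append(predict_next(visible))
    return frames


def toy_predictor(visible: Sequence[Frame]) -> Frame:
    """Hypothetical predictor: repeats the oldest frame it can still see."""
    return visible[0]


# Frame 7 is the oldest observation. With window=3 it is already out of view
# at the first generation step, so it never reappears in the generated frames.
video = sliding_window_rollout([7, 1, 2, 3], toy_predictor, window=3, num_new_frames=4)
print(video)  # [7, 1, 2, 3, 1, 2, 3, 1]
```

With `window=4` the same toy predictor can still see frame 7 at the first step and reproduces it, which is the kind of long-range dependence a short window rules out.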

* Project website: https://wilson1yan.github.io/teco

View paper on OpenReview
