Title: Learning Video Representations without Natural Videos

Oct 31, 2024

Xueyang Yu, Xinlei Chen, Yossi Gandelsman

Abstract: In this paper, we show that useful video representations can be learned from synthetic videos and natural images, without incorporating natural videos in training. We propose a progression of video datasets synthesized by simple generative processes that model a growing set of natural video properties (e.g., motion, acceleration, and shape transformations). The downstream performance of video models pre-trained on these generated datasets gradually increases with the dataset progression. A VideoMAE model pre-trained on our synthetic videos closes 97.2% of the performance gap on UCF101 action classification between training from scratch and self-supervised pre-training from natural videos, and outperforms the pre-trained model on HMDB51. Introducing crops of static images to the pre-training stage results in performance similar to UCF101 pre-training and outperforms the UCF101 pre-trained model on 11 out of 14 out-of-distribution datasets of UCF101-P. Analyzing the low-level properties of the datasets, we identify correlations between frame diversity, frame similarity to natural data, and downstream performance. Our approach provides a more controllable and transparent alternative to video data curation processes for pre-training.

* Project page: https://unicorn53547.github.io/video_syn_rep/
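
The "closes 97.2% of the performance gap" figure in the abstract can be read as a ratio of accuracy differences: how far the synthetic-data model moves from the from-scratch baseline toward the natural-video pre-trained model. The sketch below shows that reading. The function name `gap_closed` and the accuracy values are assumptions made for illustration only; they are not taken from the paper, and the authors may compute the figure differently.

```python
def gap_closed(scratch: float, reference: float, candidate: float) -> float:
    """Fraction of the scratch-to-reference accuracy gap recovered by the candidate.

    scratch:   accuracy when training from scratch
    reference: accuracy after pre-training on natural videos
    candidate: accuracy after pre-training on synthetic videos
    """
    gap = reference - scratch
    if gap == 0:
        raise ValueError("reference and scratch accuracies are equal; the gap is undefined")
    return (candidate - scratch) / gap


# Placeholder accuracies (in percent), chosen for illustration only.
# These are NOT results reported in the paper.
scratch_acc = 50.0
natural_acc = 90.0
synthetic_acc = 88.0

closed = gap_closed(scratch_acc, natural_acc, synthetic_acc)
print(f"{closed:.1%} of the gap closed")  # prints "95.0% of the gap closed"
```

Under this reading, a value above 100% means the candidate exceeds the reference model, which is how the abstract describes the HMDB51 result.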
