Title: VidGen-1M: A Large-Scale Dataset for Text-to-video Generation

Aug 05, 2024

Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Hao Li

Abstract: The quality of video-text pairs fundamentally determines the upper bound of text-to-video models. Currently, the datasets used for training these models suffer from significant shortcomings, including low temporal consistency, poor-quality captions, substandard video quality, and imbalanced data distribution. The prevailing video curation process, which depends on image models for tagging and on manual rule-based curation, imposes a high computational load and leaves behind unclean data. As a result, there is a lack of appropriate training datasets for text-to-video models. To address this problem, we present VidGen-1M, a superior training dataset for text-to-video models. Produced through a coarse-to-fine curation strategy, this dataset guarantees high-quality videos and detailed captions with excellent temporal consistency. When used to train a video generation model, this dataset has led to experimental results that surpass those obtained with other models.

* project page: https://sais-fuxi.github.io/projects/vidgen-1m