Title: VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

Mar 10, 2024

Wenhao Wang, Yi Yang

Abstract: The arrival of Sora marks a new era for text-to-video diffusion models, bringing significant advancements in video generation and potential applications. However, Sora, like other text-to-video diffusion models, relies heavily on prompts, and no publicly available dataset is dedicated to the study of text-to-video prompts. In this paper, we introduce VidProM, the first large-scale dataset comprising 1.67 million unique text-to-video prompts from real users. The dataset also includes 6.69 million videos generated by four state-of-the-art diffusion models, along with some related data. We first describe the curation of this large-scale dataset, which is a time-consuming and costly process. We then show how VidProM differs from DiffusionDB, a large-scale prompt-gallery dataset for image generation. Based on an analysis of these prompts, we identify the need for a new prompt dataset specifically designed for text-to-video generation and gain insights into the preferences of real users when creating videos. Our large-scale and diverse dataset also inspires many exciting new research areas. For instance, to develop better, more efficient, and safer text-to-video diffusion models, we suggest exploring text-to-video prompt engineering, efficient video generation, and video copy detection for diffusion models. We make the collected dataset VidProM publicly available on GitHub and Hugging Face under the CC-BY-NC 4.0 License.

* Please download the collected dataset from https://github.com/WangWenhao0716/VidProM and https://huggingface.co/datasets/WenhaoWang/VidProM
