Title: T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Jul 19, 2024

Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu

Abstract: Text-to-video (T2V) generation models have advanced significantly, yet their ability to compose different objects, attributes, actions, and motions into a video remains unexplored. Previous text-to-video benchmarks also neglect this ability in their evaluation. In this work, we conduct the first systematic study of compositional text-to-video generation. We propose T2V-CompBench, the first benchmark tailored for compositional text-to-video generation. T2V-CompBench covers diverse aspects of compositionality, including consistent attribute binding, dynamic attribute binding, spatial relationships, motion binding, action binding, object interactions, and generative numeracy. We further design three kinds of evaluation metrics (MLLM-based, detection-based, and tracking-based), which better reflect compositional text-to-video generation quality across the seven proposed categories, which comprise 700 text prompts. The effectiveness of the proposed metrics is verified by their correlation with human evaluations. We also benchmark various text-to-video generative models and conduct an in-depth analysis across different models and different compositional categories. We find that compositional text-to-video generation is highly challenging for current models, and we hope that our attempt will shed light on future research in this direction.
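
The abstract states that the metrics are verified by correlation with human evaluations but does not say which correlation statistic is used. As a minimal sketch of what such a check could look like, the Python below computes Spearman's rank correlation between automatic metric scores and human ratings. The choice of Spearman's coefficient, the function names, and the toy scores are all assumptions made for illustration; none of them come from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(metric_scores, human_scores):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


# Hypothetical scores for four generated videos (illustrative values only).
metric_scores = [0.10, 0.40, 0.35, 0.80]  # automatic metric, higher is better
human_scores = [1, 3, 2, 5]               # human ratings of the same videos

rho = spearman(metric_scores, human_scores)
print(f"Spearman rho = {rho:.3f}")
```

A coefficient near 1 would mean the metric orders videos much as the human raters do; a value near 0 would mean the metric carries little information about human preference.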

* 13 pages (30 in total), project page: https://t2v-compbench.github.io/
