Title: VideoQA in the Era of LLMs: An Empirical Study

Aug 08, 2024

Junbin Xiao, Nanxin Huang, Hangyu Qin, Dongyang Li, Yicong Li, Fengbin Zhu, Zhulin Tao, Jianxing Yu, Liang Lin, Tat-Seng Chua(+1 more)

Abstract: Video Large Language Models (Video-LLMs) are flourishing and have advanced many video-language tasks. As a golden testbed, Video Question Answering (VideoQA) plays a pivotal role in Video-LLM development. This work conducts a timely and comprehensive study of Video-LLMs' behavior in VideoQA, aiming to elucidate their success and failure modes and to provide insights towards more human-like video understanding and question answering. Our analyses demonstrate that Video-LLMs excel in VideoQA; they can correlate contextual cues and generate plausible responses to questions about varied video content. However, the models falter in handling video temporality, both in reasoning about the temporal ordering of content and in grounding QA-relevant temporal moments. Moreover, the models behave unintuitively: they are unresponsive to adversarial video perturbations while being sensitive to simple variations of candidate answers and questions. Also, they do not necessarily generalize better. The findings demonstrate Video-LLMs' QA capability under standard conditions yet highlight their severe deficiencies in robustness and interpretability, suggesting an urgent need for rationales in Video-LLM development.

* Preprint. Under Review
