Title: Towards Long Video Understanding via Fine-detailed Video Story Generation

Dec 09, 2024

Zeng You, Zhiquan Wen, Yaofo Chen, Xin Li, Runhao Zeng, Yaowei Wang, Mingkui Tan

Abstract: Long video understanding has become a critical task in computer vision, driving advancements across numerous applications from surveillance to content retrieval. Existing video understanding methods suffer from two challenges when dealing with long video understanding: intricate long-context relationship modeling and interference from redundancy. To tackle these challenges, we introduce Fine-Detailed Video Story generation (FDVS), which interprets long videos into detailed textual representations. Specifically, to achieve fine-grained modeling of long-temporal content, we propose a Bottom-up Video Interpretation Mechanism that progressively interprets video content from clips to video. To avoid interference from redundant information in videos, we introduce a Semantic Redundancy Reduction mechanism that removes redundancy at both the visual and textual levels. Our method transforms long videos into hierarchical textual representations that contain multi-granularity information of the video. With these representations, FDVS is applicable to various tasks without any fine-tuning. We evaluate the proposed method across eight datasets spanning three tasks. The performance demonstrates the effectiveness and versatility of our method.
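
The two ideas named in the abstract, bottom-up interpretation from clips to video and removal of redundant text, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`jaccard`, `drop_redundant`, `build_story`), the token-overlap similarity, the 0.8 threshold, and the use of plain concatenation in place of a learned summarizer are all assumptions made for illustration, and only the textual side of redundancy reduction is shown.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two captions (stand-in for a learned similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def drop_redundant(captions: List[str], threshold: float = 0.8) -> List[str]:
    """Keep a caption only if it differs enough from the last kept caption."""
    kept: List[str] = []
    for cap in captions:
        if not kept or jaccard(kept[-1], cap) < threshold:
            kept.append(cap)
    return kept


def build_story(clip_captions: List[str], clips_per_event: int = 2) -> Dict[str, object]:
    """Bottom-up pass: clip captions -> event-level text -> video-level text.

    Concatenation is a placeholder for whatever summarization model
    a real system would use at each level (hypothetical simplification).
    """
    clips = drop_redundant(clip_captions)
    events = [
        " ".join(clips[i:i + clips_per_event])
        for i in range(0, len(clips), clips_per_event)
    ]
    return {"clips": clips, "events": events, "video": " ".join(events)}


story = build_story([
    "a man opens a door",
    "a man opens a door",       # near-duplicate of the previous clip, dropped
    "he walks into a kitchen",
    "he pours coffee",
])
```

The returned dictionary keeps all three granularities (clip, event, video), which mirrors the abstract's description of a hierarchical textual representation that downstream tasks could query without fine-tuning.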
