In various video-language learning tasks, the challenge of achieving cross-modality alignment with multi-grained data persists. We propose a method to tackle this challenge from two key perspectives: data and modeling. Given the absence of a multi-grained video-text pretraining dataset, we introduce a Granularity EXpansion (GEX) method with Integration and Compression operations to expand the granularity of a single-grained dataset. To better model multi-grained data, we introduce an Iterative Approximation Module (IAM), which embeds multi-grained videos and texts into a unified, low-dimensional semantic space while preserving the information essential for cross-modal alignment. Furthermore, the overall framework, GEXIA (Granularity EXpansion with Iterative Approximation), is highly scalable, imposing no restriction on the number of video-text granularities to be aligned. We evaluate our work on three categories of video tasks across seven benchmark datasets, achieving state-of-the-art or comparable performance. Remarkably, our model excels on tasks involving long-form video understanding, even though the pretraining dataset contains only short video clips.
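To give a rough sense of the data-side idea, the sketch below illustrates one plausible reading of granularity expansion on a single-grained clip-caption dataset: an Integration step that merges consecutive fine-grained pairs into coarser-grained pairs, followed by a Compression step that shortens the merged caption. All names, the grouping scheme, and the truncation-based compression are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch of granularity expansion (assumed semantics of GEX's
# Integration and Compression operations; not the authors' implementation).
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    clip_paths: List[str]  # one or more video clips forming the sample
    caption: str           # text paired with the clips


def integrate(samples: List[Sample], group_size: int = 4) -> List[Sample]:
    """Integration: merge consecutive fine-grained samples into coarser ones."""
    merged = []
    for i in range(0, len(samples) - group_size + 1, group_size):
        group = samples[i:i + group_size]
        merged.append(Sample(
            clip_paths=[p for s in group for p in s.clip_paths],
            caption=" ".join(s.caption for s in group),
        ))
    return merged


def compress(sample: Sample, max_words: int = 32) -> Sample:
    """Compression placeholder: truncate the merged caption.
    A real pipeline would more likely summarize it with a language model."""
    words = sample.caption.split()
    return Sample(sample.clip_paths, " ".join(words[:max_words]))


def expand_granularity(samples: List[Sample], group_size: int = 4) -> List[Sample]:
    """Keep the original fine-grained pairs and append coarser-grained ones."""
    coarse = [compress(s) for s in integrate(samples, group_size)]
    return samples + coarse
```

Under these assumptions, the expanded dataset interleaves the original short clip-caption pairs with longer multi-clip pairs carrying condensed captions, giving a pretraining corpus that spans more than one granularity.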