Title: Contrastive Video-Language Learning with Fine-grained Frame Sampling

Oct 10, 2022

Zixu Wang, Yujie Zhong, Yishu Miao, Lin Ma, Lucia Specia

Abstract: Despite recent progress in video and language representation learning, the weak or sparse correspondence between the two modalities remains a bottleneck in the area. Most video-language models are trained with a pair-level loss to predict whether a pair of video and text is aligned. However, even in paired video-text segments, only a subset of the frames is semantically relevant to the corresponding text, with the remainder representing noise, and the ratio of noisy frames is higher for longer videos. We propose FineCo (Fine-grained Contrastive Loss for Frame Sampling), an approach to better learn video and language representations with a fine-grained contrastive objective operating on video frames. It helps distil a video by selecting the frames that are semantically equivalent to the text, improving cross-modal correspondence. Building on the well-established VideoCLIP model as a starting point, FineCo achieves state-of-the-art performance on YouCookII, a text-video retrieval benchmark with long videos. FineCo also achieves competitive results on text-video retrieval (MSR-VTT) and on video question answering (MSR-VTT QA and MSR-VTT MC), which contain shorter videos.
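The abstract does not spell out FineCo's exact objective, so the following is only a minimal illustrative sketch of the general idea it describes: score each frame against the paired text, keep the most relevant frames, and train with a contrastive loss over the batch. It assumes frames and captions are already embedded as plain vectors, uses hard top-k selection by cosine similarity, mean pooling, and a standard InfoNCE loss. The helper names `select_frames`, `pool`, `info_nce` and the parameters `k` and `tau` are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def select_frames(frames: List[Vector], text: Vector, k: int) -> List[int]:
    """Hypothetical helper: indices (in temporal order) of the k frames
    most similar to the text embedding; the rest are treated as noise."""
    scores = [cosine(f, text) for f in frames]
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def pool(frames: List[Vector], idx: List[int]) -> Vector:
    """Mean-pool the selected frames into a single video embedding."""
    dim = len(frames[0])
    return [sum(frames[i][d] for i in idx) / len(idx) for d in range(dim)]


def info_nce(videos: List[Vector], texts: List[Vector], tau: float = 0.07) -> float:
    """Text-to-video InfoNCE: video i is the positive for text i and
    every other video in the batch is a negative."""
    total = 0.0
    for i, t in enumerate(texts):
        logits = [cosine(v, t) / tau for v in videos]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]  # negative log-softmax of the positive pair
    return total / len(texts)


# Toy batch: two clips of four 3-d frame embeddings each, plus their captions.
frames_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
frames_b = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.1, 0.9, 0.0], [1.0, 0.0, 0.0]]
text_a, text_b = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]

idx_a = select_frames(frames_a, text_a, k=2)  # → [0, 2]
idx_b = select_frames(frames_b, text_b, k=2)  # → [0, 2]
videos = [pool(frames_a, idx_a), pool(frames_b, idx_b)]

print(idx_a, idx_b)
print("matched loss:", info_nce(videos, [text_a, text_b]))
print("swapped loss:", info_nce(videos, [text_b, text_a]))
```

In the toy batch, frames 1 and 3 of each clip are unrelated to the caption and are dropped before pooling, and the loss is much lower for correctly matched pairs than for swapped ones. Hard top-k selection is used here only for readability; how FineCo actually scores and samples frames, and how this combines with VideoCLIP's pair-level training, is described in the paper rather than in this abstract.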

* AACL-IJCNLP 2022
