Title: Lightweight Attentional Feature Fusion for Video Retrieval by Text

Dec 03, 2021

Fan Hu, Aozhu Chen, Ziyue Wang, Fangming Zhou, Xirong Li

Abstract: In this paper, we revisit feature fusion, an old-fashioned topic, in the new context of video retrieval by text. Unlike previous research, which considers feature fusion at only one end, be it video or text, we aim for feature fusion at both ends within a unified framework. We hypothesize that optimizing the convex combination of the features is preferable to modeling their correlations with computationally heavy multi-head self-attention. Accordingly, we propose Lightweight Attentional Feature Fusion (LAFF). LAFF performs feature fusion at both early and late stages and at both the video and text ends, making it a powerful method for exploiting diverse (off-the-shelf) features. Extensive experiments on four public datasets (MSR-VTT, MSVD, TGIF, and VATEX) and on the large-scale TRECVID AVS benchmark evaluations (2016-2020) show the viability of LAFF. Moreover, LAFF is extremely simple to implement, making it appealing for real-world deployment.
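
To make the idea of a convex combination of features concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `convex_fusion`, the tanh projection, the single scoring vector, the feature sizes, and the random weights are all assumptions chosen for illustration. It only shows how several features of different sizes can be projected to a shared space and merged with non-negative weights that sum to one.

```python
import numpy as np

def convex_fusion(features, projections, score_vector):
    """Fuse k feature vectors as a convex combination (illustrative sketch).

    features:     list of k 1-D arrays, possibly of different lengths
    projections:  list of k matrices; projections[i] has shape (d, len(features[i]))
    score_vector: 1-D array of length d that scores each projected feature
    """
    # Project every feature into a shared d-dimensional space.
    projected = np.stack(
        [np.tanh(W @ f) for W, f in zip(projections, features)]
    )  # shape (k, d)

    # One scalar score per feature; softmax turns the scores into
    # non-negative weights that sum to 1 (a convex combination).
    scores = projected @ score_vector          # shape (k,)
    scores = scores - scores.max()             # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()

    # Weighted sum of the projected features.
    fused = weights @ projected                # shape (d,)
    return fused, weights

# Hypothetical setup: three off-the-shelf features with different sizes.
rng = np.random.default_rng(0)
dims = [2048, 512, 768]   # assumed feature sizes, not taken from the paper
d = 64                    # assumed size of the shared space
features = [rng.standard_normal(n) for n in dims]
projections = [rng.standard_normal((d, n)) / np.sqrt(n) for n in dims]
score_vector = rng.standard_normal(d)

fused, weights = convex_fusion(features, projections, score_vector)
print(fused.shape, weights)
```

In this sketch the weights are computed with one dot product per feature, so the cost grows linearly with the number of features, whereas self-attention would compare every pair of features. In a trained model the projections and the scoring vector would be learned rather than drawn at random.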
