Title: Prompting Visual-Language Models for Efficient Video Understanding

Dec 08, 2021

Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie

Abstract: Visual-language pre-training has shown great success in learning joint visual-textual representations from large-scale web data, demonstrating a remarkable ability for zero-shot generalisation. This paper presents a simple method to efficiently adapt one pre-trained visual-language model to novel tasks with minimal training; here, we consider video understanding tasks. Specifically, we propose to optimise a few random vectors, termed continuous prompt vectors, that convert the novel tasks into the same format as the pre-training objectives. In addition, to bridge the gap between static images and videos, temporal information is encoded with lightweight Transformers stacked on top of frame-wise visual features. Experimentally, we conduct extensive ablation studies to analyse the critical components and their necessity. On 9 public benchmarks covering action recognition, action localisation, and text-video retrieval, across closed-set, few-shot, and open-set scenarios, we achieve competitive or state-of-the-art performance compared with existing methods, despite training significantly fewer parameters.
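
The prompting idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the embedding width, the class names, the helper names (`build_prompt`, `encode_text`, `encode_video`, `classify`) and the use of mean pooling are all assumptions made for illustration. In particular, mean pooling stands in for both the frozen text encoder and the lightweight temporal Transformer, so it ignores frame order. The sketch only shows how a few learnable vectors can wrap a class-name embedding so that classification becomes text-video similarity matching.

```python
import math
import random

random.seed(0)
DIM = 8  # toy embedding width (assumption; real models are far wider)


def rand_vec(dim=DIM):
    """Draw a random vector, used for toy embeddings and prompt vectors."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


def mean_pool(vectors):
    """Average equal-length vectors (stand-in for a frozen encoder)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical frozen token embeddings, one token per class name.
class_tokens = {"archery": [rand_vec()], "bowling": [rand_vec()]}

# The only parameters that would be trained in this sketch:
# a few randomly initialised prompt vectors shared by all classes.
prefix = [rand_vec() for _ in range(2)]
suffix = [rand_vec() for _ in range(2)]


def build_prompt(name):
    """Wrap the class-name tokens with the learnable prompt vectors."""
    return prefix + class_tokens[name] + suffix


def encode_text(name):
    """Toy text encoder: pool the prompted token sequence."""
    return mean_pool(build_prompt(name))


def encode_video(frame_features):
    """Toy video encoder: pool frame-wise features over time."""
    return mean_pool(frame_features)


def classify(frame_features):
    """Pick the class whose prompted text embedding best matches the video."""
    video = encode_video(frame_features)
    scores = {name: cosine(video, encode_text(name)) for name in class_tokens}
    return max(scores, key=scores.get), scores
```

In a real setup, `prefix` and `suffix` would be updated by gradient descent on the downstream task while the pre-trained encoders stay frozen, which is why the number of trained parameters stays small.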

* Project page: https://ju-chen.github.io/efficient-prompt/
