Title: Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization

Aug 25, 2024

Jia-Run Du, Kun-Yu Lin, Jingke Meng, Wei-Shi Zheng

Abstract: To address the zero-shot temporal action localization (ZSTAL) task, existing works develop models that are generalizable to detect and classify actions from unseen categories. They typically develop a category-agnostic action detector and combine it with the Contrastive Language-Image Pre-training (CLIP) model to solve ZSTAL. However, these methods suffer from incomplete action proposals generated for unseen categories, since they follow a frame-level prediction paradigm and require hand-crafted post-processing to generate action proposals. To address this problem, in this work, we propose a novel model named Generalizable Action Proposal generator (GAP), which can interface seamlessly with CLIP and generate action proposals in a holistic way. Our GAP is built on a query-based architecture and trained with a proposal-level objective, enabling it to estimate proposal completeness and eliminate the hand-crafted post-processing. Based on this architecture, we propose an Action-aware Discrimination loss to enhance the category-agnostic dynamic information of actions. Besides, we introduce a Static-Dynamic Rectifying module that incorporates the generalizable static information from CLIP to refine the predicted proposals, which improves proposal completeness in a generalizable manner. Our experiments show that our GAP achieves state-of-the-art performance on two challenging ZSTAL benchmarks, i.e., Thumos14 and ActivityNet1.3. Specifically, our model obtains significant performance improvement over previous works on the two benchmarks, i.e., +3.2% and +3.4% average mAP, respectively.

* Accepted to ICPR 2024. Code is available at https://github.com/Run542968/GAP
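
The two ideas the abstract leans on, proposal completeness and labeling class-agnostic proposals through a shared text-video embedding space, can be illustrated with a small sketch. This is not the GAP implementation: the function names, the toy 3-dimensional embeddings, and the segment values below are all hypothetical, and real CLIP features would come from the CLIP encoders rather than hand-written lists. The sketch assumes completeness is judged by temporal IoU (tIoU) against a ground-truth segment, which is the overlap measure mAP-style localization metrics are built on.

```python
import math


def temporal_iou(a, b):
    """Temporal IoU between two segments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify_proposal(proposal_feature, text_features):
    """Pick the category whose text embedding is closest to the proposal feature."""
    return max(text_features, key=lambda name: cosine(proposal_feature, text_features[name]))


# Hypothetical ground-truth action and two candidate proposals.
ground_truth = (10.0, 20.0)
incomplete = (10.0, 14.0)   # covers only the first part of the action
complete = (9.5, 20.5)      # covers the whole action with small slack

# Toy stand-ins for text embeddings of unseen category names (assumed values).
text_features = {
    "high jump": [0.9, 0.1, 0.0],
    "pole vault": [0.1, 0.9, 0.1],
}
proposal_feature = [0.8, 0.2, 0.1]

iou_incomplete = temporal_iou(incomplete, ground_truth)
iou_complete = temporal_iou(complete, ground_truth)
label = classify_proposal(proposal_feature, text_features)
```

Under these assumptions the truncated proposal overlaps the ground truth far less than the full one, so it would fail the stricter tIoU thresholds used in evaluation even if its category label were correct. That is the sense in which incomplete proposals cost localization accuracy.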
