Title: Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization

Aug 14, 2021

Linjiang Huang, Liang Wang, Hongsheng Li

Abstract: As a challenging task in high-level video understanding, weakly supervised temporal action localization has been attracting increasing attention. With only video-level annotations, most existing methods handle this task with a localization-by-classification framework, which generally adopts a selector to pick snippets with a high probability of containing an action, namely the foreground. Nevertheless, existing foreground selection strategies have a major limitation: they consider only the unilateral relation from foreground to actions, which cannot guarantee foreground-action consistency. In this paper, we present a framework named FAC-Net, built on the I3D backbone, to which three branches are appended: a class-wise foreground classification branch, a class-agnostic attention branch, and a multiple instance learning branch. First, the class-wise foreground classification branch regularizes the relation between actions and foreground to maximize foreground-background separation. In addition, the class-agnostic attention branch and the multiple instance learning branch regularize foreground-action consistency and help to learn a meaningful foreground classifier. Within each branch, we introduce a hybrid attention mechanism, which calculates multiple attention scores for each snippet, to focus on both discriminative and less-discriminative snippets and so capture the full action boundaries. Experimental results on THUMOS14 and ActivityNet1.3 demonstrate the state-of-the-art performance of our method. Our code is available at https://github.com/LeonHLJ/FAC-Net.
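
The localization-by-classification idea mentioned in the abstract can be illustrated with a small, generic sketch. This is not the FAC-Net implementation (see the linked repository for that). It only shows two common building blocks under stated assumptions: a class-agnostic attention score per snippet used for weighted feature pooling, and a top-k pooling of snippet-level class logits as one typical form of multiple instance learning. The function names, the sigmoid scoring, and the top-k rule are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(snippets, attn_weights):
    """Class-agnostic attention (illustrative): score each snippet as
    foreground-like, then average the snippet features by those scores."""
    scores = [sigmoid(dot(x, attn_weights)) for x in snippets]
    total = sum(scores)
    dim = len(snippets[0])
    pooled = [
        sum(s * x[d] for s, x in zip(scores, snippets)) / total
        for d in range(dim)
    ]
    return scores, pooled


def topk_mil_pool(snippet_logits, k):
    """MIL-style pooling (illustrative): for each class, average the k
    highest snippet logits to obtain a video-level logit."""
    num_snippets = len(snippet_logits)
    num_classes = len(snippet_logits[0])
    k = max(1, min(k, num_snippets))  # keep k within a valid range
    video_logits = []
    for c in range(num_classes):
        column = sorted((row[c] for row in snippet_logits), reverse=True)
        video_logits.append(sum(column[:k]) / k)
    return video_logits


# Toy data: 8 snippets with 4-dimensional features (assumed shapes).
random.seed(0)
snippets = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
attn_weights = [random.uniform(-1, 1) for _ in range(4)]

scores, pooled = attention_pool(snippets, attn_weights)
video_logits = topk_mil_pool([[1, 0], [3, 2], [2, 5]], k=2)
```

In a pipeline of this kind, snippets with high attention scores would be treated as foreground candidates, and the video-level logits would be trained against the video-level labels. How FAC-Net combines its three branches and its hybrid attention is specified in the paper and code, not in this sketch.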

* Accepted by ICCV 2021. Code is available at https://github.com/LeonHLJ/FAC-Net
