Title: GCF-Net: Gated Clip Fusion Network for Video Action Recognition

Feb 02, 2021

Jenhao Hsiao, Jiawei Chen, Chiuman Ho

Figures 1-4 for GCF-Net: Gated Clip Fusion Network for Video Action Recognition

Abstract: In recent years, most of the accuracy gains in video action recognition have come from newly designed CNN architectures (e.g., 3D-CNNs). These models are trained by applying a deep CNN to a single clip of fixed temporal length. Since each video segment is processed by the 3D-CNN module separately, the corresponding clip descriptor is local and the inter-clip relationships are inherently implicit. The common method of directly averaging the clip-level outputs into a video-level prediction is prone to fail because it lacks a mechanism that can extract and integrate the relevant information needed to represent the video. In this paper, we introduce the Gated Clip Fusion Network (GCF-Net), which can greatly boost existing video action classifiers at the cost of a tiny computation overhead. The GCF-Net explicitly models the inter-dependencies between video clips to strengthen the receptive field of local clip descriptors. Furthermore, the importance of each clip to an action event is calculated, and a relevant subset of clips is selected accordingly for video-level analysis. On a large benchmark dataset (Kinetics-600), the proposed GCF-Net raises the accuracy of existing action classifiers by 11.49% (based on the central clip) and 3.67% (based on densely sampled clips), respectively.
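
The contrast the abstract draws between plain averaging and importance-weighted fusion of clip-level outputs can be illustrated with a toy sketch. This is not the GCF-Net architecture: the function names, the sigmoid gate, and the hand-picked relevance logits below are all assumptions made for illustration. In the paper the clip importance is computed by the network, whereas here it is supplied by hand.

```python
import math


def average_fusion(clip_scores):
    """Baseline: average per-clip class scores into one video-level vector."""
    n = len(clip_scores)
    return [sum(col) / n for col in zip(*clip_scores)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(clip_scores, relevance_logits):
    """Weight each clip by a gate in (0, 1), then normalize by the gate sum.

    relevance_logits is a hypothetical per-clip importance signal; a real
    model would predict it from clip features rather than take it as input.
    """
    gates = [sigmoid(z) for z in relevance_logits]
    total = sum(gates)
    return [
        sum(g * s for g, s in zip(gates, col)) / total
        for col in zip(*clip_scores)
    ]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Toy video with three clips and two classes. Only the first clip shows the
# action (class 0); the other two are background that leans toward class 1.
clip_scores = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
relevance_logits = [4.0, -4.0, -4.0]  # assumed: clip 0 judged relevant

avg_pred = argmax(average_fusion(clip_scores))
gated_pred = argmax(gated_fusion(clip_scores, relevance_logits))
print("average fusion predicts class", avg_pred)
print("gated fusion predicts class", gated_pred)
```

In this toy case the two background clips outvote the single informative clip under averaging, while the gated variant follows the clip marked as relevant. The example only shows why a weighting mechanism can matter; it says nothing about the accuracy figures reported above.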
