Title: Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy

May 02, 2024

Hoang-Quan Nguyen, Thanh-Dat Truong, Khoa Luu

Abstract: Action recognition has become a popular research topic in computer vision. Many methods based on convolutional networks and self-attention mechanisms such as Transformers address both the spatial and temporal dimensions of action recognition and achieve competitive performance. However, these methods offer no guarantee that the model attends to the correct action subject, i.e., that an action recognition model focuses on the proper subject when making its prediction. In this paper, we propose a multi-view attention consistency method that computes the similarity between the attentions from two different views of an action video using Directed Gromov-Wasserstein Discrepancy. Furthermore, our approach applies the idea of Neural Radiance Fields to implicitly render features from novel views when training on single-view datasets. The contributions of this work are three-fold. First, we introduce multi-view attention consistency to address the problem of reasonable prediction in action recognition. Second, we define a new metric for multi-view consistent attention using Directed Gromov-Wasserstein Discrepancy. Third, we build an action recognition model based on Video Transformers and Neural Radiance Fields. Compared to recent action recognition methods, the proposed approach achieves state-of-the-art results on three large-scale datasets: Jester, Something-Something V2, and Kinetics-400.
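To make the idea of comparing attention structure across two views more concrete, here is a minimal sketch of the standard square-loss Gromov-Wasserstein objective evaluated at a fixed coupling. It is not the paper's Directed Gromov-Wasserstein Discrepancy or its solver, whose exact definition the abstract does not give. The function name `gw_cost`, the structure matrices `C1` and `C2`, and the coupling `T` are all hypothetical names chosen for illustration; the only nod to "directed" here is that the structure matrices are allowed to be asymmetric.

```python
import numpy as np

def gw_cost(C1, C2, T):
    """Square-loss Gromov-Wasserstein objective for a fixed coupling T.

    C1: (n, n) pairwise structure matrix for view 1 (may be asymmetric).
    C2: (m, m) pairwise structure matrix for view 2 (may be asymmetric).
    T:  (n, m) coupling between the elements of the two views.
    Illustrative only; not the paper's definition.
    """
    # diff[i, j, k, l] = C1[i, k] - C2[j, l]
    diff = C1[:, None, :, None] - C2[None, :, None, :]
    # Sum of diff^2 weighted by T[i, j] * T[k, l].
    return float(np.einsum("ijkl,ij,kl->", diff ** 2, T, T))

# Toy example: view 2 is a relabeling of view 1 (assumed data).
rng = np.random.default_rng(0)
C1 = rng.random((3, 3))            # asymmetric structure, view 1
perm = [2, 0, 1]
C2 = C1[np.ix_(perm, perm)]        # same structure, permuted indices

# Coupling that matches element perm[j] of view 1 to element j of view 2.
T_match = np.zeros((3, 3))
T_match[perm, np.arange(3)] = 1.0 / 3

# Uninformative coupling that spreads mass uniformly.
T_uniform = np.full((3, 3), 1.0 / 9)

cost_match = gw_cost(C1, C2, T_match)
cost_uniform = gw_cost(C1, C2, T_uniform)
```

With the matching coupling the cost is zero up to floating-point error, because both views share the same pairwise structure; the uniform coupling yields a strictly positive cost. A full discrepancy would additionally minimize this objective over all couplings with the prescribed marginals.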
