Title: STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition

Mar 31, 2023

Xiaoyu Zhu, Po-Yao Huang, Junwei Liang, Celso M. de Melo, Alexander Hauptmann

Abstract: We study the problem of human action recognition using motion capture (MoCap) sequences. Unlike existing techniques that take multiple manual steps to derive standardized skeleton representations as model input, we propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences. The model uses a hierarchical transformer with intra-frame offset attention and inter-frame self-attention. The attention mechanism allows the model to freely attend between any two vertex patches to learn non-local relationships in the spatial-temporal domain. Masked vertex modeling and future frame prediction are used as two self-supervised tasks to fully activate the bi-directional and auto-regressive attention in our hierarchical transformer. The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models on common MoCap benchmarks. Code is available at https://github.com/zgzxy001/STMT.
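To make the two attention levels in the abstract concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, all learned projections and normalization layers are omitted, and the offset form (adding the difference between input and attended features back to the input) is an assumption borrowed from point-cloud transformers rather than something the abstract specifies. Flattening all patches from all frames into one token list for the inter-frame stage is likewise an assumption.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (no learned weights)."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        out.append([sum(w * value[i] for w, value in zip(weights, tokens))
                    for i in range(dim)])
    return out


def offset_attention(tokens):
    """Assumed offset form: input plus the offset (input - attended features)."""
    attended = self_attention(tokens)
    return [[x + (x - a) for x, a in zip(tok, att)]
            for tok, att in zip(tokens, attended)]


def hierarchical_forward(sequence):
    """sequence: list of frames, each frame a list of vertex-patch feature vectors."""
    # Intra-frame stage: patches attend only to patches in the same frame.
    per_frame = [offset_attention(frame) for frame in sequence]
    # Inter-frame stage: every patch can attend to every patch in any frame.
    flat = [patch for frame in per_frame for patch in frame]
    return self_attention(flat)


def random_sequence(frames=3, patches=4, dim=8, seed=0):
    """Toy stand-in for patch embeddings of a mesh sequence."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(patches)]
            for _ in range(frames)]


if __name__ == "__main__":
    tokens = hierarchical_forward(random_sequence())
    print(len(tokens), len(tokens[0]))
```

With the toy input of 3 frames, 4 patches per frame, and 8 feature dimensions, the result is one 8-dimensional vector per patch across the whole sequence. A real model would pool these tokens and pass them to a classifier head.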

* CVPR 2023
