Yunsheng Pang

Interactive Spatiotemporal Token Attention Network for Skeleton-based General Interactive Action Recognition

Jul 14, 2023

Yuhang Wen, Zixuan Tang, Yunsheng Pang, Beichen Ding, Mengyuan Liu

Abstract: Recognizing interactive actions plays an important role in human-robot interaction and collaboration. Previous methods use late fusion and co-attention mechanisms to capture interactive relations, which either have limited learning capability or adapt inefficiently to more interacting entities. Because they assume that the priors of each entity are already known, they also lack evaluation in a more general setting that addresses the diversity of subjects. To address these problems, we propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously models spatial, temporal, and interactive relations. Specifically, our network contains a tokenizer that partitions the input into Interactive Spatiotemporal Tokens (ISTs), a unified way to represent the motions of multiple diverse entities. By extending the entity dimension, ISTs provide better interactive representations. To learn jointly along the three dimensions of ISTs, multi-head self-attention blocks integrated with 3D convolutions are designed to capture inter-token correlations. When modeling correlations, a strict entity ordering is usually irrelevant for recognizing interactive actions. To this end, Entity Rearrangement is proposed to eliminate the ordering in ISTs for interchangeable entities. Extensive experiments on four datasets verify the effectiveness of ISTA-Net, which outperforms state-of-the-art methods. Our code is publicly available at https://github.com/Necolizer/ISTA-Net

* IROS 2023 Camera-ready version. Project website: https://necolizer.github.io/ISTA-Net/
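The tokenization and Entity Rearrangement ideas in the ISTA-Net abstract can be illustrated with a minimal Python sketch. It assumes a nested-list skeleton layout `seq[entity][frame][joint]` (each entry a list of coordinates) and a hypothetical segment length `seg_len`; each token holds one joint of one entity over one temporal segment, so tokens are indexed along the entity, segment, and joint dimensions. Entity Rearrangement is approximated here as a random permutation of the interchangeable entities, which is an assumption. The function names are illustrative, and the actual tokenizer in the linked repository may differ.

```python
import random
from typing import List, Sequence

# Hypothetical layout: seq[e][t][j] is a list of coordinates for
# entity e, frame t, joint j. Not necessarily the layout ISTA-Net uses.
Skeleton = List[List[List[List[float]]]]


def tokenize(seq: Skeleton, seg_len: int) -> List[List[List[List[float]]]]:
    """Split each entity's motion into (segment, joint) tokens.

    Returns tokens[e][s][j]: the coordinates of joint j over frames
    s*seg_len .. (s+1)*seg_len - 1, flattened into one feature vector.
    Trailing frames that do not fill a whole segment are dropped.
    """
    num_segs = len(seq[0]) // seg_len
    tokens = []
    for entity in seq:
        ent_tokens = []
        for s in range(num_segs):
            frames = entity[s * seg_len:(s + 1) * seg_len]
            seg_tokens = []
            for j in range(len(frames[0])):
                # Concatenate joint j's coordinates across the segment's frames.
                seg_tokens.append([c for frame in frames for c in frame[j]])
            ent_tokens.append(seg_tokens)
        tokens.append(ent_tokens)
    return tokens


def rearrange_entities(tokens: list, interchangeable: Sequence[int], rng=random) -> list:
    """Randomly permute the entities listed in `interchangeable`
    (e.g. two human subjects) and leave all other entities in place.
    Hypothetical stand-in for Entity Rearrangement."""
    out = list(tokens)
    idx = list(interchangeable)
    perm = idx[:]
    rng.shuffle(perm)
    for dst, src in zip(idx, perm):
        out[dst] = tokens[src]
    return out
```

For two subjects with 4 frames, 3 joints, and 2 coordinates each, `tokenize(seq, 2)` yields a 2 × 2 × 3 grid of tokens, each a vector of length 4. Shuffling whole entities during training is one simple way to keep a model from relying on which subject happens to be listed first.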

IGFormer: Interaction Graph Transformer for Skeleton-based Human Interaction Recognition

Jul 25, 2022

Yunsheng Pang, Qiuhong Ke, Hossein Rahmani, James Bailey, Jun Liu

Abstract: Human interaction recognition is important in many applications. One crucial cue for recognizing an interaction is the set of interactive body parts. In this work, we propose a novel Interaction Graph Transformer (IGFormer) network for skeleton-based interaction recognition that models the interactive body parts as graphs. More specifically, the proposed IGFormer constructs interaction graphs according to the semantic and distance correlations between the interactive body parts, and enhances the representation of each person by aggregating the information of the interactive body parts based on the learned graphs. Furthermore, we propose a Semantic Partition Module that transforms each human skeleton sequence into a Body-Part-Time sequence to better capture the spatial and temporal information of the skeleton sequence for learning the graphs. Extensive experiments on three benchmark datasets demonstrate that our model outperforms the state of the art by a significant margin.

* Accepted by ECCV 2022
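The distance cue behind IGFormer's interaction graphs can be sketched in a few lines of Python. This sketch assumes a hypothetical three-part grouping of six joints (`PARTS`) and a softmax temperature `tau`: it averages each part's joints into a centroid, turns negative centroid distances between two people into row-normalized edge weights, and adds the weighted features of the other person's parts to each part. The semantic correlations and learned projections of the published model are omitted, so this illustrates the idea of distance-based aggregation, not the IGFormer architecture itself.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]

# Hypothetical body-part grouping over joint indices; the Semantic
# Partition Module in IGFormer defines its own partition.
PARTS: Dict[str, List[int]] = {
    "torso": [0, 1],
    "left_arm": [2, 3],
    "right_arm": [4, 5],
}


def part_centroids(joints: List[Point]) -> List[List[float]]:
    """Average the joints of each body part into one centroid per part."""
    cents = []
    for idxs in PARTS.values():
        dim = len(joints[idxs[0]])
        cents.append([sum(joints[i][d] for i in idxs) / len(idxs) for d in range(dim)])
    return cents


def distance_graph(a: List[List[float]], b: List[List[float]], tau: float = 1.0) -> List[List[float]]:
    """Row i holds softmax(-dist/tau) weights from A's part i to each of B's parts."""
    graph = []
    for pa in a:
        logits = [-math.dist(pa, pb) / tau for pb in b]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        graph.append([e / z for e in exps])
    return graph


def aggregate(own: List[List[float]], other: List[List[float]], graph: List[List[float]]) -> List[List[float]]:
    """Enhance each of A's part features with a graph-weighted sum of B's part features."""
    return [
        [f + sum(w * other[k][d] for k, w in enumerate(row)) for d, f in enumerate(feat)]
        for feat, row in zip(own, graph)
    ]
```

Here the part centroids double as part features to keep the example small. In a learned model the features would come from an encoder, and closer parts (for example, one person's arm near the other's torso during a push) would receive larger weights.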

Graph Pooling via Coarsened Graph Infomax

May 31, 2021

Yunsheng Pang, Yunxiang Zhao, Dongsheng Li

Abstract: Graph pooling, which summarizes the information in a large graph into a compact form, is essential in hierarchical graph representation learning. Existing graph pooling methods either suffer from high computational complexity or cannot capture the global dependencies between graphs before and after pooling. To address these problems, we propose Coarsened Graph Infomax Pooling (CGIPool), which maximizes the mutual information between the input and the coarsened graph of each pooling layer to preserve graph-level dependencies. To achieve neural mutual information maximization, we apply contrastive learning and propose a self-attention-based algorithm for learning positive and negative samples. Extensive experimental results on seven datasets demonstrate the superiority of CGIPool compared to state-of-the-art methods.
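The mutual-information objective described for CGIPool can be illustrated with a contrastive loss in the style of Deep Graph Infomax. This Python sketch rests on several assumptions: graph summaries use a mean readout, the coarsened graph keeps the top-k nodes by externally supplied scores (standing in for the self-attention scoring, which is not reproduced), edges are ignored, and a dot-product discriminator scores pairs. The positive pair is a graph with its own coarsened version, and negatives are coarsened summaries of other graphs. All names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def readout(nodes: List[Vector]) -> Vector:
    """Mean-pool node features into a graph-level summary."""
    n = len(nodes)
    return [sum(col) / n for col in zip(*nodes)]


def top_k_pool(nodes: List[Vector], scores: List[float], ratio: float) -> List[Vector]:
    """Keep the ceil(ratio * n) highest-scoring nodes, in original order.
    Scores are assumed to come from some learned scorer (hypothetical here)."""
    k = max(1, math.ceil(ratio * len(nodes)))
    keep = sorted(range(len(nodes)), key=lambda i: scores[i], reverse=True)[:k]
    return [nodes[i] for i in sorted(keep)]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def contrastive_mi_loss(h_input: Vector, h_pos: Vector, h_negs: List[Vector]) -> float:
    """Binary cross-entropy contrastive loss with a dot-product discriminator.

    Lower loss means the discriminator scores the positive pair high and the
    negative pairs low; minimizing it maximizes a JSD-style MI estimate.
    """
    def dot(u: Vector, v: Vector) -> float:
        return sum(a * b for a, b in zip(u, v))

    loss = -log_sigmoid(dot(h_input, h_pos))
    for h_neg in h_negs:
        # log(1 - sigmoid(s)) equals log_sigmoid(-s)
        loss -= log_sigmoid(-dot(h_input, h_neg))
    return loss / (1 + len(h_negs))
```

In use, one would compute `h_in = readout(nodes)` and `h_pos = readout(top_k_pool(nodes, scores, 0.5))` for each graph in a batch, take the coarsened summaries of the other graphs as `h_negs`, and add the loss to the task objective.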
