Title: PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation

Sep 16, 2022

Haoyu Ma, Zhe Wang, Yifei Chen, Deying Kong, Liangjian Chen, Xingwei Liu, Xiangyi Yan, Hao Tang, Xiaohui Xie

Abstract: Recently, the vision transformer and its variants have played an increasingly important role in both monocular and multi-view human pose estimation. Treating image patches as tokens, transformers can model global dependencies within an entire image or across images from other views. However, global attention is computationally expensive. As a consequence, it is difficult to scale these transformer-based methods up to high-resolution features and many views. In this paper, we propose the token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which locates a rough human mask and performs self-attention only within the selected tokens. Furthermore, we extend PPT to multi-view human pose estimation. Built upon PPT, we propose a new cross-view fusion strategy, called human area fusion, which considers all human foreground pixels as corresponding candidates. Experimental results on COCO and MPII demonstrate that PPT can match the accuracy of previous pose transformer methods while reducing computation. Moreover, experiments on Human 3.6M and Ski-Pose demonstrate that our Multi-view PPT can efficiently fuse cues from multiple views and achieve new state-of-the-art results.

* ECCV 2022. Code is available at https://github.com/HowieMa/PPT
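
The core idea in the abstract, running self-attention only over a selected subset of tokens, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository above for that). The abstract does not say how tokens are selected, so the per-token `scores` array, the `keep_ratio` parameter, and all function names here are hypothetical, and the attention uses identity projections purely for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prune_tokens(tokens, scores, keep_ratio):
    """Keep the highest-scoring tokens and return them with their original indices.

    `scores` is an assumed per-token foreground score, one value per token.
    """
    n = tokens.shape[0]
    k = max(1, int(round(n * keep_ratio)))
    # Indices of the k largest scores, restored to original token order.
    keep = np.sort(np.argsort(scores)[-k:])
    return tokens[keep], keep

def self_attention(x):
    """Single-head self-attention with identity Q/K/V projections (illustration only)."""
    d = x.shape[-1]
    attn = softmax(x @ x.T / np.sqrt(d), axis=-1)
    return attn @ x

def pruned_self_attention(tokens, scores, keep_ratio=0.5):
    """Attend only among the kept tokens, so the attention matrix is k x k, not n x n."""
    kept, idx = prune_tokens(tokens, scores, keep_ratio)
    return self_attention(kept), idx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(16, 8))   # 16 patch tokens, 8 channels each
    scores = rng.uniform(size=16)       # stand-in for a learned foreground score
    out, idx = pruned_self_attention(tokens, scores, keep_ratio=0.25)
    print(out.shape, idx)
```

With `keep_ratio=0.25`, only 4 of the 16 tokens take part in attention, so the attention matrix shrinks from 16 x 16 to 4 x 4. That quadratic saving is what makes pruning attractive for high-resolution features and many views.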
