Zhengjia Li

Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Sep 06, 2021

Ziniu Wan, Zhengjia Li, Maoqing Tian, Jianbo Liu, Shuai Yi, Hongsheng Li

Abstract: 3D human shape and pose estimation is an essential task for human motion analysis and is widely used in many 3D applications. However, existing methods cannot simultaneously capture relations at multiple levels, including the spatial-temporal level and the human joint level. They therefore fail to make accurate predictions in hard scenarios with cluttered backgrounds, occlusion, or extreme poses. To this end, we propose the Multi-level Attention Encoder-Decoder Network (MAED), which includes a Spatial-Temporal Encoder (STE) and a Kinematic Topology Decoder (KTD) to model multi-level attention in a unified framework. STE consists of a series of cascaded blocks based on Multi-Head Self-Attention, and each block uses two parallel branches to learn spatial and temporal attention respectively. Meanwhile, KTD models joint-level attention. It regards pose estimation as a top-down hierarchical process similar to the SMPL kinematic tree. With the training set of 3DPW, MAED outperforms previous state-of-the-art methods by 6.2, 7.2, and 2.4 mm of PA-MPJPE on the three widely used benchmarks 3DPW, MPI-INF-3DHP, and Human3.6M respectively. Our code is available at https://github.com/ziniuwan/maed.
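The top-down hierarchical idea behind KTD can be illustrated with a small sketch. This is not the authors' implementation: the 6-joint `PARENTS` layout, the function names, and the `predict_joint` callback are all hypothetical, chosen only to show how joints can be decoded root-first so that each prediction sees the predictions of its ancestors in the kinematic tree.

```python
# Toy kinematic tree: PARENTS[j] is the parent of joint j, -1 marks the root.
# This 6-joint layout is hypothetical; it is NOT the SMPL joint set.
PARENTS = [-1, 0, 0, 1, 2, 3]


def top_down_order(parents):
    """Return joint indices so that every joint appears after its parent."""
    children = {j: [] for j in range(len(parents))}
    root = None
    for j, p in enumerate(parents):
        if p == -1:
            root = j
        else:
            children[p].append(j)
    order, queue = [], [root]
    while queue:  # breadth-first walk from the root
        j = queue.pop(0)
        order.append(j)
        queue.extend(children[j])
    return order


def ancestors(parents, j):
    """List the ancestors of joint j, from its parent up to the root."""
    chain = []
    p = parents[j]
    while p != -1:
        chain.append(p)
        p = parents[p]
    return chain


def decode_pose(parents, predict_joint):
    """Predict joints root-first; each call receives its ancestors' outputs.

    predict_joint(j, context) is a placeholder for a learned per-joint
    regressor (an assumption made for this sketch).
    """
    pose = {}
    for j in top_down_order(parents):
        context = [pose[a] for a in ancestors(parents, j)]
        pose[j] = predict_joint(j, context)
    return pose


if __name__ == "__main__":
    # Dummy predictor: report how many ancestor predictions were available.
    print(decode_pose(PARENTS, lambda j, context: len(context)))
```

With the dummy predictor, each joint's output equals its depth in the tree, which shows that deeper joints are conditioned on longer chains of ancestor predictions.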

Sequential End-to-end Network for Efficient Person Search

Mar 18, 2021

Zhengjia Li, Duoqian Miao

Abstract: Person search aims at jointly solving Person Detection and Person Re-identification (re-ID). Existing works have designed end-to-end networks based on Faster R-CNN. However, due to the parallel structure of Faster R-CNN, the extracted features come from the low-quality proposals generated by the Region Proposal Network, rather than the detected high-quality bounding boxes. Person search is a fine-grained task and such inferior features will significantly reduce re-ID performance. To address this issue, we propose a Sequential End-to-end Network (SeqNet) to extract superior features. In SeqNet, detection and re-ID are considered as a progressive process and tackled with two sub-networks sequentially. In addition, we design a robust Context Bipartite Graph Matching (CBGM) algorithm to effectively employ context information as an important complementary cue for person matching. Extensive experiments on two widely used person search benchmarks, CUHK-SYSU and PRW, have shown that our method achieves state-of-the-art results. Also, our model runs at 11.5 fps on a single GPU and can be integrated into the existing end-to-end framework easily.
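To see why context can help person matching, consider a minimal sketch of maximum-weight bipartite matching between the people in a query scene and the people in a gallery scene. This is an illustration only, not the CBGM algorithm from the paper: the similarity values are made up, the function names are hypothetical, and the brute-force search over permutations stands in for a proper assignment solver.

```python
from itertools import permutations


def greedy_match(sim, row):
    """Pick the most similar gallery person for one query person, ignoring context."""
    return max(range(len(sim[row])), key=lambda c: sim[row][c])


def best_matching(sim):
    """Brute-force maximum-weight matching of rows to distinct columns.

    Assumes len(sim) <= len(sim[0]); only practical for a handful of people.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    best_total, best_assign = float("-inf"), None
    for cols in permutations(range(n_cols), n_rows):
        total = sum(sim[r][c] for r, c in enumerate(cols))
        if total > best_total:
            best_total, best_assign = total, cols
    return list(best_assign), best_total


# Made-up similarities. Rows: query scene (row 0 = target, row 1 = companion).
# Columns: people detected in one gallery scene.
SIM = [
    [0.80, 0.75],  # target vs. gallery persons 0 and 1
    [0.90, 0.30],  # companion vs. gallery persons 0 and 1
]

if __name__ == "__main__":
    print("greedy target match:", greedy_match(SIM, 0))
    assignment, total = best_matching(SIM)
    print("joint assignment:", assignment, "total similarity:", round(total, 2))
```

In this toy case, matching the target alone picks gallery person 0, but the companion fits person 0 far better, so the joint assignment moves the target to person 1. The companion acts as the complementary context cue.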
