Abstract:Occlusion is an omnipresent challenge in 3D human pose estimation (HPE). In spite of the large amount of research dedicated to 3D HPE, only a limited number of studies address the problem of occlusion explicitly. To fill this gap, we propose to combine exploitation of spatio-temporal features with synthetic occlusion augmentation during training to deal with occlusion. To this end, we build a spatio-temporal 3D HPE model, StridedPoseGraphFormer based on graph convolution and transformers, and train it using occlusion augmentation. Unlike the existing occlusion-aware methods, that are only tested for limited occlusion, we extensively evaluate our method for varying degrees of occlusion. We show that our proposed method compares favorably with the state-of-the-art (SoA). Our experimental results also reveal that in the absence of any occlusion handling mechanism, the performance of SoA 3D HPE methods degrades significantly when they encounter occlusion.