Abstract:In multi-person pose estimation actors can be heavily occluded, even become fully invisible behind another person. While temporal methods can still predict a reasonable estimation for a temporarily disappeared pose using past and future frames, they exhibit large errors nevertheless. We present an energy minimization approach to generate smooth, valid trajectories in time, bridging gaps in visibility. We show that it is better than other interpolation based approaches and achieves state of the art results. In addition, we present the synthetic MuCo-Temp dataset, a temporal extension of the MuCo-3DHP dataset. Our code is made publicly available.
Abstract:In 3D human pose estimation one of the biggest problems is the lack of large, diverse datasets. This is especially true for multi-person 3D pose estimation, where, to our knowledge, there are only machine generated annotations available for training. To mitigate this issue, we introduce a network that can be trained with additional RGB-D images in a weakly supervised fashion. Due to the existence of cheap sensors, videos with depth maps are widely available, and our method can exploit a large, unannotated dataset. Our algorithm is a monocular, multi-person, absolute pose estimator. We evaluate the algorithm on several benchmarks, showing a consistent improvement in error rates. Also, our model achieves state-of-the-art results on the MuPoTS-3D dataset by a considerable margin.