Abstract: 3D human pose estimation is a key component of clinical monitoring systems. The clinical applicability of deep pose estimation models, however, is limited by their poor generalization under domain shifts along with their need for sufficient labeled training data. As a remedy, we present a novel domain adaptation method, adapting a model from a labeled source to a shifted unlabeled target domain. Our method comprises two complementary adaptation strategies based on prior knowledge about human anatomy. First, we guide the learning process in the target domain by constraining predictions to the space of anatomically plausible poses. To this end, we embed the prior knowledge into an anatomical loss function that penalizes asymmetric limb lengths, implausible bone lengths, and implausible joint angles. Second, we propose to filter pseudo labels for self-training according to their anatomical plausibility and incorporate the concept into the Mean Teacher paradigm. We unify both strategies in a point cloud-based framework applicable to unsupervised and source-free domain adaptation. Evaluation is performed for in-bed pose estimation under two adaptation scenarios, using the public SLP dataset and a newly created dataset. Our method consistently outperforms various state-of-the-art domain adaptation methods, surpasses the baseline model by 31%/66%, and reduces the domain gap by 65%/82%. Source code is available at https://github.com/multimodallearning/da-3dhpe-anatomy.
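The anatomical loss and the plausibility-based pseudo-label filter described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the six-joint leg skeleton, the names `BONES`, `SYMMETRIC_PAIRS`, `BONE_RANGE`, `anatomical_loss` and `filter_pseudo_labels`, the length ranges in metres, and the threshold are all hypothetical choices made for illustration. The joint-angle term and the Mean Teacher update are omitted.

```python
import numpy as np

# Hypothetical toy skeleton (assumption, not taken from the paper):
# 0 left hip, 1 left knee, 2 left ankle, 3 right hip, 4 right knee, 5 right ankle
BONES = [(0, 1), (1, 2), (3, 4), (4, 5)]   # (parent joint, child joint)
SYMMETRIC_PAIRS = [(0, 2), (1, 3)]         # (left bone, right bone) as indices into BONES
# Assumed plausible [min, max] length per bone in metres (illustrative values only)
BONE_RANGE = np.array([[0.35, 0.55], [0.33, 0.50], [0.35, 0.55], [0.33, 0.50]])


def bone_lengths(pose):
    """Euclidean length of every bone for a pose of shape (num_joints, 3)."""
    pose = np.asarray(pose, dtype=float)
    parents = pose[[p for p, _ in BONES]]
    children = pose[[c for _, c in BONES]]
    return np.linalg.norm(children - parents, axis=-1)


def anatomical_loss(pose):
    """Symmetry penalty plus bone-length range penalty (joint angles omitted)."""
    lengths = bone_lengths(pose)
    left = lengths[[l for l, _ in SYMMETRIC_PAIRS]]
    right = lengths[[r for _, r in SYMMETRIC_PAIRS]]
    symmetry = np.abs(left - right).mean()
    # Hinge penalties: zero inside the plausible range, linear outside it
    too_short = np.clip(BONE_RANGE[:, 0] - lengths, 0.0, None)
    too_long = np.clip(lengths - BONE_RANGE[:, 1], 0.0, None)
    return symmetry + (too_short + too_long).mean()


def filter_pseudo_labels(poses, threshold=0.02):
    """Boolean mask keeping only pseudo labels whose anatomical loss is small."""
    losses = np.array([anatomical_loss(p) for p in poses])
    return losses <= threshold
```

In a self-training loop, a mask of this kind would select which teacher predictions are used as targets for the student. In a real training setup the loss would have to be written in a differentiable framework rather than plain NumPy.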
Abstract: Graph convolutional networks are a promising new learning approach for data on irregular domains. They are well suited to overcoming certain limitations of conventional grid-based architectures and enable efficient handling of point clouds or related graph-based data representations, e.g. superpixel graphs. Learning feature extractors and classifiers on 3D point clouds is still an underdeveloped area and may be restricted to graphs of equal topology. In this work, we derive a new architectural design that combines rotationally and topologically invariant graph diffusion operators with node-wise feature learning through 1x1 convolutions. By combining multiple isotropic diffusion operations based on the Laplace-Beltrami operator, we can learn an optimal linear combination of diffusion kernels for effective feature propagation across nodes on an irregular graph. We validate our approach for learning point descriptors as well as semantic classification on real 3D point clouds of human poses and demonstrate an improvement from 85% to 95% in Dice overlap with our multi-kernel approach.
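The multi-kernel idea can be sketched in NumPy as a forward pass only. This is an illustration under stated assumptions, not the authors' architecture: a k-nearest-neighbour graph Laplacian stands in for a Laplace-Beltrami discretisation, heat kernels exp(-tL) at several diffusion times stand in for the isotropic diffusion operators, and the function names `graph_laplacian`, `diffusion_kernels` and `multi_kernel_layer` are hypothetical. In a trained model, `alphas` and `weight` would be learned parameters.

```python
import numpy as np


def graph_laplacian(points, k=4):
    """Unnormalised Laplacian L = D - W of a symmetrised kNN graph.

    Assumes distinct points, so each point is its own nearest neighbour.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d2, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)                         # make the graph undirected
    return np.diag(W.sum(1)) - W


def diffusion_kernels(L, times):
    """Heat kernels exp(-t L), one per diffusion time, via eigendecomposition."""
    evals, evecs = np.linalg.eigh(L)
    return [(evecs * np.exp(-t * evals)) @ evecs.T for t in times]


def multi_kernel_layer(X, kernels, alphas, weight):
    """Linear combination of diffusion kernels, then a node-wise linear map and ReLU.

    The node-wise map X @ weight applies the same weights to every node,
    which is what a 1x1 convolution does on a grid.
    """
    K = sum(a * Kt for a, Kt in zip(alphas, kernels))
    return np.maximum(K @ X @ weight, 0.0)
```

Because the Laplacian here depends only on pairwise distances, the diffusion kernels are unchanged when the point cloud is rotated. Each heat kernel has rows summing to one, so diffusion averages features over graph neighbourhoods of increasing size as t grows.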