Abstract: Human pose estimation facilitates markerless movement analysis in sports as well as in clinical applications. Still, state-of-the-art models for human pose estimation generally do not meet the requirements for real-life deployment. The main reason is that, as the field progresses, approaches become increasingly expensive, with high computational demands. To cope with the challenges caused by this trend, we propose a convolutional neural network architecture that builds on the recently proposed EfficientNets to deliver scalable single-person pose estimation. To this end, we introduce EfficientPose, a family of models that combines an effective multi-scale feature extractor, computationally efficient detection blocks utilizing mobile inverted bottleneck convolutions, and upscaling that improves the precision of pose configurations. EfficientPose enables real-world deployment on edge devices through a 500K-parameter model consuming less than one GFLOP. The results from our experiments, using the challenging MPII single-person benchmark, show that the proposed EfficientPose models substantially outperform the widely used OpenPose model in terms of accuracy, while at the same time being up to 15 times smaller and 20 times more computationally efficient than OpenPose.