Abstract:Pose estimation has promised to impact healthcare by enabling more practical methods to quantify nuances of human movement and biomechanics. However, despite the inherent connection between pose estimation and biomechanics, these disciplines have largely remained disparate. For example, most current pose estimation benchmarks use metrics such as Mean Per Joint Position Error, Percentage of Correct Keypoints, or mean Average Precision to assess performance, without quantifying kinematic and physiological correctness - key aspects for biomechanics. To alleviate this challenge, we develop OpenCapBench to offer an easy-to-use unified benchmark to assess common tasks in human pose estimation, evaluated under physiological constraints. OpenCapBench computes consistent kinematic metrics through joints angles provided by an open-source musculoskeletal modeling software (OpenSim). Through OpenCapBench, we demonstrate that current pose estimation models use keypoints that are too sparse for accurate biomechanics analysis. To mitigate this challenge, we introduce SynthPose, a new approach that enables finetuning of pre-trained 2D human pose models to predict an arbitrarily denser set of keypoints for accurate kinematic analysis through the use of synthetic data. Incorporating such finetuning on synthetic data of prior models leads to twofold reduced joint angle errors. Moreover, OpenCapBench allows users to benchmark their own developed models on our clinically relevant cohort. Overall, OpenCapBench bridges the computer vision and biomechanics communities, aiming to drive simultaneous advances in both areas.
Abstract:Motion capture from a limited number of inertial measurement units (IMUs) has important applications in health, human performance, and virtual reality. Real-world limitations and application-specific goals dictate different IMU configurations (i.e., number of IMUs and chosen attachment body segments), trading off accuracy and practicality. Although recent works were successful in accurately reconstructing whole-body motion from six IMUs, these systems only work with a specific IMU configuration. Here we propose a single diffusion generative model, Diffusion Inertial Poser (DiffIP), which reconstructs human motion in real-time from arbitrary IMU configurations. We show that DiffIP has the benefit of flexibility with respect to the IMU configuration while being as accurate as the state-of-the-art for the commonly used six IMU configuration. Our system enables selecting an optimal configuration for different applications without retraining the model. For example, when only four IMUs are available, DiffIP found that the configuration that minimizes errors in joint kinematics instruments the thighs and forearms. However, global translation reconstruction is better when instrumenting the feet instead of the thighs. Although our approach is agnostic to the underlying model, we built DiffIP based on physiologically realistic musculoskeletal models to enable use in biomedical research and health applications.