Abstract: Leveraging wearable devices for motion reconstruction has emerged as an economical and viable technique. Some methods place sparse Inertial Measurement Units (IMUs) on the human body and use data-driven strategies to model human poses. However, reconstructing motion from sparse IMU data alone is inherently ambiguous, because many different poses can produce identical IMU readings. In this paper, we explore the spatial importance of multiple sensors, supervised by text that describes specific actions. Specifically, we introduce uncertainty to derive weighted features for each IMU. We also design a Hierarchical Temporal Transformer (HTT) and apply contrastive learning to achieve precise temporal and feature alignment of sensor data with textual semantics. Experimental results demonstrate that our approach achieves significant improvements on multiple metrics over existing methods. Notably, with textual supervision, our method not only distinguishes ambiguous actions such as sitting and standing but also produces more precise and natural motion.
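As a rough illustration of the two ideas this abstract names, uncertainty-based per-IMU weighting and contrastive alignment between motion and text, the following PyTorch sketch is offered. It is not the paper's implementation: the module name `UncertaintyWeighting`, the log-variance head, the softmax weighting scheme, and the symmetric InfoNCE loss are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UncertaintyWeighting(nn.Module):
    """Hypothetical sketch: predict a per-IMU uncertainty and use it
    to re-weight each sensor's features before fusion."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.log_var_head = nn.Linear(feat_dim, 1)  # per-IMU log-variance

    def forward(self, imu_feats: torch.Tensor) -> torch.Tensor:
        # imu_feats: (batch, num_imus, feat_dim)
        log_var = self.log_var_head(imu_feats)       # (batch, num_imus, 1)
        weights = torch.softmax(-log_var, dim=1)     # low uncertainty -> high weight
        return (weights * imu_feats).sum(dim=1)      # fused: (batch, feat_dim)

def contrastive_align_loss(motion_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss pulling each motion embedding toward
    its paired text embedding (assumed form of the contrastive term)."""
    motion_emb = F.normalize(motion_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = motion_emb @ text_emb.t() / temperature  # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: 6 IMUs, 64-d features, batch of 8 paired motion/text samples.
fuser = UncertaintyWeighting(64)
fused = fuser(torch.randn(8, 6, 64))
loss = contrastive_align_loss(fused, torch.randn(8, 64))
```

The softmax over negated log-variances is one plausible way to realize "weighted features for each IMU": sensors the model is confident about dominate the fused representation, while uncertain ones are downweighted.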
Abstract: Object tracking has been studied for decades, but most existing work focuses on short-term tracking. In long sequences, the target is often fully occluded or out of view for long stretches; existing short-term trackers then lose the target and find it difficult to re-acquire even when it reappears. In this paper, a novel long-term object tracking algorithm, flow_MDNet_RPN, is proposed, which augments a short-term tracker with a tracking-result judgement module and a detection module. Experiments show that the proposed long-term tracking algorithm effectively handles target disappearance.
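The control flow implied by this abstract can be sketched as below. This is a hedged illustration only: the `short_term_tracker`, `judge`, and `detector` interfaces and the `CONF_THRESHOLD` cut-off are hypothetical stand-ins for the paper's short-term tracker, tracking-result judgement module, and detection module.

```python
CONF_THRESHOLD = 0.5  # assumed cut-off separating "tracked" from "lost"

def long_term_track(frames, short_term_tracker, judge, detector, init_box):
    """Illustrative long-term tracking loop: track short-term, judge each
    result, and fall back to global detection once the target is lost."""
    box, lost = init_box, False
    results = []
    for frame in frames:
        if not lost:
            box = short_term_tracker.track(frame, box)    # short-term update
            if judge.score(frame, box) < CONF_THRESHOLD:  # result judgement
                lost = True                               # target disappeared
        if lost:
            candidate = detector.detect(frame)            # global re-detection
            if candidate is not None and \
               judge.score(frame, candidate) >= CONF_THRESHOLD:
                box, lost = candidate, False              # target re-acquired
        results.append(None if lost else box)
    return results
```

The key design point the abstract describes is the hand-off: the judgement module decides when the short-term result is no longer trustworthy, and the detection module searches the whole frame so the target can be re-caught after full occlusion or leaving the view.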