
Title: EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs

Aug 30, 2024

Zhen Fan, Peng Dai, Zhuo Su, Xu Gao, Zheng Lv, Jiarui Zhang, Tianyuan Du, Guidong Wang, Yang Zhang

Figures 1-4 for EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs


Abstract: Egocentric human pose estimation (HPE) using wearable sensors is essential for VR/AR applications. Most existing methods rely solely on either egocentric-view images or sparse Inertial Measurement Unit (IMU) signals, which makes them prone to inaccuracies caused by self-occlusion in the images or by the sparseness and drift of inertial sensors. Most importantly, the lack of real-world datasets containing both modalities is a major obstacle to progress in this field. To overcome this barrier, we propose EMHI, a multimodal Egocentric human Motion dataset with Head-Mounted Display (HMD) and body-worn IMUs, with all data collected using a real VR product suite. Specifically, EMHI provides synchronized stereo images from downward-facing cameras on the headset and IMU data from body-worn sensors, along with pose annotations in SMPL format. The dataset consists of 885 sequences captured from 58 subjects performing 39 actions, totaling about 28.5 hours of recording. We evaluate the annotations by comparing them with optical marker-based SMPL fitting results. To substantiate the reliability of our dataset, we introduce MEPoser, a new baseline method for multimodal egocentric HPE, which employs a multimodal fusion encoder, a temporal feature encoder, and MLP-based regression heads. Experiments on EMHI show that MEPoser outperforms existing single-modal methods, demonstrating the value of our dataset for egocentric HPE. We believe the release of EMHI and MEPoser can advance research on egocentric HPE and accelerate the practical deployment of this technology in VR/AR products.
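The abstract names the three stages of MEPoser (multimodal fusion encoder, temporal feature encoder, MLP-based regression heads) but not their internals. The NumPy sketch below illustrates only that data flow, under our own assumptions, and is not the authors' implementation: weights are random and untrained; each stereo view is represented by a precomputed feature vector per frame; `N_IMUS = 4` is a hypothetical IMU count, each IMU contributing a flattened 3x3 orientation plus a 3D acceleration; a causal 1D convolution stands in for the unspecified temporal encoder; and the head outputs a 6D rotation for each of the 24 SMPL joints. All function names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not taken from the paper)
IMG_FEAT = 64                 # per-view image feature size
N_IMUS = 4                    # hypothetical number of body-worn IMUs
IMU_FEAT = N_IMUS * (9 + 3)   # flattened 3x3 orientation + 3D acceleration per IMU
HIDDEN = 128
N_JOINTS = 24                 # SMPL pose: 23 body joints + global root
POSE_OUT = N_JOINTS * 6       # assumed 6D rotation parametrization per joint
KERNEL = 5                    # temporal receptive field in frames


def linear(in_dim, out_dim):
    """Randomly initialised (untrained) dense layer weights and bias."""
    w = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
    return w, np.zeros(out_dim)


def relu(x):
    return np.maximum(x, 0.0)


W_fuse, b_fuse = linear(2 * IMG_FEAT + IMU_FEAT, HIDDEN)
W_temp = rng.standard_normal((KERNEL, HIDDEN, HIDDEN)) / np.sqrt(KERNEL * HIDDEN)
W_h1, b_h1 = linear(HIDDEN, HIDDEN)
W_h2, b_h2 = linear(HIDDEN, POSE_OUT)


def fusion_encoder(left, right, imu):
    """Concatenate per-frame stereo and IMU features, then project to HIDDEN."""
    x = np.concatenate([left, right, imu], axis=-1)  # (T, 2*IMG_FEAT + IMU_FEAT)
    return relu(x @ W_fuse + b_fuse)                 # (T, HIDDEN)


def temporal_encoder(h):
    """Causal 1D convolution: output at frame t sees frames t-KERNEL+1 .. t only."""
    T = h.shape[0]
    padded = np.concatenate([np.zeros((KERNEL - 1, HIDDEN)), h], axis=0)
    out = np.zeros_like(h)
    for t in range(T):
        window = padded[t:t + KERNEL]                # (KERNEL, HIDDEN), oldest first
        out[t] = np.einsum("kh,khd->d", window, W_temp)
    return relu(out)


def regression_head(h):
    """Two-layer MLP mapping temporal features to per-joint 6D rotations."""
    y = relu(h @ W_h1 + b_h1) @ W_h2 + b_h2
    return y.reshape(h.shape[0], N_JOINTS, 6)


def forward(left, right, imu):
    return regression_head(temporal_encoder(fusion_encoder(left, right, imu)))


# Usage on a random 30-frame clip
T = 30
left = rng.standard_normal((T, IMG_FEAT))
right = rng.standard_normal((T, IMG_FEAT))
imu = rng.standard_normal((T, IMU_FEAT))
poses = forward(left, right, imu)
print(poses.shape)  # (30, 24, 6)
```

The causal temporal stage is a design choice for this sketch: it keeps per-frame predictions dependent only on past and current frames, which suits online VR use, but the paper may use a different temporal model.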
