Abstract: Deep learning methods have been widely used for Human Activity Recognition (HAR) using recorded signals from Inertial Measurement Unit (IMU) sensors installed on various parts of the human body. For this type of HAR, several challenges exist, the most significant of which is the analysis of multivariate IMU sensor data. Here, we introduce a Hierarchically Unsupervised Fusion (HUF) model designed to extract and fuse features from IMU sensor data via a hybrid structure of Convolutional Neural Networks (CNNs) and Autoencoders (AEs). First, we design a stacked CNN-AE to embed short-time signals into sets of high-dimensional features. Second, we develop another CNN-AE network to locally fuse the extracted features from each sensor unit. Finally, we unify all the sensor features through a third CNN-AE architecture for global feature fusion, creating a unique feature set. Additionally, we analyze the effects of varying the model hyperparameters. The best results are achieved with eight convolutional layers in each AE. Furthermore, an overcomplete AE with 256 kernels in the code layer is found suitable for feature extraction in the first block of the proposed HUF model; this number is reduced to 64 in the last block to tailor the size of the feature set supplied to the classifier. The tuned model is applied to the UCI-HAR, DaLiAc, and Parkinson's disease gait datasets, achieving classification accuracies of 97%, 97%, and 88%, respectively, which are nearly 3% better than those of the state-of-the-art supervised methods.
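The following is a minimal sketch of the three-block CNN-AE hierarchy outlined in the abstract, written in PyTorch under assumed tensor shapes. Only the 256- and 64-kernel code layers follow the abstract; the number of sensor units, channels per unit, window length, intermediate widths, and all class and function names (conv_ae, HUFSketch) are hypothetical, and each encoder/decoder here uses two convolutional layers rather than the eight reported in the paper, for brevity.

```python
# Hedged sketch of a hierarchically fused CNN-AE feature extractor.
# Shapes, names, and layer counts are illustrative assumptions, not the
# authors' released implementation.
import torch
import torch.nn as nn


def conv_ae(in_ch: int, code_ch: int) -> nn.ModuleDict:
    """A small 1-D convolutional autoencoder; the encoder output is the
    feature map passed to the next block of the hierarchy."""
    encoder = nn.Sequential(
        nn.Conv1d(in_ch, code_ch // 2, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv1d(code_ch // 2, code_ch, kernel_size=3, padding=1), nn.ReLU(),
    )
    decoder = nn.Sequential(
        nn.Conv1d(code_ch, code_ch // 2, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv1d(code_ch // 2, in_ch, kernel_size=3, padding=1),
    )
    return nn.ModuleDict({"enc": encoder, "dec": decoder})


class HUFSketch(nn.Module):
    """Hierarchical unsupervised fusion: per-unit feature extraction,
    local fusion within each sensor unit, then global fusion across units."""

    def __init__(self, n_units: int = 5, ch_per_unit: int = 6):
        super().__init__()
        # Block 1: overcomplete AE, 256 kernels in the code layer (per abstract).
        self.block1 = conv_ae(ch_per_unit, 256)
        # Block 2: local fusion of the features of one sensor unit (width assumed).
        self.block2 = conv_ae(256, 128)
        # Block 3: global fusion across units, 64 kernels in the code layer (per abstract).
        self.block3 = conv_ae(128 * n_units, 64)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_units, ch_per_unit, window_len)
        unit_codes = []
        for u in range(x.shape[1]):
            f = self.block1["enc"](x[:, u])           # (batch, 256, T)
            unit_codes.append(self.block2["enc"](f))  # (batch, 128, T)
        fused = torch.cat(unit_codes, dim=1)          # (batch, 128 * n_units, T)
        return self.block3["enc"](fused)              # (batch, 64, T) -> classifier


if __name__ == "__main__":
    model = HUFSketch()
    windows = torch.randn(8, 5, 6, 128)  # 8 windows, 5 IMUs, 6 channels, 128 samples
    print(model.encode(windows).shape)   # torch.Size([8, 64, 128])
```

In the unsupervised setting described by the abstract, each block's decoder would be trained with a reconstruction loss before its encoder output is fed to the next block; the decoders are included above only to make that structure explicit.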