Abstract:Cross-dataset Human Activity Recognition (HAR) suffers from limited model generalization, hindering its practical deployment. To address this critical challenge, inspired by the success of DoReMi in Large Language Models (LLMs), we introduce a data mixture optimization strategy for pre-training HAR models, aiming to improve the recognition performance across heterogeneous datasets. However, directly applying DoReMi to the HAR field encounters new challenges due to the continuous, multi-channel and intrinsic heterogeneous characteristics of IMU sensor data. To overcome these limitations, we propose a novel framework HAR-DoReMi, which introduces a masked reconstruction task based on Mean Squared Error (MSE) loss. By raplacing the discrete language sequence prediction task, which relies on the Negative Log-Likelihood (NLL) loss, in the original DoReMi framework, the proposed framework is inherently more appropriate for handling the continuous and multi-channel characteristics of IMU data. In addition, HAR-DoReMi integrates the Mahony fusion algorithm into the self-supervised HAR pre-training, aiming to mitigate the heterogeneity of varying sensor orientation. This is achieved by estimating the sensor orientation within each dataset and facilitating alignment with a unified coordinate system, thereby improving the cross-dataset generalization ability of the HAR model. Experimental evaluation on multiple cross-dataset HAR transfer tasks demonstrates that HAR-DoReMi improves the accuracy by an average of 6.51%, compared to the current state-of-the-art method with only approximately 30% to 50% of the data usage. These results confirm the effectiveness of HAR-DoReMi in improving the generalization and data efficiency of pre-training HAR models, underscoring its significant potential to facilitate the practical deployment of HAR technology.
Abstract:While recent years have witnessed the advancement in big data and Artificial Intelligence (AI), it is of much importance to safeguard data privacy and security. As an innovative approach, Federated Learning (FL) addresses these concerns by facilitating collaborative model training across distributed data sources without transferring raw data. However, the challenges of robust security and privacy across decentralized networks catch significant attention in dealing with the distributed data in FL. In this paper, we conduct an extensive survey of the security and privacy issues prevalent in FL, underscoring the vulnerability of communication links and the potential for cyber threats. We delve into various defensive strategies to mitigate these risks, explore the applications of FL across different sectors, and propose research directions. We identify the intricate security challenges that arise within the FL frameworks, aiming to contribute to the development of secure and efficient FL systems.
Abstract:Human Activity Recognition (HAR) plays a crucial role in various applications such as human-computer interaction and healthcare monitoring. However, challenges persist in HAR models due to the data distribution differences between training and real-world data distributions, particularly evident in cross-user scenarios. This paper introduces a novel framework, termed Diffusion-based Noise-centered Adversarial Learning Domain Adaptation (Diff-Noise-Adv-DA), designed to address these challenges by leveraging generative diffusion modeling and adversarial learning techniques. Traditional HAR models often struggle with the diversity of user behaviors and sensor data distributions. Diff-Noise-Adv-DA innovatively integrates the inherent noise within diffusion models, harnessing its latent information to enhance domain adaptation. Specifically, the framework transforms noise into a critical carrier of activity and domain class information, facilitating robust classification across different user domains. Experimental evaluations demonstrate the effectiveness of Diff-Noise-Adv-DA in improving HAR model performance across different users, surpassing traditional domain adaptation methods. The framework not only mitigates distribution mismatches but also enhances data quality through noise-based denoising techniques.