Abstract: In the realm of ubiquitous computing, Human Activity Recognition (HAR) is vital for the automated and intelligent identification of human actions from data collected by diverse sensors. However, traditional machine learning approaches, which aggregate data on a central server for centralized processing, are memory-intensive and raise privacy concerns. Federated Learning (FL) has emerged as a solution: it trains a global model collaboratively across multiple devices, which exchange their local model parameters instead of their local data. In realistic settings, however, sensor data on devices is not independently and identically distributed (Non-IID). That is, the activity data recorded by most devices is sparse, and the sensor data distribution may be inconsistent across clients. As a result, typical FL frameworks in heterogeneous environments suffer from slow convergence and poor performance because the local models' objectives deviate from the global objective. Most FL methods applied to HAR are either designed for overly idealized scenarios that ignore the Non-IID problem or raise privacy and scalability concerns. This work addresses these challenges by proposing CDFL, an efficient federated learning framework for image-based HAR. CDFL selects a representative set of privacy-preserved images using contrastive learning and deep clustering, reduces communication overhead by selecting effective clients for global model updates, and improves global model quality by training on privacy-preserved data. Comprehensive experiments on three public datasets, namely Stanford40, PPMI, and VOC2012, demonstrate the superiority of CDFL over state-of-the-art approaches in terms of performance, convergence rate, and bandwidth usage.
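The parameter-exchange idea behind FL can be illustrated with a minimal sketch of FedAvg-style weighted averaging on the server. This is a generic illustration, not CDFL's aggregation rule, which the abstract does not specify. The function name `fedavg`, the dict-of-lists parameter layout, and weighting each client by its local sample count are assumptions chosen for illustration.

```python
from typing import Dict, List

# Hypothetical layout: each client's model is a dict mapping a parameter
# name to a flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its sample count.

    Only parameters and sample counts reach the server; raw local data
    never leaves the clients.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    global_params: Params = {}
    for name, first in client_params[0].items():
        acc = [0.0] * len(first)
        for params, n in zip(client_params, client_sizes):
            weight = n / total  # clients with more data get more influence
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        global_params[name] = acc
    return global_params


if __name__ == "__main__":
    clients = [
        {"w": [1.0, 2.0], "b": [0.0]},
        {"w": [3.0, 4.0], "b": [1.0]},
    ]
    print(fedavg(clients, [100, 300]))
```

Each round, every participating client sends only its updated parameters and sample count. Under Non-IID data, a client selection step (which CDFL performs) would decide which clients' parameters enter `client_params`; that selection criterion is not modeled here.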