Abstract:Recent advances in machine learning have highlighted Federated Learning (FL) as a promising approach that enables multiple distributed users (so-called clients) to collectively train ML models without sharing their private data. While this privacy-preserving method shows potential, it struggles when data across clients is not independent and identically distributed (non-IID) data. The latter remains an unsolved challenge that can result in poorer model performance and slower training times. Despite the significance of non-IID data in FL, there is a lack of consensus among researchers about its classification and quantification. This systematic review aims to fill that gap by providing a detailed taxonomy for non-IID data, partition protocols, and metrics to quantify data heterogeneity. Additionally, we describe popular solutions to address non-IID data and standardized frameworks employed in FL with heterogeneous data. Based on our state-of-the-art review, we present key lessons learned and suggest promising future research directions.
Abstract:Background: AI-based analysis of sufficiently large, curated medical datasets has been shown to be promising for providing early detection, faster diagnosis, better decision-making, and more effective treatment. However, accessing such highly confidential and very sensitive medical data, obtained from a variety of sources, is usually highly restricted since improper use, unsafe storage, data leakage or abuse could violate a person's privacy. In this work we apply a federated learning paradigm over a heterogeneous, siloed sets of high-definition electrocardiogram arriving from 12-leads ECG sensors arrays to train AI models. We evaluated the capacity of the resulting models to achieve equivalent performance when compared to state-of-the-art models trained when the same data is collected in a central place. Methods: We propose a privacy preserving methodology for training AI models based on the federated learning paradigm over a heterogeneous, distributed, dataset. The methodology is applied to a broad range of machine learning techniques based on gradient boosting, convolutional neural network and recurrent neural networks with long short-term memory. The models were trained over a ECG dataset containing 12-leads recordings collected from 43,059 patients from six geographically separate and heterogeneous sources. Findings: The resulting set of AI models for detecting cardiovascular abnormalities achieved comparable predictive performances against models trained using a centralised learning approach. Interpretation: The approach of compute parameters contributing to the global model locally and then exchange only such parameters instead of the whole sensitive data as in ML contributes to preserve medical data privacy.
Abstract:The Associazione Medici Diabetologi (AMD) collects and manages one of the largest worldwide-available collections of diabetic patient records, also known as the AMD database. This paper presents the initial results of an ongoing project whose focus is the application of Artificial Intelligence and Machine Learning techniques for conceptualizing, cleaning, and analyzing such an important and valuable dataset, with the goal of providing predictive insights to better support diabetologists in their diagnostic and therapeutic choices.
Abstract:Public educational systems operate thousands of buildings with vastly different characteristics in terms of size, age, location, construction, thermal behavior and user communities. Their strategic planning and sustainable operation is an extremely complex and requires quantitative evidence on the performance of buildings such as the interaction of indoor-outdoor environment. Internet of Things (IoT) deployments can provide the necessary data to evaluate, redesign and eventually improve the organizational and managerial measures. In this work a data mining approach is presented to analyze the sensor data collected over a period of 2 years from an IoT infrastructure deployed over 18 school buildings spread in Greece, Italy and Sweden. The real-world evaluation indicates that data mining on sensor data can provide critical insights to building managers and custodial staff about ways to lower a building's energy footprint through effectively managing building operations.