Abstract: As the number of automatic tools based on machine learning (ML) and resting-state electroencephalography (rs-EEG) for Parkinson's disease (PD) detection keeps growing, assessing whether they may exacerbate health disparities, by means of fairness and bias analysis, becomes more relevant. Protected attributes, such as gender, play an important role in PD diagnosis development. However, analysis of sub-group populations stemming from different genders is seldom taken into consideration in the development or performance assessment of ML models for PD detection. In this work, we perform a systematic analysis of the detection ability for gender sub-groups, in a multi-center setting, of a previously developed ML algorithm based on power spectral density (PSD) features of rs-EEG. We find significant differences in the PD detection ability for males and females at testing time (80.5% vs. 63.7% accuracy). We also find significantly higher activity in a set of parietal and frontal EEG channels and frequency sub-bands for PD and non-PD males, which might explain the differences in PD detection ability between the gender sub-groups.
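The sub-group comparison described above can be illustrated with a minimal sketch that computes accuracy separately for each gender group and reports the gap between them. The helper name `subgroup_accuracy` and the toy labels are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Return the classification accuracy for each sub-group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Toy labels (1 = PD, 0 = non-PD); illustrative values, not study data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
gender = ["M", "M", "M", "M", "F", "F", "F", "F"]

acc = subgroup_accuracy(y_true, y_pred, gender)
gap = acc["M"] - acc["F"]  # accuracy difference between the two sub-groups
print(acc, gap)
```

In practice such a gap would be accompanied by a statistical test before it is called significant; the sketch only shows the per-group bookkeeping.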
Abstract: Resting-state EEG (rs-EEG) has been demonstrated to aid in Parkinson's disease (PD) diagnosis. In particular, the power spectral density (PSD) of low-frequency bands ({\delta} and {\theta}) and high-frequency bands ({\alpha} and {\beta}) has been shown to be significantly different in patients with PD as compared to subjects without PD (non-PD). However, rs-EEG feature extraction and its interpretation can be time-intensive and prone to examiner variability. Machine learning (ML) has the potential to automate the analysis of rs-EEG recordings and provide a supportive tool for clinicians to ease their workload. In this work, we use rs-EEG recordings of 84 PD and 85 non-PD subjects pooled from four datasets obtained at different centers. We propose an end-to-end pipeline consisting of preprocessing, extraction of PSD features from clinically validated frequency bands, and feature selection, followed by an evaluation of how well the selected features allow ML algorithms to distinguish PD from non-PD subjects. Further, we evaluate the effect of feature harmonization, given the multi-center nature of the datasets. Our validation results show, on average, an improvement in PD detection ability (from 69.6% to 75.5% accuracy) by logistic regression when harmonizing the features and performing univariate feature selection (k = 202 features). Our final results show an average global accuracy of 72.2%, with balanced accuracies of 60.6%, 68.7%, 77.7%, and 82.2% for the four centers included in the study.
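As a rough illustration of the PSD feature-extraction step, the sketch below computes relative band power for a single channel from a Hann-windowed periodogram. The band limits in `BANDS`, the 250 Hz sampling rate, the function name `band_powers`, and the synthetic signal are all assumptions made for this example; the abstract does not specify the band edges or the PSD estimator used in the study.

```python
import numpy as np

# Band edges in Hz. These limits follow a common convention and are an
# assumption here, not values taken from the study.
BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def band_powers(signal, fs):
    """Relative power per band from a Hann-windowed periodogram of one channel."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC offset
    spectrum = np.abs(np.fft.rfft(x * np.hanning(x.size))) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_range = (freqs >= 1.0) & (freqs < 30.0)
    total = spectrum[in_range].sum()  # normalise by power in 1-30 Hz
    return {
        name: spectrum[(freqs >= lo) & (freqs < hi)].sum() / total
        for name, (lo, hi) in BANDS.items()
    }


# Synthetic 4 s channel: a 10 Hz oscillation plus weak noise (not real EEG).
fs = 250.0
t = np.arange(0.0, 4.0, 1.0 / fs)
rng = np.random.default_rng(0)
toy = np.sin(2 * np.pi * 10.0 * t) + 0.1 * rng.standard_normal(t.size)

features = band_powers(toy, fs)
print(features)
```

Features of this kind, computed per channel and band, would then feed the harmonization, univariate feature selection, and logistic regression stages named in the abstract.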