Abstract: The last decade has witnessed a notable surge in deep learning applications for the analysis of electroencephalography (EEG) data, thanks to its demonstrated superiority over conventional statistical techniques. However, even deep learning models can underperform if trained with poorly processed data. While preprocessing is essential to the analysis of EEG data, there is a lack of research examining its precise impact on model performance. This causes uncertainty about whether, and to what extent, EEG data should be preprocessed in a deep learning scenario. This study aims to investigate the role of EEG preprocessing in deep learning applications and to draft guidelines for future research. It evaluates the impact of different levels of preprocessing, from raw and minimally filtered data to complex pipelines with automated artifact removal algorithms. Six classification tasks (eye blinking, motor imagery, Parkinson's and Alzheimer's disease, sleep deprivation, and first-episode psychosis) and four architectures commonly used in the EEG domain were considered for the evaluation. The analysis of 4800 different training runs revealed statistical differences between the preprocessing pipelines at the intra-task level, for each of the investigated models, and at the inter-task level, for the largest one. Raw data generally leads to underperforming models, which always rank last in averaged score. In addition, models seem to benefit more from minimal pipelines without artifact handling methods, suggesting that EEG artifacts may contribute to the performance of deep neural networks.
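To make the idea of preprocessing levels concrete, the sketch below contrasts a "raw" level with a "minimal" one (band-pass filtering plus per-channel standardization, with no artifact removal). It is not the study's pipeline: the function names `raw_pipeline` and `minimal_pipeline`, the 1 to 45 Hz pass band, and the FFT-based brick-wall filter (chosen so that only NumPy is needed) are all illustrative assumptions.

```python
import numpy as np


def raw_pipeline(eeg: np.ndarray, fs: float) -> np.ndarray:
    """Hypothetical 'raw' level: return the (channels, samples) array unchanged.

    A copy is returned so later in-place operations cannot alter the input.
    """
    return eeg.copy()


def minimal_pipeline(
    eeg: np.ndarray, fs: float, low: float = 1.0, high: float = 45.0
) -> np.ndarray:
    """Hypothetical 'minimal' level: band-pass filter, then z-score each channel.

    Assumption: a brick-wall filter in the frequency domain stands in for the
    IIR/FIR filters a real pipeline would typically use. No artifact handling.
    """
    n = eeg.shape[-1]
    # Real FFT along the time axis and the frequency of each bin.
    spectrum = np.fft.rfft(eeg, axis=-1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Zero every bin outside [low, high] Hz (this also removes the DC offset).
    keep = (freqs >= low) & (freqs <= high)
    spectrum[..., ~keep] = 0
    filtered = np.fft.irfft(spectrum, n=n, axis=-1)
    # Per-channel standardization; guard against flat (zero-variance) channels.
    mean = filtered.mean(axis=-1, keepdims=True)
    std = filtered.std(axis=-1, keepdims=True)
    return (filtered - mean) / np.where(std > 0, std, 1.0)
```

With a 10 Hz rhythm mixed with slow drift, a DC offset, and 60 Hz line noise, the minimal level keeps only the 10 Hz component. Replacing the FFT mask with `scipy.signal.butter` and `sosfiltfilt` would give a smoother frequency response; the artifact-removal stages of the more complex pipelines are deliberately left out.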
Abstract: SelfEEG is an open-source Python library developed to assist researchers in conducting Self-Supervised Learning (SSL) experiments on electroencephalography (EEG) data. Its primary objective is to offer a user-friendly but highly customizable environment, enabling users to efficiently design and execute self-supervised learning tasks on EEG data. SelfEEG covers all the stages of a typical SSL pipeline, ranging from data import to model design and training. It includes modules specifically designed to: split data at various granularity levels (e.g., session-, subject-, or dataset-based splits); effectively manage data stored with different configurations (e.g., file extensions, data types) during mini-batch construction; and provide a wide range of standard deep learning models, data augmentations, and SSL baseline methods applied to EEG data. Most of the functionalities offered by selfEEG can be executed on both GPUs and CPUs, expanding its usability beyond the self-supervised learning area. Additionally, these functionalities can be employed for the analysis of other biomedical signals often coupled with EEG, such as electromyography or electrocardiography data. These features make selfEEG a versatile deep learning tool for biomedical applications and a useful resource in SSL, currently one of the most active fields of Artificial Intelligence.
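As an illustration of what a session-, subject-, or dataset-based split means, the plain-Python sketch below groups records at the chosen granularity and then assigns whole groups to either the training or the validation set, so that no group leaks across partitions. It does not use or reproduce selfEEG's API; the record layout `(dataset, subject, session, file)` and the function `grouped_split` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (dataset, subject, session, file)
Record = Tuple[str, str, str, str]

# Number of leading fields that identify a group at each granularity level.
_DEPTH = {"dataset": 1, "subject": 2, "session": 3}


def grouped_split(
    records: List[Record],
    level: str = "subject",
    val_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Record], List[Record]]:
    """Assign whole groups (datasets, subjects, or sessions) to train or validation."""
    if level not in _DEPTH:
        raise ValueError(f"unknown split level: {level!r}")
    depth = _DEPTH[level]

    # Collect records by group key, e.g. ("ds1", "sub-3") for a subject split.
    groups: Dict[Tuple[str, ...], List[Record]] = defaultdict(list)
    for rec in records:
        groups[rec[:depth]].append(rec)

    # Sort for determinism, shuffle reproducibly, then reserve a fraction of
    # the groups (at least one) for validation.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_val = max(1, round(len(keys) * val_ratio))

    val = [rec for key in keys[:n_val] for rec in groups[key]]
    train = [rec for key in keys[n_val:] for rec in groups[key]]
    return train, val
```

Because the unit of assignment is the group rather than the individual file, a subject-level split keeps all recordings of a given person on one side of the partition, which avoids inflating validation scores through subject-specific signal characteristics.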