Abstract: Objective: To develop and validate an automated method for bedside monitoring of sleep state fluctuations in neonatal intensive care units. Methods: A deep learning-based algorithm was designed and trained using 53 EEG recordings from long-term (a)EEG monitoring of 30 near-term neonates. The results were validated using an external dataset of 30 polysomnography recordings. In addition to training and validating a single-channel EEG quiet sleep detector, we constructed the Sleep State Trend (SST), a bedside-ready means of visualizing classifier outputs. Results: The accuracy of quiet sleep detection in the training data was 90%, and the accuracy was comparable (85-86%) in all bipolar derivations available from the 4-electrode recordings. The algorithm generalized well to the external dataset, showing 81% overall accuracy despite different signal derivations. SST allowed an intuitive, clear visualization of the classifier output. Conclusions: Fluctuations in sleep states can be detected at high fidelity from a single EEG channel, and the results can be visualized as a transparent and intuitive trend on bedside monitors. Significance: The Sleep State Trend (SST) may provide caregivers with a real-time view of sleep state fluctuations and their cyclicity.
Abstract: Sharing medical data between institutions is difficult in practice due to data protection laws and official procedures within institutions. Therefore, most existing algorithms are trained on relatively small electroencephalogram (EEG) data sets, which is likely to be detrimental to prediction accuracy. In this work, we simulate a case in which the data cannot be shared by splitting a publicly available data set into disjoint sets representing the data held by individual institutions. We propose to train a (local) detector in each institution and aggregate their individual predictions into one final prediction. Four aggregation schemes are compared, namely, the majority vote, the mean, the weighted mean and the Dawid-Skene method. The approach allows different detector architectures among the institutions. The method was validated on an independent data set using only a subset of EEG channels. The ensemble reaches accuracy comparable to that of a single detector trained on all the data when a sufficient amount of data is available in each institution. The weighted mean aggregation scheme showed the best overall performance; it was only marginally outperformed by the Dawid-Skene method when the local detectors approach the performance of a single detector trained on all available data.
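The three simpler aggregation schemes named above can be illustrated with a minimal sketch. It assumes each local detector outputs a per-segment seizure probability, that the mean schemes average those probabilities before thresholding, and that the weights are supplied externally (for example, from each detector's validation performance). The abstract specifies none of these details, so the function names, the 0.5 threshold, the toy probabilities and the weights are all hypothetical. The Dawid-Skene method is omitted because it requires an iterative estimation procedure.

```python
import numpy as np

# Hypothetical setup: rows are local detectors (one per institution),
# columns are EEG segments, values are predicted seizure probabilities.

def majority_vote(probs, threshold=0.5):
    """Label a segment positive when more than half of the detectors vote positive."""
    votes = (np.asarray(probs) >= threshold).astype(int)
    return (2 * votes.sum(axis=0) > votes.shape[0]).astype(int)

def mean_aggregate(probs, threshold=0.5):
    """Average the detectors' probabilities, then threshold (assumed variant of 'the mean')."""
    return (np.asarray(probs).mean(axis=0) >= threshold).astype(int)

def weighted_mean_aggregate(probs, weights, threshold=0.5):
    """Average probabilities with per-detector weights normalized to sum to one."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return (w @ np.asarray(probs) >= threshold).astype(int)

# Toy data: three detectors, four segments (made-up numbers for illustration only).
probs = np.array([
    [0.9, 0.2, 0.6, 0.3],
    [0.8, 0.4, 0.4, 0.4],
    [0.3, 0.1, 0.7, 0.9],
])
weights = [0.5, 0.3, 0.2]  # assumed weights, e.g. derived from validation accuracy

mv = majority_vote(probs)
mean_pred = mean_aggregate(probs)
wm = weighted_mean_aggregate(probs, weights)
print(mv, mean_pred, wm)
```

In the last toy segment, a single confident detector pulls the unweighted mean above the threshold, while the majority vote and the weighted mean (which gives that detector a low weight) both remain negative. Because the schemes operate only on detector outputs, each institution is free to use a different architecture.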
Abstract: Neonatal seizure detection algorithms (SDA) are approaching the benchmark of human expert annotation. Measures of algorithm generalizability and non-inferiority as well as measures of clinical efficacy are needed to assess the full scope of neonatal SDA performance. We validated our neonatal SDA on an independent data set of 28 neonates. Generalizability was tested by comparing the SDA's performance on the original training set (cross-validation) with its performance on the validation set. Non-inferiority was tested by assessing inter-observer agreement between combinations of SDA and two human expert annotations. Clinical efficacy was tested by comparing how the SDA and human experts quantified seizure burden and identified clinically significant periods of seizure activity in the EEG. Algorithm performance was consistent between training and validation sets with no significant worsening in AUC (p>0.05, n=28). SDA output was inferior to the annotation of the human expert; however, re-training with an increased diversity of data resulted in non-inferior performance ($\Delta\kappa$=0.077, 95% CI: -0.002 to 0.232, n=18). The SDA assessment of seizure burden had an accuracy ranging from 89% to 93%, and an accuracy of 87% for identifying periods of clinical interest. The proposed SDA is approaching human equivalence and provides a clinically relevant interpretation of the EEG.
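The agreement comparison behind the non-inferiority test can be sketched as follows. This is a minimal illustration that assumes binary per-epoch annotations, Cohen's kappa as the agreement statistic, and $\Delta\kappa$ defined as human-human agreement minus the average SDA-human agreement. The abstract states neither the exact kappa variant nor the sign convention, and the annotation vectors below are made up.

```python
import numpy as np

def cohen_kappa(a, b):
    """Cohen's kappa for two binary annotation sequences of equal length."""
    a = np.asarray(a).astype(int)
    b = np.asarray(b).astype(int)
    p_observed = np.mean(a == b)                     # raw agreement
    p_a, p_b = a.mean(), b.mean()                    # each rater's seizure rate
    p_chance = p_a * p_b + (1 - p_a) * (1 - p_b)     # agreement expected by chance
    if p_chance == 1.0:                              # both raters constant and identical
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical per-epoch annotations (1 = seizure, 0 = non-seizure).
expert_1 = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
expert_2 = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
sda      = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

kappa_humans = cohen_kappa(expert_1, expert_2)
kappa_sda_1 = cohen_kappa(sda, expert_1)
kappa_sda_2 = cohen_kappa(sda, expert_2)

# Assumed definition: a positive value means the SDA agrees with the experts
# less well than the experts agree with each other.
delta_kappa = kappa_humans - (kappa_sda_1 + kappa_sda_2) / 2
print(kappa_humans, kappa_sda_1, kappa_sda_2, delta_kappa)
```

Under this reading, a confidence interval for the difference that includes zero, such as the one reported above, would be consistent with non-inferiority; how that interval is estimated (for example, by resampling over neonates) is not stated in the abstract.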