Abstract: Atrial fibrillation (AF) is a common atrial arrhythmia that impairs quality of life and causes embolic stroke, heart failure and other complications. Recent advances in machine learning (ML) and deep learning (DL) have shown potential for enhancing diagnostic accuracy. It is essential for DL models to be robust and generalizable across variations in ethnicity, age, sex, and other factors. Although a number of ECG databases have been made available to the research community, none includes a Japanese population sample. The Saitama Heart Database Atrial Fibrillation (SHDB-AF) is a novel open-source Holter ECG database from Japan, containing data from 100 unique patients with paroxysmal AF. Each record in SHDB-AF is 24 hours long and sampled at 200 Hz, totaling 24 million seconds of ECG data.
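A minimal sketch of how such a 24-hour, 200 Hz Holter record could be loaded, assuming the database is distributed in the WFDB format commonly used for open Holter databases; the record name below is hypothetical and used only for illustration.

```python
# Sketch: load one SHDB-AF-style Holter record (assumes WFDB format;
# "shdb_af_001" is a hypothetical record name).
import wfdb

record = wfdb.rdrecord("shdb_af_001")   # hypothetical record name
fs = record.fs                          # expected to be 200 Hz
ecg = record.p_signal[:, 0]             # first ECG channel

duration_h = len(ecg) / fs / 3600.0
print(f"Sampling rate: {fs} Hz, duration: {duration_h:.1f} h")
```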
Abstract: Background: Sleep staging is a fundamental component of the diagnosis of sleep disorders and the management of sleep health. Traditionally, this analysis is conducted in clinical settings and involves a time-consuming scoring procedure. Recent data-driven algorithms for sleep staging, using the photoplethysmogram (PPG) time series, have shown high performance on local test sets but lower performance on external datasets due to data drift. Methods: This study aimed to develop a generalizable deep learning model for four-class (wake, light, deep, and rapid eye movement (REM)) sleep staging from the raw PPG physiological time series. Six sleep datasets, totaling 2,574 patient recordings, were used. To create a more generalizable representation, we developed and evaluated a deep learning model called SleepPPG-Net2, which employs a multi-source domain training approach. SleepPPG-Net2 was benchmarked against two state-of-the-art models. Results: SleepPPG-Net2 showed consistently higher performance than the benchmark approaches, with generalization performance (Cohen's kappa) improving by up to 19%. Performance disparities were observed in relation to age, sex, and sleep apnea severity. Conclusion: SleepPPG-Net2 sets a new standard for staging sleep from the raw PPG time series.
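To make the multi-source domain training idea concrete, the sketch below pools PPG windows from several source datasets and balances their contribution during training. This is an illustrative recipe, not the authors' exact implementation; the dataset sizes, window length, and sampling strategy are assumptions.

```python
# Illustrative multi-source training setup: pool several PPG sources and
# use a weighted sampler so no single source dominates each epoch.
import torch
from torch.utils.data import TensorDataset, ConcatDataset, WeightedRandomSampler, DataLoader

# Stand-ins for three source datasets of 30-s PPG epochs (assumed 25 Hz).
sources = [
    TensorDataset(torch.randn(n, 1, 30 * 25), torch.randint(0, 4, (n,)))  # labels 0..3 = wake/light/deep/REM
    for n in (1000, 400, 150)                                             # arbitrary per-source sizes
]
pooled = ConcatDataset(sources)

# Inverse-size weights so each source contributes roughly equally.
weights = torch.cat([torch.full((len(s),), 1.0 / len(s)) for s in sources])
sampler = WeightedRandomSampler(weights, num_samples=len(pooled), replacement=True)
loader = DataLoader(pooled, batch_size=8, sampler=sampler)

x, y = next(iter(loader))   # x: PPG epochs, y: sleep-stage labels
print(x.shape, y.shape)
```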
Abstract: Introduction: Deep learning models for detecting episodes of atrial fibrillation (AF) using rhythm information in long-term, ambulatory ECG recordings have shown high performance. However, the rhythm-based approach does not take advantage of the morphological information conveyed by the different ECG waveforms, particularly the f-waves. As a result, the performance of such models may be inherently limited. Methods: To address this limitation, we developed a deep learning model, named RawECGNet, to detect episodes of AF and atrial flutter (AFl) using the raw, single-lead ECG. We compare the generalization performance of RawECGNet on two external data sets that account for distribution shifts in geography, ethnicity, and lead position. RawECGNet is further benchmarked against a state-of-the-art deep learning model, named ArNet2, which utilizes rhythm information as input. Results: In terms of the F1 score, RawECGNet achieved 0.91--0.94 across leads in RBDB and 0.93 in SHDB, compared to 0.89--0.91 in RBDB and 0.91 in SHDB for ArNet2. These results highlight RawECGNet as a high-performance, generalizable algorithm for the detection of AF and AFl episodes, exploiting information on both rhythm and morphology.
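The evaluation reported above compares window-level episode predictions to reference annotations using the F1 score. A minimal sketch of that scoring step follows, with toy labels standing in for the real annotations.

```python
# Sketch: window-level F1 between reference AF/AFl annotations and model
# predictions (toy labels for illustration only).
import numpy as np
from sklearn.metrics import f1_score

reference = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0])   # 1 = AF/AFl window
predicted = np.array([0, 1, 1, 1, 1, 0, 1, 0, 0, 0])
print(f"F1 = {f1_score(reference, predicted):.2f}")
```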
Abstract: Diabetic retinopathy (DR) is a prevalent complication of diabetes associated with a significant risk of vision loss. Timely identification is critical to curb vision impairment. Algorithms for DR staging from digital fundus images (DFIs) have recently been proposed. However, models often fail to generalize due to distribution shifts between the source domain on which the model was trained and the target domain where it is deployed. A common and particularly challenging shift is encountered when the source- and target-domain supports do not fully overlap. In this research, we introduce DRStageNet, a deep learning model designed to mitigate this challenge. We used seven publicly available datasets, comprising a total of 93,534 DFIs that cover a variety of patient demographics, ethnicities, geographic origins, and comorbidities. We fine-tune DINOv2, a pretrained self-supervised vision transformer, and implement a multi-source domain fine-tuning strategy to enhance generalization performance. We demonstrate the superiority of our method over two state-of-the-art benchmarks, including a recently published foundation model. We adapted the grad-rollout method to our regression task in order to provide high-resolution explainability heatmaps. The error analysis showed that 59% of the main errors had incorrect reference labels. DRStageNet is accessible at URL [upon acceptance of the manuscript].
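A sketch of the backbone-plus-head setup described above: a DINOv2 encoder loaded from the public torch hub release with a single-output head for the DR grade, treated as a regression target. The specific model size, head, and input resolution are assumptions for illustration, not the exact DRStageNet configuration.

```python
# Sketch: DINOv2 encoder (ViT-B/14 from the public hub) + regression head
# for DR staging. Illustrative configuration only.
import torch
import torch.nn as nn

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")  # downloads pretrained weights

class DRGrader(nn.Module):
    def __init__(self, encoder, embed_dim=768):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(embed_dim, 1)   # continuous DR-stage score

    def forward(self, x):                     # x: (B, 3, 224, 224) fundus crops
        return self.head(self.encoder(x)).squeeze(-1)

model = DRGrader(backbone)
scores = model(torch.randn(2, 3, 224, 224))   # dummy DFIs for a shape check
print(scores.shape)
```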
Abstract: Over the past few years, research has witnessed the advancement of deep learning models trained on large datasets, some encompassing millions of examples. While these models achieve impressive performance on their hidden test sets, they often underperform when assessed on external datasets. Recognizing the critical role of generalization in medical AI development, many prestigious journals now require reporting results both on the local hidden test set and on external datasets before considering a study for publication. Effectively, the field of medical AI has transitioned from the traditional use of a single dataset split into train and test sets to a more comprehensive framework using multiple datasets, some of which are used for model development (source domain) and others for testing (target domains). However, this new experimental setting does not necessarily resolve the challenge of generalization. This is because the variability in intended use and the specificities of individual hospital cultures make the idea of universally generalizable systems a myth. On the other hand, the systematic, and a fortiori recurrent, re-calibration of models at the individual hospital level, although ideal, may be overoptimistic given the legal, regulatory and technical challenges involved. Re-calibration using transfer learning may not even be possible in some instances where reference labels of the target domains are not available. In this perspective, we establish a hierarchical three-level scale reflecting the generalization level of a medical AI algorithm. This scale better reflects the diversity of real-world medical scenarios, in which target-domain data for re-calibration of models may or may not be available and, when available, may or may not come with reference labels.
Abstract: Background: Atrial fibrillation (AF) is one of the most common cardiac arrhythmias, affecting millions of people worldwide each year, and it is closely linked to an increased risk of cardiovascular diseases such as stroke and heart failure. Machine learning methods have shown promising results in evaluating the risk of developing atrial fibrillation from the electrocardiogram. We aim to develop and evaluate one such algorithm on the large CODE dataset collected in Brazil. Results: The deep neural network model identified patients without indication of AF in the presented ECG but who will develop AF in the future with an AUC score of 0.845. From our survival model, we find that patients in the high-risk group (i.e. with a probability of a future AF case greater than 0.7) are 50% more likely to develop AF within 40 weeks, while patients in the minimal-risk group (i.e. with a probability of a future AF case less than or equal to 0.1) have a more than 85% chance of remaining AF-free for over seven years. Conclusion: We developed and validated a model for AF risk prediction. If applied in clinical practice, the model has the potential to provide valuable information for decision-making and patient management.
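To illustrate the kind of risk stratification described above, the sketch below groups patients by their predicted AF probability and summarizes AF-free survival per group with a Kaplan-Meier estimator. All numbers are synthetic and the 0.7 threshold is taken from the abstract; this is not the authors' survival model.

```python
# Sketch: stratify by predicted AF probability and summarize AF-free
# survival per risk group (synthetic data, illustrative only).
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(0)
prob = rng.uniform(0, 1, 500)                 # model's predicted AF probability
weeks_to_event = rng.exponential(200, 500)    # synthetic follow-up times (weeks)
event = rng.integers(0, 2, 500)               # 1 = AF observed, 0 = censored

high_risk = prob > 0.7
kmf = KaplanMeierFitter()
kmf.fit(weeks_to_event[high_risk], event[high_risk], label="high risk (p > 0.7)")
print(kmf.survival_function_.head())
```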
Abstract: The retina is the only part of the human body in which blood vessels can be accessed non-invasively using imaging techniques such as digital fundus images (DFI). The spatial distribution of the retinal microvasculature may change with cardiovascular diseases, and thus the eyes may be regarded as a window to our hearts. Computerized segmentation of the retinal arterioles and venules (A/V) is essential for automated microvasculature analysis. Using active learning, we created a new DFI dataset containing 240 crowd-sourced manual A/V segmentations performed by fifteen medical students and reviewed by an ophthalmologist, and we developed LUNet, a novel deep learning architecture for high-resolution A/V segmentation. The LUNet architecture includes a double dilated convolutional block that aims to enhance the receptive field of the model and reduce its parameter count. Furthermore, LUNet has a long tail that operates at high resolution to refine the segmentation. The custom loss function emphasizes the continuity of the blood vessels. LUNet is shown to significantly outperform two state-of-the-art segmentation algorithms on the local test set as well as on four external test sets simulating distribution shifts across ethnicity, comorbidities, and annotators. We make the newly created dataset open access (upon publication).
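One plausible reading of a double dilated convolutional block is sketched below: two stacked 3x3 convolutions with increasing dilation, which enlarge the receptive field without the parameter cost of larger kernels. The exact LUNet block may differ; the dilation rates and layer ordering here are assumptions.

```python
# Sketch of a "double dilated convolutional block": two stacked dilated
# 3x3 convolutions (illustrative structure, not the exact LUNet block).
import torch
import torch.nn as nn

class DoubleDilatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, dilations=(2, 4)):
        super().__init__()
        layers, ch = [], in_ch
        for d in dilations:
            layers += [nn.Conv2d(ch, out_ch, 3, padding=d, dilation=d),
                       nn.BatchNorm2d(out_ch),
                       nn.ReLU(inplace=True)]
            ch = out_ch
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

feat = DoubleDilatedConv(3, 32)(torch.randn(1, 3, 256, 256))  # high-res DFI patch
print(feat.shape)
```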
Abstract: Introduction: The presence of fibrillatory waves (f-waves) is important in the diagnosis of atrial fibrillation (AF), which has motivated the development of methods for f-wave extraction. We propose a novel approach to benchmarking methods designed for single-lead ECG analysis, building on the hypothesis that better-performing AF classification using features computed from the extracted f-waves implies better-performing extraction. The approach is well suited for processing large Holter data sets annotated with respect to the presence of AF. Methods: Three data sets, comprising a total of 300 two- or three-lead Holter recordings acquired in the USA, Israel, and Japan, were used, as well as a simulated single-lead data set. Four existing extraction methods, based on either average beat subtraction or principal component analysis (PCA), were evaluated. A random forest classifier was used for window-based AF classification. Performance was measured by the area under the receiver operating characteristic curve (AUROC). Results: The best performance was found for PCA-based extraction, resulting in AUROCs in the ranges 0.77--0.83, 0.62--0.78, and 0.87--0.89 for the data sets from the USA, Israel, and Japan, respectively, when analyzed across leads; the AUROC of the simulated single-lead, noisy data set was 0.98. Conclusions: This study provides a novel approach to evaluating the performance of f-wave extraction methods, offering the advantage of not requiring ground-truth f-waves for evaluation and thus being able to leverage real data sets. The code is open source (following publication).
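The window-based classification stage described above can be sketched as follows, with synthetic feature vectors standing in for the f-wave descriptors computed from each extracted window; the feature set and classifier hyperparameters are placeholders.

```python
# Sketch: random forest AF vs non-AF window classification scored with
# the AUROC (synthetic f-wave features, illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 6))     # per-window f-wave features (synthetic)
y = rng.integers(0, 2, 2000)       # 1 = AF window

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"AUROC = {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.2f}")
```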
Abstract: Introduction: The 12-lead electrocardiogram (ECG) is recorded during the atrial fibrillation (AF) catheter ablation procedure (CAP). It is difficult to determine whether a CAP was successful without a long follow-up assessing AF recurrence (AFR). Therefore, an AFR risk prediction algorithm could enable better management of CAP patients. In this research, we extracted features from 12-lead ECGs recorded before and after CAP and trained an AFR risk prediction machine learning model. Methods: Pre- and post-CAP segments were extracted from 112 patients. The analysis included a signal quality criterion, as well as heart rate variability and morphological biomarkers engineered from the 12-lead ECG (804 features overall). For 43 of the 112 patients, the AFR clinical endpoint was available. These were used to assess the feasibility of AFR risk prediction, using either pre- or post-CAP features. A random forest classifier was trained within a nested cross-validation framework. Results: 36 features were found to be statistically significant for distinguishing between the pre- and post-CAP states (n=112). For the classification, the area under the receiver operating characteristic curve (AUROC) was AUROC_pre=0.64 and AUROC_post=0.74 (n=43). Discussion and conclusions: This preliminary analysis showed the feasibility of AFR risk prediction. Such a model could be used to improve CAP management.
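A minimal sketch of a nested cross-validation loop for a random forest on this kind of small cohort follows: the inner loop tunes hyperparameters and the outer loop estimates the AUROC on held-out folds. The features, labels, and hyperparameter grid below are synthetic placeholders.

```python
# Sketch: nested cross-validation of a random forest for AFR risk
# prediction (synthetic features and labels, illustrative grid).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(43, 20))      # ECG-derived biomarkers per patient (synthetic)
y = rng.integers(0, 2, 43)         # 1 = AF recurrence

inner = GridSearchCV(RandomForestClassifier(random_state=0),
                     param_grid={"n_estimators": [100, 300], "max_depth": [3, None]},
                     scoring="roc_auc", cv=StratifiedKFold(3))
outer_scores = cross_val_score(inner, X, y, scoring="roc_auc", cv=StratifiedKFold(5))
print(f"Nested-CV AUROC: {outer_scores.mean():.2f}")
```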
Abstract: Atrial fibrillation (AF) is the most prevalent heart arrhythmia. AF manifests on the electrocardiogram (ECG) through irregular beat-to-beat time interval variation, the absence of P-waves and the presence of fibrillatory waves (f-waves). We hypothesize that a deep learning (DL) approach trained on the raw ECG will enable robust detection of AF events and estimation of the AF burden (AFB). We further hypothesize that the performance reached by leveraging the raw ECG will be superior to previously developed methods using the beat-to-beat interval variation time series. Consequently, we develop a new DL algorithm, denoted ArNet-ECG, to robustly detect AF events and estimate the AFB from the raw ECG, and we benchmark this algorithm against previous work. Methods: A dataset including 2,247 adult patients and totaling over 53,753 hours of continuous ECG from the University of Virginia (UVAF) was used. Results: ArNet-ECG obtained an F1 of 0.96 and ArNet2 obtained an F1 of 0.94. Discussion and conclusion: ArNet-ECG outperformed ArNet2, demonstrating that using the raw ECG provides added performance over the beat-to-beat interval time series. The main reason found for the higher performance of ArNet-ECG was its strong performance on atrial flutter examples, on which ArNet2 performed poorly.
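Once per-window AF predictions are available, the AF burden can be computed as the fraction of the recording spent in AF. The sketch below illustrates this with a toy prediction mask; the window length is an assumption.

```python
# Sketch: AF burden (AFB) from per-window predictions (toy mask;
# assumed 30-s analysis windows).
import numpy as np

window_s = 30                                        # assumed window length
pred_af = np.array([0, 0, 1, 1, 1, 1, 0, 0, 1, 0])   # 1 = window classified as AF
afb = pred_af.mean()                                 # fraction of time in AF
print(f"AF burden: {afb:.0%} ({pred_af.sum() * window_s} s of AF)")
```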