Abstract: Training machine learning algorithms on a small and imbalanced dataset is a common challenge in medical research. However, synthetic data generated by data augmentation techniques have been shown to enlarge the dataset and help alleviate class imbalance. In this study, we propose a novel generative adversarial network (GAN) architecture, Welch-GAN, and examine how its influence on classifier performance relates to signal quality and class imbalance in the context of photoplethysmography (PPG)-based atrial fibrillation (AF) detection. Pulse oximetry data were collected from 126 adult patients and augmented using the permutation technique to build a large training set for an AF detection model based on a one-dimensional residual neural network. The model was tested on PPG data collected from 13 stroke patients. Four data augmentation methods, including both traditional and GAN-based techniques, served as baselines. Three experiments were designed to evaluate each data augmentation method in terms of performance gain, robustness to motion artifacts, and sensitivity to training sample size, respectively. Compared with training on un-augmented data, training the same AF classification algorithm on augmented data significantly improved AF detection accuracy from 80.36% to over 90%, with no compromise in sensitivity or negative predictive value. Among the data augmentation techniques, Welch-GAN achieved around 3% higher AF detection accuracy than the baseline methods, which suggests that the proposed Welch-GAN delivers state-of-the-art performance.
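For reference, the three metrics named in the abstract (accuracy, sensitivity, and negative predictive value) follow from the confusion-matrix counts of a binary AF classifier, with AF treated as the positive class. The snippet below is a minimal sketch of those standard definitions; the function name `af_detection_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def af_detection_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and NPV from confusion-matrix counts.

    AF is treated as the positive class:
      tp = AF segments classified as AF
      fp = non-AF segments classified as AF
      tn = non-AF segments classified as non-AF
      fn = AF segments classified as non-AF
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of true AF segments that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # NPV: fraction of non-AF predictions that were truly non-AF.
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return {"accuracy": accuracy, "sensitivity": sensitivity, "npv": npv}


# Hypothetical counts, for illustration only (not results from the study).
metrics = af_detection_metrics(tp=45, fp=5, tn=40, fn=10)
```

Reporting sensitivity and negative predictive value alongside accuracy matters on imbalanced data, where accuracy alone can look high even if the minority class is rarely detected.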