Abstract: Obtaining large-scale, well-annotated data is always a daunting challenge, especially in medical research, because domain experts are scarce. Instead of relying on human annotation, in this work we use the alarm information generated by bedside monitors to obtain pseudo labels for the concurrent photoplethysmography (PPG) signals. With this strategy, we obtain over 8 million 30-second PPG segments. To address the label noise caused by false alarms, we propose cluster consistency. This approach uses an unsupervised auto-encoder, which is therefore not subject to label noise, to cluster training samples into a finite number of clusters. The learned cluster memberships are then used in the subsequent supervised learning phase. There, they force latent-space distances between samples in the same cluster to be small and distances between samples in different clusters to be large. In the experiments, we compare our method with state-of-the-art algorithms and test it on external datasets. The results show that our method is superior in both classification performance and efficiency.
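The cluster-consistency idea can be illustrated with a minimal NumPy sketch. The abstract does not give the exact loss, so this sketch assumes a margin-based pairwise (contrastive-style) form. It penalizes squared distances between same-cluster latent vectors, and it penalizes different-cluster pairs only when they are closer than a hypothetical `margin`. The function name `cluster_consistency_loss`, the margin value, and the use of Euclidean distance are illustrative assumptions. `clusters` stands in for the memberships produced by the auto-encoder clustering step.

```python
import numpy as np


def cluster_consistency_loss(z, clusters, margin=1.0):
    """Hypothetical cluster-consistency loss (assumed contrastive-style form).

    z:        (n, d) latent vectors from the supervised model's encoder.
    clusters: (n,) cluster memberships from the unsupervised auto-encoder phase.
    margin:   assumed minimum distance between samples of different clusters.
    """
    z = np.asarray(z, dtype=float)
    clusters = np.asarray(clusters)

    # Pairwise squared Euclidean distances via broadcasting, shape (n, n).
    diff = z[:, None, :] - z[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    d = np.sqrt(d2)

    same = clusters[:, None] == clusters[None, :]
    off_diag = ~np.eye(len(z), dtype=bool)
    pos = same & off_diag  # same-cluster pairs, excluding self-pairs
    neg = ~same            # different-cluster pairs

    # Pull same-cluster samples together: mean squared distance.
    pos_term = d2[pos].mean() if pos.any() else 0.0
    # Push different-cluster samples apart until they are at least `margin` away.
    neg_term = (np.maximum(0.0, margin - d[neg]) ** 2).mean() if neg.any() else 0.0
    return float(pos_term + neg_term)


# Two samples share cluster 0; the third belongs to cluster 1.
z = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.5]]
print(cluster_consistency_loss(z, [0, 0, 1], margin=1.0))
```

One plausible arrangement, again an assumption, is to add this term to the classification loss computed on the noisy alarm-derived labels. The cluster memberships would stay fixed from the unsupervised phase, so the structural constraint is not itself corrupted by false alarms.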