Abstract: Objective: Automatic sleep scoring is crucial for diagnosing sleep disorders. Existing frameworks based on polysomnography often rely on long sequences of input signals to predict sleep stages, which can introduce complexity. Moreover, there is limited exploration of simplifying representation learning in sleep scoring methods. Methods: In this study, we propose NeuroSleepNet, an automatic sleep scoring method designed to classify the current sleep stage using only the microevents in the current input signal, without the need for past inputs. Our model employs supervised spatial and multi-scale temporal context learning and incorporates a transformer encoder to enhance representation learning. Additionally, NeuroSleepNet is optimized for balanced performance across five sleep stages by introducing a logarithmic scale-based weighting technique as a loss function. Results: NeuroSleepNet achieved performance comparable to current state-of-the-art results. The best accuracy, macro-F1 score, and Cohen's kappa were 86.1 percent, 80.8 percent, and 0.805 for Sleep-EDF expanded; 82.0 percent, 76.3 percent, and 0.753 for MESA; 80.5 percent, 76.8 percent, and 0.738 for Physio2018; and 86.7 percent, 80.9 percent, and 0.804 for the SHHS database. Conclusion: NeuroSleepNet demonstrates that even with a focus on computational efficiency and a purely supervised learning approach, it is possible to achieve performance comparable to state-of-the-art methods. Significance: Our study simplifies automatic sleep scoring by focusing solely on microevents in the current input signal while maintaining strong performance. This offers a streamlined alternative for sleep diagnosis applications.
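The abstract does not specify the exact form of the logarithmic scale-based class weighting. A minimal sketch of one common variant, where rarer sleep stages receive larger loss weights via w_c = log(N / n_c); the function names, the +1 offset, and the normalization are assumptions for illustration, not the paper's actual formulation:

```python
import numpy as np

def log_scale_class_weights(class_counts):
    """Hypothetical log-scale weighting: w_c = log(N / n_c) + 1,
    normalized so the weights sum to the number of classes."""
    counts = np.asarray(class_counts, dtype=float)
    total = counts.sum()
    weights = np.log(total / counts) + 1.0  # rarer class -> larger weight
    return weights / weights.sum() * len(counts)

def weighted_cross_entropy(probs, labels, weights):
    """Per-sample class-weighted negative log-likelihood."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    w = np.asarray(weights)[labels]
    nll = -np.log(probs[np.arange(len(labels)), labels])
    return float(np.mean(w * nll))

# Illustrative epoch counts for the five stages (W, N1, N2, N3, REM);
# N1 is typically the rarest and ends up with the largest weight.
counts = [8000, 1500, 12000, 3000, 4000]
w = log_scale_class_weights(counts)
```

Compared with inverse-frequency weighting, a log scale compresses the weight gap between majority and minority stages, which is one way to pursue the balanced five-stage performance the abstract describes.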
Abstract: In this study, we address the challenge of speaker recognition using a novel data augmentation technique of adding noise to enrollment files. This technique efficiently aligns the acoustic conditions of test and enrollment files, enhancing comparability. Various pre-trained models were employed, with the ResNet model achieving the best DCF of 0.84 and an EER of 13.44%. The augmentation technique notably improved these results to a DCF of 0.75 and an EER of 12.79% for the ResNet model. Comparative analysis revealed the superiority of ResNet over models such as ECAPA, Mel-spectrogram, Pyannote, and TitaNet-Large. These results, along with different augmentation schemes, contribute to the success of RoboVox far-field speaker recognition in this paper.
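The augmentation idea of adding noise to enrollment files can be sketched as SNR-controlled noise mixing, so that clean enrollment audio matches the noisier far-field test conditions. The function name, the synthetic signals, and the 10 dB target SNR below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def add_noise_at_snr(signal, noise, snr_db):
    """Mix noise into a clean enrollment signal at a target SNR (dB).

    The noise is scaled so that
    10 * log10(signal_power / scaled_noise_power) == snr_db.
    """
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise

# Toy example: a 1-second 440 Hz tone at 16 kHz plus Gaussian noise.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
noisy = add_noise_at_snr(clean, noise, snr_db=10)
```

In practice the noise would be drawn from recordings matching the far-field test environment, so that enrollment and test embeddings are computed under comparable conditions before scoring.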