Abstract:Unsupervised Anomalous Sound Detection (ASD) aims to design a generalizable method that can be used to detect anomalies when only normal sounds are given. In this paper, Anomalous Sound Detection based on Diffusion Models (ASD-Diffusion) is proposed for ASD in real-world factories. In our pipeline, the anomalies in acoustic features are reconstructed from their noisy corrupted features into their approximate normal pattern. Secondly, a post-processing anomalies filter algorithm is proposed to detect anomalies that exhibit significant deviation from the original input after reconstruction. Furthermore, denoising diffusion implicit model is introduced to accelerate the inference speed by a longer sampling interval of the denoising process. The proposed method is innovative in the application of diffusion models as a new scheme. Experimental results on the development set of DCASE 2023 challenge task 2 outperform the baseline by 7.75%, demonstrating the effectiveness of the proposed method.
Abstract:This paper proposed LightSleepNet - a light-weight, 1-d Convolutional Neural Network (CNN) based personalized architecture for real-time sleep staging, which can be implemented on various mobile platforms with limited hardware resources. The proposed architecture only requires an input of 30s single-channel EEG signal for the classification. Two residual blocks consisting of group 1-d convolution are used instead of the traditional convolution layers to remove the redundancy in the CNN. Channel shuffles are inserted into each convolution layer to improve the accuracy. In order to avoid over-fitting to the training set, a Global Average Pooling (GAP) layer is used to replace the fully connected layer, which further reduces the total number of the model parameters significantly. A personalized algorithm combining Adaptive Batch Normalization (AdaBN) and gradient re-weighting is proposed for unsupervised domain adaptation. A higher priority is given to examples that are easy to transfer to the new subject, and the algorithm could be personalized for new subjects without re-training. Experimental results show a state-of-the-art overall accuracy of 83.8% with only 45.76 Million Floating-point Operations per Second (MFLOPs) computation and 43.08 K parameters.
Abstract:This paper proposed a Multi-Channel Multi-Domain (MCMD) based knowledge distillation algorithm for sleep staging using single-channel EEG. Both knowledge from different domains and different channels are learnt in the proposed algorithm, simultaneously. A multi-channel pre-training and single-channel fine-tuning scheme is used in the proposed work. The knowledge from different channels in the source domain is transferred to the single-channel model in the target domain. A pre-trained teacher-student model scheme is used to distill knowledge from the multi-channel teacher model to the single-channel student model combining with output transfer and intermediate feature transfer in the target domain. The proposed algorithm achieves a state-of-the-art single-channel sleep staging accuracy of 86.5%, with only 0.6% deterioration from the state-of-the-art multi-channel model. There is an improvement of 2% compared to the baseline model. The experimental results show that knowledge from multiple domains (different datasets) and multiple channels (e.g. EMG, EOG) could be transferred to single-channel sleep staging.
Abstract:Accurate segmentation of clustered microcalcifications in mammography is crucial for the diagnosis and treatment of breast cancer. Despite exhibiting expert-level accuracy, recent deep learning advancements in medical image segmentation provide insufficient contribution to practical applications, due to the domain shift resulting from differences in patient postures, individual gland density, and imaging modalities of mammography etc. In this paper, a novel framework named MLN-net, which can accurately segment multi-source images using only single source images, is proposed for clustered microcalcification segmentation. We first propose a source domain image augmentation method to generate multi-source images, leading to improved generalization. And a structure of multiple layer normalization (LN) layers is used to construct the segmentation network, which can be found efficient for clustered microcalcification segmentation in different domains. Additionally, a branch selection strategy is designed for measuring the similarity of the source domain data and the target domain data. To validate the proposed MLN-net, extensive analyses including ablation experiments are performed, comparison of 12 baseline methods. Extensive experiments validate the effectiveness of MLN-net in segmenting clustered microcalcifications from different domains and the its segmentation accuracy surpasses state-of-the-art methods. Code will be available at https://github.com/yezanting/MLN-NET-VERSON1.
Abstract:The quality of images captured by wireless capsule endoscopy (WCE) is key for doctors to diagnose diseases of gastrointestinal (GI) tract. However, there exist many low-quality endoscopic images due to the limited illumination and complex environment in GI tract. After an enhancement process, the severe noise become an unacceptable problem. The noise varies with different cameras, GI tract environments and image enhancement. And the noise model is hard to be obtained. This paper proposes a convolutional blind denoising network for endoscopic images. We apply Deep Image Prior (DIP) method to reconstruct a clean image iteratively using a noisy image without a specific noise model and ground truth. Then we design a blind image quality assessment network based on MobileNet to estimate the quality of the reconstructed images. The estimated quality is used to stop the iterative operation in DIP method. The number of iterations is reduced about 36% by using transfer learning in our DIP process. Experimental results on endoscopic images and real-world noisy images demonstrate the superiority of our proposed method over the state-of-the-art methods in terms of visual quality and quantitative metrics.
Abstract:We propose an algorithm that is capable of synthesizing high quality target speaker's singing voice given only their normal speech samples. The proposed algorithm first integrate speech and singing synthesis into a unified framework, and learns universal speaker embeddings that are shareable between speech and singing synthesis tasks. Specifically, the speaker embeddings learned from normal speech via the speech synthesis objective are shared with those learned from singing samples via the singing synthesis objective in the unified training framework. This makes the learned speaker embedding a transferable representation for both speaking and singing. We evaluate the proposed algorithm on singing voice conversion task where the content of original singing is covered with the timbre of another speaker's voice learned purely from their normal speech samples. Our experiments indicate that the proposed algorithm generates high-quality singing voices that sound highly similar to target speaker's voice given only his or her normal speech samples. We believe that proposed algorithm will open up new opportunities for singing synthesis and conversion for broader users and applications.
Abstract:Examining and interpreting of a large number of wireless endoscopic images from the gastrointestinal tract is a tiresome task for physicians. A practical solution is to automatically construct a two dimensional representation of the gastrointestinal tract for easy inspection. However, little has been done on wireless endoscopic image stitching, let alone systematic investigation. The proposed new wireless endoscopic image stitching method consists of two main steps to improve the accuracy and efficiency of image registration. First, the keypoints are extracted by Principle Component Analysis and Scale Invariant Feature Transform (PCA-SIFT) algorithm and refined with Maximum Likelihood Estimation SAmple Consensus (MLESAC) outlier removal to find the most reliable keypoints. Second, the optimal transformation parameters obtained from first step are fed to the Normalised Mutual Information (NMI) algorithm as an initial solution. With modified Marquardt-Levenberg search strategy in a multiscale framework, the NMI can find the optimal transformation parameters in the shortest time. The proposed methodology has been tested on two different datasets - one with real wireless endoscopic images and another with images obtained from Micro-Ball (a new wireless cubic endoscopy system with six image sensors). The results have demonstrated the accuracy and robustness of the proposed methodology both visually and quantitatively.