Abstract: Speaker separation aims to extract multiple voices from a mixed signal. In this paper, we propose two speaker-aware designs to improve existing speaker separation solutions. The first design is a speaker conditioning network that integrates speech samples to generate individualized speaker conditions, which then provide informed guidance for a separation module to produce well-separated outputs. The second design aims to reduce non-target voices in the separated speech. To this end, we propose negative distances to penalize the appearance of any non-target voice in the channel outputs, and positive distances to drive the separated voices closer to the clean targets. We explore two setups, weighted-sum and triplet-like, for integrating these two distances into a combined auxiliary loss for the separation networks. Experiments conducted on LibriMix demonstrate the effectiveness of our proposed models.
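The two ways of combining the positive and negative distances can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's loss: each distance is assumed to be a cosine distance between fixed-length embeddings (for example, speaker embeddings) of a channel output and a reference, the mixture is assumed to contain a single non-target voice, and the weights `w_pos`, `w_neg` and the `margin` are hypothetical values.

```python
import numpy as np

def cosine_distance(a, b, eps=1e-8):
    """1 - cosine similarity between two 1-D vectors; lies in [0, 2]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def weighted_sum_loss(out, target, non_target, w_pos=1.0, w_neg=0.5):
    """Assumed weighted-sum form: pull toward the target, push from the non-target."""
    d_pos = cosine_distance(out, target)      # positive distance: want it small
    d_neg = cosine_distance(out, non_target)  # negative distance: want it large
    return w_pos * d_pos - w_neg * d_neg

def triplet_like_loss(out, target, non_target, margin=0.5):
    """Assumed triplet-like form: hinge on the gap between the two distances."""
    d_pos = cosine_distance(out, target)
    d_neg = cosine_distance(out, non_target)
    # Zero once the non-target is at least `margin` farther away than the target.
    return max(0.0, d_pos - d_neg + margin)
```

The triplet-like form stops penalizing an output once the non-target distance exceeds the target distance by the margin, whereas the weighted sum keeps rewarding larger non-target distances; keeping each distance bounded in [0, 2] prevents the weighted sum from diverging.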
Abstract: In this paper, we explore the capabilities of several deep neural network models in generating whole-brain 3T-like MR images from clinical 1.5T MRIs. The models include a fully convolutional network (FCN) method and three state-of-the-art super-resolution solutions: ESPCN [26], SRGAN [17] and PRSR [7]. The FCN solution, U-Convert-Net, maps 1.5T slices to 3T-like slices through a U-Net-like architecture and integrates 3D neighborhood information through a multi-view ensemble. The pros and cons of the models, as well as the associated evaluation metrics, are assessed experimentally and discussed in depth. To the best of our knowledge, this study is the first work to evaluate multiple deep learning solutions for whole-brain MRI conversion, as well as the first attempt to utilize an FCN/U-Net-like structure for this purpose.
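One common way to realize a multi-view ensemble is to run a 2D slice-wise model along each of the three axes of a volume and fuse the re-stacked results. The sketch below assumes exactly that: `slice_fn` is a hypothetical stand-in for a trained 2D converter such as U-Convert-Net, and a plain voxel-wise mean is an assumed fusion rule; the paper's ensemble may combine the views differently.

```python
import numpy as np
from typing import Callable

def multi_view_ensemble(volume: np.ndarray,
                        slice_fn: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """Apply a 2D slice converter along each axis of a 3D volume and average.

    Assumption: slice_fn maps a 2D slice to a 2D slice of the same shape.
    """
    views = []
    for axis in range(3):
        # Bring the slicing axis to the front so iteration yields 2D slices.
        slices = np.moveaxis(volume, axis, 0)
        converted = np.stack([slice_fn(s) for s in slices], axis=0)
        # Restore the original axis order before fusing the views.
        views.append(np.moveaxis(converted, 0, axis))
    # Assumed fusion rule: voxel-wise mean over the three views.
    return np.mean(views, axis=0)
```

Because each view sees a different in-plane neighborhood, fusing the three passes gives every voxel some through-plane context that a single 2D pass would lack.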
Abstract: In this paper, we propose a pyramid network structure to improve FCN-based segmentation and apply it to labeling thyroid follicles in histology images. Our design is based on the notion that a hierarchical updating scheme, if properly implemented, can help FCNs capture both the major objects and the structural details in an image. To this end, we devise a residual module to be mounted on consecutive network layers, through which pixel labels are propagated from the coarsest layer towards the finest layer in a bottom-up fashion. We add five residual units along the decoding path of a modified U-Net to build our segmentation network, Res-Seg-Net. Experiments demonstrate that the multi-resolution setup in our model is effective in producing segmentations with improved accuracy and robustness.
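The coarse-to-fine propagation can be sketched as repeatedly upsampling the current label map (here, per-class logits) and adding a residual correction at the next finer resolution. This is an illustrative assumption rather than the Res-Seg-Net module itself: nearest-neighbour 2x upsampling is used, and each residual is supplied as a precomputed array, whereas the real network would predict it from decoder features. The function names are hypothetical.

```python
import numpy as np
from typing import List

def upsample2x(logits: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of an (H, W, C) label map."""
    return np.repeat(np.repeat(logits, 2, axis=0), 2, axis=1)

def propagate_labels(coarse_logits: np.ndarray,
                     residuals: List[np.ndarray]) -> np.ndarray:
    """Refine coarse logits level by level, from coarsest to finest.

    residuals[i] must have twice the spatial size of the map it refines;
    in a network each would come from one residual unit (hypothetical here).
    """
    logits = coarse_logits
    for residual in residuals:
        logits = upsample2x(logits) + residual
    return logits

def to_labels(logits: np.ndarray) -> np.ndarray:
    """Per-pixel class index from the finest-level logits."""
    return np.argmax(logits, axis=-1)
```

With five residual units, as in the abstract, `residuals` would hold five arrays and the output would be 32 times the coarse resolution in each spatial dimension.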
Abstract: In this paper, we propose a capsule-based neural network model to solve the semantic segmentation problem. By taking advantage of the part-whole dependencies that can be extracted from capsule layers, we derive the probabilities of the class labels for individual capsules through a recursive, layer-by-layer procedure. We model this procedure as a traceback pipeline and use it as the central piece of an end-to-end segmentation network. Under the proposed framework, image-level class labels and object boundaries are jointly sought in an explicit manner, which offers a significant advantage over state-of-the-art fully convolutional network (FCN) solutions. Experiments conducted on modified MNIST and neuroimages demonstrate that our model considerably enhances segmentation performance compared to the leading FCN variant.
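One way to read the traceback pipeline is as a top-down pass in which each capsule's class distribution is the coupling-weighted mixture of its parents' distributions. The sketch below assumes that formulation, with coupling coefficients as produced by routing-by-agreement, where each lower capsule's coefficients over its parents sum to 1. The function name `traceback` and the example matrices are hypothetical, and the paper's exact recursion may differ.

```python
import numpy as np
from typing import List

def traceback(top_probs: np.ndarray, couplings: List[np.ndarray]) -> np.ndarray:
    """Propagate class probabilities from the top capsule layer downward.

    top_probs: (n_top, K) class distribution per top-layer capsule.
    couplings: matrices ordered top-down; couplings[l] has shape
        (n_lower, n_upper), and row i holds capsule i's coupling
        coefficients to its parents (assumed to sum to 1).
    Returns the (n_bottom, K) class distribution per bottom-layer capsule.
    """
    probs = top_probs
    for c in couplings:
        # P(k | lower i) = sum_j c[i, j] * P(k | upper j)
        probs = c @ probs
    return probs
```

Because every coupling row sums to 1, each output row remains a valid probability distribution; mapping the bottom-layer capsules back to their spatial positions would then yield per-pixel label probabilities.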
Abstract: In this paper, we develop a two-stage neural network solution for the challenging task of white-matter lesion segmentation. To cope with the vast variability in lesion sizes, we sample brain MR scans with patches of three different sizes and feed them into separate fully convolutional neural networks (FCNs). In the second stage, we process large and small lesions separately, and use ensemble-nets to combine the segmentation results generated by the FCNs. A novel activation function is adopted in the ensemble-nets to improve the segmentation accuracy, as measured by the Dice Similarity Coefficient. Experiments on the MICCAI 2017 White Matter Hyperintensities (WMH) Segmentation Challenge data demonstrate that both our two-stage, multi-sized FCN approach and the new activation function are effective in capturing white-matter lesions in MR images.
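For reference, the Dice Similarity Coefficient between a predicted mask A and a ground-truth mask B is DSC = 2|A ∩ B| / (|A| + |B|). A minimal NumPy version for binary masks follows; returning 1.0 when both masks are empty is a convention chosen here, not something the abstract specifies.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        # Convention (assumed): two empty masks agree perfectly.
        return 1.0
    intersection = np.logical_and(pred, truth).sum()
    return float(2.0 * intersection / total)
```

DSC ranges from 0 (no overlap) to 1 (perfect overlap).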
Abstract: In this paper, we propose a nonlinear distance metric learning scheme based on the fusion of component linear metrics. Instead of merging displacements at each data point, our model calculates the velocities induced by the component transformations via a geodesic interpolation on a Lie transformation group. These velocities are then summed to produce a global transformation that is guaranteed to be diffeomorphic. Consequently, pairwise distances computed this way conform to a smooth and spatially varying metric, which can greatly benefit k-NN classification. Experiments on synthetic and real datasets demonstrate the effectiveness of our model.
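A minimal sketch of velocity-based fusion, under several assumptions not stated in the abstract: each component linear metric is represented by a generator matrix `M_k` (the matrix logarithm of its linear transformation, so the geodesic from the identity has velocity `M_k @ x`), the components are blended with normalized Gaussian weights around hypothetical anchor points, and the resulting stationary velocity field is integrated with forward Euler steps. The distance between two points is then the Euclidean distance between their transformed images. All function names are illustrative.

```python
import numpy as np

def blend_weights(x, anchors, sigma):
    """Normalized Gaussian weights of point x w.r.t. each anchor (assumed scheme)."""
    sq = np.sum((anchors - x) ** 2, axis=1)
    # Subtracting the minimum avoids underflow far from all anchors.
    w = np.exp(-(sq - sq.min()) / (2.0 * sigma ** 2))
    return w / w.sum()

def velocity(x, generators, anchors, sigma):
    """Fused velocity: weighted sum of the component velocities M_k @ x."""
    w = blend_weights(x, anchors, sigma)
    return sum(wk * (Mk @ x) for wk, Mk in zip(w, generators))

def transform(x, generators, anchors, sigma, steps=100):
    """Integrate the fused velocity field from t=0 to t=1 (forward Euler)."""
    x = np.asarray(x, dtype=float)
    dt = 1.0 / steps
    for _ in range(steps):
        x = x + dt * velocity(x, generators, anchors, sigma)
    return x

def fused_distance(a, b, generators, anchors, sigma=1.0, steps=100):
    """Euclidean distance after mapping both points through the fused flow."""
    ta = transform(a, generators, anchors, sigma, steps)
    tb = transform(b, generators, anchors, sigma, steps)
    return float(np.linalg.norm(ta - tb))
```

The exact flow of a smooth velocity field is a diffeomorphism; the Euler discretization here only approximates it, and finer steps bring the result closer to the continuous transformation.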