Abstract:The field of machine learning has been greatly transformed with the advancement of deep artificial neural networks (ANNs) and the increased availability of annotated data. Spiking neural networks (SNNs) have recently emerged as a low-power alternative to ANNs due to their sparsity nature. In this work, we propose a novel hybrid ANN-SNN co-training framework to improve the performance of converted SNNs. Our approach is a fine-tuning scheme, conducted through an alternating, forward-backward training procedure. We apply our framework to object detection and image segmentation tasks. Experiments demonstrate the effectiveness of our approach in achieving the design goals.
Abstract:Over the past decade, artificial neural networks (ANNs) have made tremendous advances, in part due to the increased availability of annotated data. However, ANNs typically require significant power and memory consumptions to reach their full potential. Spiking neural networks (SNNs) have recently emerged as a low-power alternative to ANNs due to their sparsity nature. SNN, however, are not as easy to train as ANNs. In this work, we propose a hybrid SNN training scheme and apply it to segment human hippocampi from magnetic resonance images. Our approach takes ANN-SNN conversion as an initialization step and relies on spike-based backpropagation to fine-tune the network. Compared with the conversion and direct training solutions, our method has advantages in both segmentation accuracy and training efficiency. Experiments demonstrate the effectiveness of our model in achieving the design goals.
Abstract:Speaker separation aims to extract multiple voices from a mixed signal. In this paper, we propose two speaker-aware designs to improve the existing speaker separation solutions. The first model is a speaker conditioning network that integrates speech samples to generate individualized speaker conditions, which then provide informed guidance for a separation module to produce well-separated outputs. The second design aims to reduce non-target voices in the separated speech. To this end, we propose negative distances to penalize the appearance of any non-target voice in the channel outputs, and positive distances to drive the separated voices closer to the clean targets. We explore two different setups, weighted-sum and triplet-like, to integrate these two distances to form a combined auxiliary loss for the separation networks. Experiments conducted on LibriMix demonstrate the effectiveness of our proposed models.
Abstract:In this paper, we explore the capabilities of a number of deep neural network models in generating whole-brain 3T-like MR images from clinical 1.5T MRIs. The models include a fully convolutional network (FCN) method and three state-of-the-art super-resolution solutions, ESPCN [26], SRGAN [17] and PRSR [7]. The FCN solution, U-Convert-Net, carries out mapping of 1.5T-to-3T slices through a U-Net-like architecture, with 3D neighborhood information integrated through a multi-view ensemble. The pros and cons of the models, as well the associated evaluation metrics, are measured with experiments and discussed in depth. To the best of our knowledge, this study is the first work to evaluate multiple deep learning solutions for whole-brain MRI conversion, as well as the first attempt to utilize FCN/U-Net-like structure for this purpose.
Abstract:In this paper, we propose a pyramid network structure to improve the FCN-based segmentation solutions and apply it to label thyroid follicles in histology images. Our design is based on the notion that a hierarchical updating scheme, if properly implemented, can help FCNs capture the major objects, as well as structure details in an image. To this end, we devise a residual module to be mounted on consecutive network layers, through which pixel labels would be propagated from the coarsest layer towards the finest layer in a bottom-up fashion. We add five residual units along the decoding path of a modified U-Net to make our segmentation network, Res-Seg-Net. Experiments demonstrate that the multi-resolution set-up in our model is effective in producing segmentations with improved accuracy and robustness.
Abstract:In this paper, we develop a two-stage neural network solution for the challenging task of white-matter lesion segmentation. To cope with the vast vari- ability in lesion sizes, we sample brain MR scans with patches at three differ- ent dimensions and feed them into separate fully convolutional neural networks (FCNs). In the second stage, we process large and small lesion separately, and use ensemble-nets to combine the segmentation results generated from the FCNs. A novel activation function is adopted in the ensemble-nets to improve the segmen- tation accuracy measured by Dice Similarity Coefficient. Experiments on MICCAI 2017 White Matter Hyperintensities (WMH) Segmentation Challenge data demonstrate that our two-stage-multi-sized FCN approach, as well as the new activation function, are effective in capturing white-matter lesions in MR images.
Abstract:In this paper, we propose a nonlinear distance metric learning scheme based on the fusion of component linear metrics. Instead of merging displacements at each data point, our model calculates the velocities induced by the component transformations, via a geodesic interpolation on a Lie transfor- mation group. Such velocities are later summed up to produce a global transformation that is guaranteed to be diffeomorphic. Consequently, pair-wise distances computed this way conform to a smooth and spatially varying metric, which can greatly benefit k-NN classification. Experiments on synthetic and real datasets demonstrate the effectiveness of our model.