Abstract: The Unbalanced Optimal Transport (UOT) problem plays an increasingly important role in computational biology, computational imaging, and deep learning. The Scaling algorithm is widely used to solve UOT because of its convenience and good convergence properties. However, it has lower accuracy for large regularization parameters, while small regularization parameters can easily lead to numerical overflow because of stability issues. We address this challenge by developing an inexact Bregman proximal point method for solving UOT, which approximates the proximal operator with the Scaling algorithm at each iteration. The algorithm (1) converges to the true solution of UOT, (2) has theoretical guarantees and is robust to the choice of regularization parameter, (3) mitigates numerical stability issues, and (4) can achieve computational complexity comparable to that of the Scaling algorithm in specific practical settings. Building on this, we develop an accelerated version of the inexact Bregman proximal point method for solving UOT using acceleration techniques for the Bregman proximal point method, and we provide theoretical guarantees and experimental validation of its convergence and acceleration.
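For orientation, the Scaling algorithm mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common entropic UOT formulation with KL marginal penalties, and the names `uot_scaling`, `eps` (entropic regularization), and `lam` (marginal penalty weight) are hypothetical. The inexact Bregman proximal point method itself is not shown.

```python
import math

def uot_scaling(C, a, b, eps=0.1, lam=1.0, n_iter=500):
    """Scaling iterations for entropic UOT with KL marginal penalties.

    Assumed objective (not taken from the abstract):
        min_P  eps * KL(P | K) + lam * KL(P 1 | a) + lam * KL(P^T 1 | b),
    with Gibbs kernel K = exp(-C / eps).
    """
    n, m = len(a), len(b)
    # Gibbs kernel; for very small eps these entries underflow to 0,
    # which is one source of the numerical instability described above.
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    power = lam / (lam + eps)  # exponent that softens the marginal constraints
    for _ in range(n_iter):
        Kv = [sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        u = [(a[i] / Kv[i]) ** power for i in range(n)]
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        v = [(b[j] / Ktu[j]) ** power for j in range(m)]
    # Transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

if __name__ == "__main__":
    plan = uot_scaling([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], [0.5, 0.5])
    for row in plan:
        print(row)
```

As `lam` grows, the exponent approaches 1 and the updates reduce to the balanced Sinkhorn iterations. The role of `eps` in both accuracy and stability is visible in the kernel computation.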
Abstract: Despite the promise of superior performance under challenging conditions, event-based motion estimation remains a hard problem owing to the difficulty of extracting and tracking stable features from event streams. It is generally believed that fusion with other sensors is required to make the estimation robust. In this work, we demonstrate reliable, purely event-based visual odometry on planar ground vehicles by employing the constrained non-holonomic motion model of Ackermann steering platforms. We extend single feature n-linearities for regular frame-based cameras to the case of quasi time-continuous event tracks, and achieve a polynomial form via variable-degree Taylor expansions. Robust averaging over multiple event tracks is achieved simply via histogram voting. As demonstrated on both simulated and real data, our algorithm achieves accurate and robust estimates of the vehicle's instantaneous rotational velocity, and thus results that are comparable to the delta rotations obtained by frame-based sensors under normal conditions. Furthermore, we significantly outperform the more traditional alternatives in challenging illumination scenarios. The code is available at \url{https://github.com/gowanting/NHEVO}.
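The histogram voting step above can be illustrated with a small sketch. This is a generic version of the idea under stated assumptions, not the code from the linked repository: each event track is assumed to yield one scalar rotational-velocity estimate, and the function name `histogram_vote` and the `bin_width` parameter are hypothetical.

```python
import math

def histogram_vote(estimates, bin_width=0.05):
    """Return the mean of the estimates that fall in the most populated bin.

    Outlier tracks land in sparsely populated bins and are ignored.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    bins = {}
    for w in estimates:
        # Group each estimate by the index of the bin it falls into.
        bins.setdefault(math.floor(w / bin_width), []).append(w)
    # The bin with the most votes wins (first such bin on ties).
    winners = max(bins.values(), key=len)
    return sum(winners) / len(winners)

if __name__ == "__main__":
    # Three consistent per-track estimates and two outliers (made-up values).
    print(histogram_vote([0.101, 0.102, 0.103, 0.5, -0.3]))
```

A fixed bin width trades resolution against robustness: wider bins tolerate noisier tracks but give a coarser estimate.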
Abstract: Image segmentation is a crucial but challenging task that has many applications. In medical imaging, for instance, intensity inhomogeneity and noise are common. In thigh muscle images, different muscles are closely packed together and there are often no clear boundaries between them, so intensity-based segmentation models cannot separate one muscle from another. To solve such problems, we present a segmentation model with adaptive spatial priors from joint registration. This model combines segmentation and registration in a unified framework to leverage their positive mutual influence. The segmentation is based on a modified Gaussian mixture model (GMM), which integrates intensity inhomogeneity and spatial smoothness. The registration provides a shape prior. We adopt a modified sum of squared differences (SSD) fidelity term and a Tikhonov regularity term for registration, and also use a Gaussian pyramid and a parametric method for robustness. The connection between segmentation and registration is guaranteed by a cross-entropy metric that aims to make the segmentation map (from segmentation) and the deformed atlas (from registration) as similar as possible. This joint framework is implemented within a constrained optimization framework, which leads to an efficient algorithm. We evaluate our proposed model on synthetic and thigh muscle MR images. Numerical results show an improvement over segmentation and registration performed separately and over other joint models.
Abstract: Deep learning has significantly improved the precision of instance segmentation when abundant labeled data is available. However, in many areas such as medicine and manufacturing, collecting sufficient data is extremely hard, and labeling this data requires a high level of professional skill. Following this motivation, we propose a new task setting named zero-shot instance segmentation (ZSI). In the training phase of ZSI, the model is trained with seen data, while in the testing phase, it is used to segment all seen and unseen instances. We first formulate the ZSI task and propose a method to tackle the challenge, which consists of a Zero-shot Detector, a Semantic Mask Head, a Background Aware RPN, and a Synchronized Background Strategy. We present a new benchmark for zero-shot instance segmentation based on the MS-COCO dataset. Extensive empirical results on this benchmark show that our method not only surpasses the state-of-the-art results in the zero-shot object detection task but also achieves promising performance on ZSI. Our approach will serve as a solid baseline and facilitate future research in zero-shot instance segmentation.
Abstract: Zero-shot detection (ZSD) is crucial to large-scale object detection, with the aim of simultaneously localizing and recognizing unseen objects. Several challenges remain for ZSD, including reducing the ambiguity between background and unseen objects and improving the alignment between visual and semantic concepts. In this work, we propose a novel framework named Background Learnable Cascade (BLC) to improve ZSD performance. The major contributions of BLC are as follows: (i) we propose a multi-stage cascade structure named Cascade Semantic R-CNN to progressively refine the alignment between visual and semantic concepts in ZSD; (ii) we develop a semantic information flow structure and add it directly between the stages of Cascade Semantic R-CNN to further improve semantic feature learning; (iii) we propose the background learnable region proposal network (BLRPN) to learn an appropriate word vector for the background class and use this learned vector in Cascade Semantic R-CNN; this design makes the background class learnable ("Background Learnable") and reduces the confusion between background and unseen classes. Our extensive experiments show that BLC obtains significant performance improvements on MS-COCO over state-of-the-art methods.
Abstract: A volume constraint is an important prior for image segmentation in many real applications. In this work, we present a novel volume preserving image segmentation algorithm based on the framework of entropic regularized optimal transport theory. The classical Total Variation (TV) regularizer and volume preservation are integrated into a regularized optimal transport model, and the volume and classification constraints can be regarded as two measure-preserving constraints in the optimal transport problem. By studying the dual problem, we develop a simple and efficient dual algorithm for our model. Moreover, unlike many variational image segmentation algorithms, the proposed algorithm can be directly unrolled into a new Volume Preserving and TV regularized softmax (VPTV-softmax) layer for semantic segmentation in popular Deep Convolutional Neural Networks (DCNNs). The experimental results show that our proposed model is very competitive and can improve the performance of many semantic segmentation networks such as the popular U-net.
Abstract: Precise diagnosis is of great significance in developing precise treatment plans to restore neck function and reduce the burden posed by cervical spondylosis (CS). However, the currently available neck function assessment methods are subjective and coarse-grained. In this paper, based on the relationship among CS, cervical structure, cervical vertebra function, and surface electromyography (sEMG), we seek to develop clustering algorithms on an sEMG data set collected from a clinical environment and implement the division. We propose and develop the EasiCS framework, which consists of dimension reduction, the clustering algorithm EasiSOM, and the spectral clustering algorithm EasiSC. Overall, EasiCS outperforms seven commonly used algorithms.
Abstract: Cervical spondylosis (CS) is a common chronic disease that affects up to two-thirds of the population and poses a serious burden on individuals and society. Early identification has significant value in improving the cure rate and reducing costs. However, the pathology is complex, and the mild symptoms increase the difficulty of diagnosis, especially in the early stage. In addition, the time and cost of hospital medical services reduce the attention paid to CS identification. Thus, a convenient, low-cost, intelligent CS identification method is urgently needed. In this paper, we present an intelligent method based on deep learning to identify CS using the surface electromyography (sEMG) signal. Faced with the complexity, high dimensionality, and weak usability of the sEMG signal, we propose and develop a multi-channel EasiCSDeep algorithm based on the convolutional neural network, which consists of feature extraction, spatial relationship representation, and a classification algorithm. To the best of our knowledge, EasiCSDeep is the first effort to employ deep learning and sEMG data to identify CS. Compared with the previous state-of-the-art algorithm, our algorithm achieves a significant improvement.
Abstract: In this work, we show the intrinsic relations between optimal transportation and convex geometry, especially the variational approach to solving the Alexandrov problem: constructing a convex polytope with prescribed face normals and volumes. This leads to a geometric interpretation of generative models and to a novel framework for generative models. Using the optimal transportation view of the GAN model, we show that the discriminator computes the Kantorovich potential and the generator calculates the transportation map. For a large class of transportation costs, the Kantorovich potential gives the optimal transportation map through a closed-form formula. Therefore, it is sufficient to optimize the discriminator alone. This shows that the adversarial competition can be avoided and the computational architecture can be simplified. Preliminary experimental results show that the geometric method outperforms WGAN for approximating probability measures with multiple clusters in low-dimensional space.
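As an illustration of the closed-form relation mentioned above, the standard result for costs of the form c(x, y) = h(x - y) with h strictly convex can be written as follows. The abstract does not state which class of costs it has in mind, so this choice is an assumption; the symbols \varphi (Kantorovich potential) and T (optimal transportation map) are notation introduced here.

```latex
% Assumed cost: c(x, y) = h(x - y), with h strictly convex and differentiable.
% Given the Kantorovich potential \varphi, the optimal map is
T(x) = x - (\nabla h)^{-1}\bigl(\nabla \varphi(x)\bigr).
% Special case: quadratic cost h(z) = \tfrac{1}{2}\lvert z \rvert^{2}, where \nabla h is the identity:
T(x) = x - \nabla \varphi(x).
```

In this setting the map is recovered from the potential by differentiation alone, which is the sense in which optimizing the potential (the discriminator, in the GAN view) would be sufficient.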