Abstract:This paper studies the Binary Neural Networks (BNNs) in which weights and activations are both binarized into 1-bit values, thus greatly reducing the memory usage and computational complexity. Since the modern deep neural networks are of sophisticated design with complex architecture for the accuracy reason, the diversity on distributions of weights and activations is very high. Therefore, the conventional sign function cannot be well used for effectively binarizing full-precision values in BNNs. To this end, we present a simple yet effective approach called AdaBin to adaptively obtain the optimal binary sets $\{b_1, b_2\}$ ($b_1, b_2\in \mathbb{R}$) of weights and activations for each layer instead of a fixed set (i.e., $\{-1, +1\}$). In this way, the proposed method can better fit different distributions and increase the representation ability of binarized features. In practice, we use the center position and distance of 1-bit values to define a new binary quantization function. For the weights, we propose an equalization method to align the symmetrical center of binary distribution to real-valued distribution, and minimize the Kullback-Leibler divergence of them. Meanwhile, we introduce a gradient-based optimization method to get these two parameters for activations, which are jointly trained in an end-to-end manner. Experimental results on benchmark models and datasets demonstrate that the proposed AdaBin is able to achieve state-of-the-art performance. For instance, we obtain a 66.4\% Top-1 accuracy on the ImageNet using ResNet-18 architecture, and a 69.4 mAP on PASCAL VOC using SSD300.
Abstract:A novel model called error loss network (ELN) is proposed to build an error loss function for supervised learning. The ELN is in structure similar to a radial basis function (RBF) neural network, but its input is an error sample and output is a loss corresponding to that error sample. That means the nonlinear input-output mapper of ELN creates an error loss function. The proposed ELN provides a unified model for a large class of error loss functions, which includes some information theoretic learning (ITL) loss functions as special cases. The activation function, weight parameters and network size of the ELN can be predetermined or learned from the error samples. On this basis, we propose a new machine learning paradigm where the learning process is divided into two stages: first, learning a loss function using an ELN; second, using the learned loss function to continue to perform the learning. Experimental results are presented to demonstrate the desirable performance of the new method.
Abstract:The key idea of the state-of-the-art VAE-based unsupervised representation disentanglement methods is to minimize the total correlation of the latent variable distributions. However, it has been proved that VAE-based unsupervised disentanglement can not be achieved without introducing other inductive bias. In this paper, we address VAE-based unsupervised disentanglement by leveraging the constraints derived from the Group Theory based definition as the non-probabilistic inductive bias. More specifically, inspired by the nth dihedral group (the permutation group for regular polygons), we propose a specific form of the definition and prove its two equivalent conditions: isomorphism and "the constancy of permutations". We further provide an implementation of isomorphism based on two Group constraints: the Abel constraint for the exchangeability and Order constraint for the cyclicity. We then convert them into a self-supervised training loss that can be incorporated into VAE-based models to bridge their gaps from the Group Theory based definition. We train 1800 models covering the most prominent VAE-based models on five datasets to verify the effectiveness of our method. Compared to the original models, the Groupidied VAEs consistently achieve better mean performance with smaller variances, and make meaningful dimensions controllable.
Abstract:Keypoint-based detectors have achieved pretty-well performance. However, incorrect keypoint matching is still widespread and greatly affects the performance of the detector. In this paper, we propose CentripetalNet which uses centripetal shift to pair corner keypoints from the same instance. CentripetalNet predicts the position and the centripetal shift of the corner points and matches corners whose shifted results are aligned. Combining position information, our approach matches corner points more accurately than the conventional embedding approaches do. Corner pooling extracts information inside the bounding boxes onto the border. To make this information more aware at the corners, we design a cross-star deformable convolution network to conduct feature adaption. Furthermore, we explore instance segmentation on anchor-free detectors by equipping our CentripetalNet with a mask prediction module. On MS-COCO test-dev, our CentripetalNet not only outperforms all existing anchor-free detectors with an AP of 48.0% but also achieves comparable performance to the state-of-the-art instance segmentation approaches with a 40.2% MaskAP. Code will be available at https://github.com/KiveeDong/CentripetalNet.
Abstract:Semantic SLAM is an important field in autonomous driving and intelligent agents, which can enable robots to achieve high-level navigation tasks, obtain simple cognition or reasoning ability and achieve language-based human-robot-interaction. In this paper, we built a system to creat a semantic 3D map by combining 3D point cloud from ORB SLAM with semantic segmentation information from Convolutional Neural Network model PSPNet-101 for large-scale environments. Besides, a new dataset for KITTI sequences has been built, which contains the GPS information and labels of landmarks from Google Map in related streets of the sequences. Moreover, we find a way to associate the real-world landmark with point cloud map and built a topological map based on semantic map.
Abstract:In recent years, correntropy has been seccessfully applied to robust adaptive filtering to eliminate adverse effects of impulsive noises or outliers. Correntropy is generally defined as the expectation of a Gaussian kernel between two random variables. This definition is reasonable when the error between the two random variables is symmetrically distributed around zero. For the case of asymmetric error distribution, the symmetric Gaussian kernel is however inappropriate and cannot adapt to the error distribution well. To address this problem, in this letter we propose a new variant of correntropy, named asymmetric correntropy, which uses an asymmetric Gaussian model as the kernel function. In addition, a robust adaptive filtering algorithm based on asymmetric correntropy is developed and its steadystate convergence performance is analyzed. Simulations are provided to confirm the theoretical results and good performance of the proposed algorithm.
Abstract:As a novel similarity measure that is defined as the expectation of a kernel function between two random variables, correntropy has been successfully applied in robust machine learning and signal processing to combat large outliers. The kernel function in correntropy is usually a zero-mean Gaussian kernel. In a recent work, the concept of mixture correntropy (MC) was proposed to improve the learning performance, where the kernel function is a mixture Gaussian kernel, namely a linear combination of several zero-mean Gaussian kernels with different widths. In both correntropy and mixture correntropy, the center of the kernel function is, however, always located at zero. In the present work, to further improve the learning performance, we propose the concept of multi-kernel correntropy (MKC), in which each component of the mixture Gaussian kernel can be centered at a different location. The properties of the MKC are investigated and an efficient approach is proposed to determine the free parameters in MKC. Experimental results show that the learning algorithms under the maximum multi-kernel correntropy criterion (MMKCC) can outperform those under the original maximum correntropy criterion (MCC) and the maximum mixture correntropy criterion (MMCC).
Abstract:The widespread application of artificial neural networks has prompted researchers to experiment with FPGA and customized ASIC designs to speed up their computation. These implementation efforts have generally focused on weight multiplication and signal summation operations, and less on activation functions used in these applications. Yet, efficient hardware implementations of nonlinear activation functions like Exponential Linear Units (ELU), Scaled Exponential Linear Units (SELU), and Hyperbolic Tangent (tanh), are central to designing effective neural network accelerators, since these functions require lots of resources. In this paper, we explore efficient hardware implementations of activation functions using purely combinational circuits, with a focus on two widely used nonlinear activation functions, i.e., SELU and tanh. Our experiments demonstrate that neural networks are generally insensitive to the precision of the activation function. The results also prove that the proposed combinational circuit-based approach is very efficient in terms of speed and area, with negligible accuracy loss on the MNIST, CIFAR-10 and IMAGENET benchmarks. Synopsys Design Compiler synthesis results show that circuit designs for tanh and SELU can save between 3.13-7.69 and 4.45-8:45 area compared to the LUT/memory-based implementations, and can operate at 5.14GHz and 4.52GHz using the 28nm SVT library, respectively. The implementation is available at: https://github.com/ThomasMrY/ActivationFunctionDemo.