Abstract: Recently, neural architecture search (NAS) has achieved great success on classification tasks for mobile devices. The backbone network for object detection is usually obtained from the image classification task. However, an architecture searched on the classification task is sub-optimal because of the gap between image classification and object detection. Meanwhile, work on backbone architecture search for mobile-device object detection remains limited, mainly because the backbone always requires expensive ImageNet pre-training. Accordingly, it is necessary to study network architecture search for mobile-device object detection without expensive pre-training. In this work, we propose a backbone network architecture search algorithm for mobile object detection, an evolutionary optimization method based on non-dominated sorting for NAS scenarios. It can quickly search for a backbone network architecture within given constraints, and it better addresses the sub-optimality of linearly combining accuracy and computational cost. The proposed approach can search backbone networks with different depths, widths, or expansion sizes via a weight-mapping technique, making it possible to use NAS for mobile-device detection tasks far more efficiently. In our experiments, we verify the effectiveness of the proposed approach on YoloX-Lite, a lightweight version of the object detection framework. Under similar computational complexity, the backbone network architecture we search for achieves 2.0% higher mAP than MobileDet. Our improved backbone network reduces computational effort while improving the accuracy of the object detection network. To prove its effectiveness, a series of ablation studies has been carried out and the working mechanism has been analyzed in detail.
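For readers unfamiliar with non-dominated sorting, the short Python sketch below illustrates the selection idea the abstract alludes to: candidate architectures are ranked into Pareto fronts over an (accuracy, FLOPs) pair instead of a single linear combination of the two objectives. The function names and toy candidate tuples are illustrative assumptions, not code from the paper.

```python
# Minimal non-dominated sorting sketch for a two-objective NAS selection step:
# maximize accuracy, minimize FLOPs. Candidate tuples are illustrative only.
def dominates(a, b):
    # a dominates b if it is no worse in both objectives and better in at least one.
    acc_a, flops_a = a
    acc_b, flops_b = b
    return (acc_a >= acc_b and flops_a <= flops_b) and (acc_a > acc_b or flops_a < flops_b)

def non_dominated_sort(population):
    """Split candidates into Pareto fronts (front 0 = best trade-offs)."""
    fronts, remaining = [], list(population)
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining if q is not p)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

# Toy candidates: (top-1 accuracy, MFLOPs)
candidates = [(0.74, 300), (0.72, 180), (0.74, 260), (0.70, 150), (0.73, 320)]
for i, front in enumerate(non_dominated_sort(candidates)):
    print(f"front {i}: {front}")
```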
Abstract: To address the quality-diversity trade-off of generated images in imbalanced classification tasks, we study over-sampling methods at the feature level instead of the data level and focus on searching the latent feature space for optimal distributions. On this basis, we propose an iMproved Estimation Distribution Algorithm based Latent featUre Distribution Evolution (MEDA_LUDE) algorithm, in which a joint learning procedure makes the latent features both optimized and evolved, by the deep neural network and the evolutionary algorithm, respectively. We explore the effect of the Large-margin Gaussian Mixture (L-GM) loss function on distribution learning and design a specialized fitness function based on the similarities among samples to increase diversity. Extensive experiments on benchmark imbalanced datasets validate the effectiveness of the proposed algorithm, which can generate images with both quality and diversity. Furthermore, the MEDA_LUDE algorithm is also applied to the industrial field and successfully alleviates the imbalance issue in fabric defect classification.
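The following minimal NumPy sketch illustrates the general estimation-of-distribution idea behind over-sampling in a latent feature space: fit a simple Gaussian to the minority-class latent features, sample candidates, and rank them by a similarity-based fitness that favours diversity. The function name, the diagonal-Gaussian model, and the distance-based fitness are illustrative assumptions, not the MEDA_LUDE implementation.

```python
import numpy as np

def eda_oversample_latent(latent, n_new, n_candidates=200, seed=None):
    """Estimation-of-distribution style over-sampling in latent feature space.

    Fit a diagonal Gaussian to the minority-class latent features, sample
    candidates, and keep those farthest (on average) from existing samples,
    a simple similarity-based fitness favouring diversity.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = latent.mean(axis=0), latent.std(axis=0) + 1e-6
    candidates = rng.normal(mu, sigma, size=(n_candidates, latent.shape[1]))
    # Fitness: mean Euclidean distance to the real minority samples.
    dists = np.linalg.norm(candidates[:, None, :] - latent[None, :, :], axis=-1)
    fitness = dists.mean(axis=1)
    return candidates[np.argsort(-fitness)[:n_new]]

# Toy usage: 30 minority-class latent vectors of dimension 8.
latent = np.random.default_rng(0).normal(size=(30, 8))
new_features = eda_oversample_latent(latent, n_new=10, seed=1)
print(new_features.shape)  # (10, 8)
```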
Abstract: Transformers exhibit great advantages in handling computer vision tasks. They model image classification tasks by utilizing a multi-head attention mechanism to process a series of patches obtained by splitting the image. However, for complex tasks, a vision Transformer not only needs to inherit dynamic attention and global context, but also needs to introduce features concerning noise reduction and shift and scale invariance of objects. Therefore, we take a step forward to study the structural characteristics of the Transformer and the convolution and propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS). The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture while maintaining the benefits of the multi-head attention mechanism. The searched block-based backbone network can extract feature maps at different scales. These features are compatible with a wide range of visual tasks, such as image classification (32 M parameters, 82.0% Top-1 accuracy on ImageNet-1K) and object detection (50.4% mAP on COCO2017). The proposed topology, based on the multi-head attention mechanism and CNNs, adaptively associates relational features of pixels with multi-scale features of objects. It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
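As a rough illustration of the kind of conv/attention hybrid block such a search space may contain, the PyTorch sketch below combines a depthwise convolution branch (local inductive bias) with multi-head self-attention over flattened patch tokens (global context). The module name and hyperparameters are assumptions for illustration and do not reproduce the VTCAS blocks.

```python
import torch
import torch.nn as nn

class ConvAttnBlock(nn.Module):
    """Illustrative hybrid block: depthwise conv (local bias) + MHSA (global context)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                          # x: (B, C, H, W)
        x = x + self.dwconv(x)                     # convolutional branch, residual
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        q = self.norm(tokens)
        attn_out, _ = self.attn(q, q, q)
        tokens = tokens + attn_out                 # attention branch, residual
        return tokens.transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(2, 64, 14, 14)
print(ConvAttnBlock(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```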
Abstract: Visual sensation and perception refer to the processes of sensing, organizing, identifying, and interpreting visual information in environmental awareness and understanding. Computational models inspired by visual perception are complex and diverse, as they draw on many subjects such as cognitive science, information science, and artificial intelligence. In this paper, deep-learning-oriented visual perception computational models are systematically investigated from the perspectives of the biological visual mechanism and computational vision theory. Then, some viewpoints on the prospects of visual perception computational models are presented. Finally, the paper summarizes the current challenges of visual perception and predicts its future development trends. This survey provides a comprehensive reference for research in this direction.
Abstract: In recent years, neural architecture search (NAS) methods have been proposed for the automatic generation of task-oriented network architectures in image classification. However, the architectures obtained by existing NAS approaches are optimized only for classification performance and do not adapt to devices with limited computational resources. To address this challenge, we propose a neural network architecture search algorithm that aims to simultaneously improve network performance (e.g., classification accuracy) and reduce network complexity. The proposed framework automatically builds the network architecture at two stages: block-level search and network-level search. At the block-level search stage, a gradient-based relaxation method is proposed, using an enhanced gradient to design high-performance and low-complexity blocks. At the network-level search stage, an evolutionary multi-objective algorithm is applied to complete the automatic design from blocks to the target network. The experimental results demonstrate that our method outperforms all evaluated hand-crafted networks in image classification on CIFAR10 and CIFAR100, with a network parameter size of less than one megabit in both cases. Moreover, compared with other neural architecture search methods, our method offers a tremendous reduction in designed network architecture parameters.
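The block-level relaxation can be pictured with a DARTS-style mixed operation: each candidate operation is weighted by a softmax over learnable architecture parameters, so the discrete choice becomes differentiable and trainable by gradient descent. The sketch below is a generic illustration under that assumption; the paper's enhanced gradient and its actual candidate operations are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Continuous relaxation of a discrete operation choice.

    Each candidate op gets a learnable architecture weight alpha; the block
    output is the softmax-weighted sum, so the alphas receive gradients.
    """
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),                       # skip connection
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

mixed = MixedOp(16)
y = mixed(torch.randn(1, 16, 8, 8))
print(y.shape, F.softmax(mixed.alpha, 0))  # after search, keep the argmax op
```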
Abstract: For recognizing and classifying textile defects, deep learning-based methods have been proposed and have achieved remarkable success on single-label textile images. However, detecting multi-label defects in a textile image remains challenging due to the coexistence of multiple defects and of small-size defects. To address these challenges, a multi-level, multi-attentional deep learning network (MLMA-Net) is proposed and built to 1) increase the feature representation ability to detect small-size defects and 2) generate a discriminative representation that maximizes the capability of attending to the defect status, leveraging higher-resolution feature maps for multiple defects. Moreover, a multi-label object detection dataset of textile defect images (DHU-ML1000) is built to verify the performance of the proposed model. The results demonstrate that the network extracts more distinctive features and performs better than state-of-the-art approaches on the real-world industrial dataset.
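To make the multi-level idea concrete, the sketch below shows one generic way to fuse a higher-resolution feature map with an upsampled coarse one through a learned spatial gate, so that small-defect cues in the fine map survive the fusion. The module is a hypothetical illustration, not the MLMA-Net architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LevelAttentionFuse(nn.Module):
    """Fuse a coarse and a fine feature map with a learned spatial attention,
    so that small-defect cues in the higher-resolution map are emphasized."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel score

    def forward(self, fine, coarse):          # fine: (B,C,H,W), coarse: (B,C,h,w)
        coarse_up = F.interpolate(coarse, size=fine.shape[-2:], mode="nearest")
        gate = torch.sigmoid(self.attn(fine))          # (B,1,H,W) in [0,1]
        return gate * fine + (1 - gate) * coarse_up

fuse = LevelAttentionFuse(32)
out = fuse(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 32, 32))
print(out.shape)  # torch.Size([1, 32, 64, 64])
```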
Abstract: Relation classification (RC) is one of the fundamental tasks of information extraction, aiming to detect the relation between entity pairs in unstructured natural language text and to generate structured data in the form of entity-relation triples. Although distant supervision methods can effectively alleviate the lack of training data in supervised learning, they also introduce noise into the data and still cannot fundamentally solve the long-tail distribution problem of the training instances. To enable a neural network to learn new knowledge from a few instances, as humans do, this work focuses on few-shot relation classification (FSRC), where a classifier should generalize to new classes that have not been seen in the training set, given only a small number of samples for each class. To make full use of the existing information and obtain a better feature representation for each instance, we propose to encode each class prototype in an adaptive way from two aspects. First, based on prototypical networks, we propose an adaptive mixture mechanism that adds label words to the representation of the class prototype, which, to the best of our knowledge, is the first attempt to integrate label information into the features of the support samples of each class so as to obtain more interactive class prototypes. Second, to measure the distances between samples of each category more reasonably, we introduce a loss function for joint representation learning that encodes each support instance in an adaptive manner. Extensive experiments have been conducted on FewRel under different few-shot (FS) settings, and the results show that the proposed adaptive prototypical networks with label words and joint representation learning not only achieve significant improvements in accuracy but also increase the generalization ability of few-shot RC models.
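The core prototypical-network step that the abstract builds on can be sketched as follows: each class prototype is a mix of the support-set mean and the label-word embedding, and queries are assigned to the nearest prototype. The fixed mixing weight here is an illustrative stand-in for the adaptive mixture mechanism described in the abstract, and the encoders are assumed to have produced the vectors already.

```python
import torch

def class_prototypes(support, label_word_emb, mix=0.3):
    """Prototype per class = (1 - mix) * mean(support) + mix * label-word embedding.

    support: (n_classes, k_shot, dim) encoded support instances
    label_word_emb: (n_classes, dim) embeddings of the class label words
    `mix` is a fixed illustrative weight; the paper learns the mixture adaptively.
    """
    return (1 - mix) * support.mean(dim=1) + mix * label_word_emb

def classify(queries, prototypes):
    """Assign each query to the nearest prototype by Euclidean distance."""
    dists = torch.cdist(queries, prototypes)        # (n_query, n_classes)
    return dists.argmin(dim=1)

support = torch.randn(5, 3, 128)          # 5-way 3-shot episode
label_words = torch.randn(5, 128)
protos = class_prototypes(support, label_words)
print(classify(torch.randn(10, 128), protos))
```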
Abstract: Deep neural networks (DNNs) have achieved remarkable success in computer vision; however, training DNNs to satisfactory performance remains challenging and is sensitive to the empirical selection of an optimization algorithm. Stochastic gradient descent (SGD) dominates DNN training, adjusting the network weights to minimize the DNN's loss function. As an alternative, neuroevolution is more in line with an evolutionary process and provides some key capabilities that are often unavailable in SGD, such as a heuristic black-box search strategy based on individual collaboration. This paper proposes a novel approach that combines the merits of both neuroevolution and SGD, enabling evolutionary search, parallel exploration, and an effective probe for optimal DNNs. A hierarchical cluster-based suppression algorithm is also developed to overcome similar weight updates among individuals and thereby improve population diversity. We implement the proposed approach in four representative DNNs on four publicly available datasets. Experimental results demonstrate that all four DNNs optimized by the proposed approach outperform their counterparts optimized only by SGD on all datasets. DNNs optimized by the proposed approach also outperform state-of-the-art deep networks. This work also presents a meaningful attempt at pursuing artificial general intelligence.
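A toy sketch of the general neuroevolution-plus-SGD combination is given below: each individual in a population of weight vectors takes a gradient step, the population is mutated and selected by fitness, and near-duplicate individuals are suppressed to preserve diversity. The quadratic toy loss and the simple distance-threshold suppression are assumptions standing in for real DNN training and for the paper's hierarchical cluster-based suppression.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):                       # toy quadratic loss standing in for a DNN loss
    return np.sum((w - 3.0) ** 2)

def grad(w):
    return 2.0 * (w - 3.0)

# Population of weight vectors: each individual takes an SGD step, then an
# evolutionary perturbation; overly similar individuals are suppressed.
pop = [rng.normal(size=8) for _ in range(6)]
for generation in range(20):
    pop = [w - 0.1 * grad(w) for w in pop]                  # SGD step
    pop += [w + 0.05 * rng.normal(size=8) for w in pop]     # mutate (explore)
    pop.sort(key=loss)                                      # select by fitness
    kept = []
    for w in pop:                                           # naive suppression:
        if all(np.linalg.norm(w - k) > 0.05 for k in kept): # drop near-duplicates
            kept.append(w)
    pop = kept[:6]
print("best loss:", loss(pop[0]))
```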
Abstract: With the exponential growth of digital text information, it is challenging to efficiently obtain specific knowledge from massive unstructured text. As a basic task in natural language processing (NLP), relation extraction aims to extract the semantic relation between entity pairs from the given text. To avoid manual labeling of datasets, distant supervision relation extraction (DSRE) has been widely used, aiming to utilize a knowledge base to automatically annotate datasets. Unfortunately, this method suffers heavily from wrong labeling due to its underlying strong assumptions. To address this issue, we propose a new framework using a hybrid attention-based Transformer block with multi-instance learning to perform the DSRE task. More specifically, the Transformer block is first used as the sentence encoder to capture the syntactic information of sentences, mainly utilizing multi-head self-attention to extract features at the word level. Then, a more concise sentence-level attention mechanism is adopted to constitute the bag representation, incorporating the valid information of each sentence to effectively represent the bag. Experimental results on the public New York Times (NYT) dataset demonstrate that the proposed approach outperforms state-of-the-art algorithms on the evaluation dataset, which verifies the effectiveness of our model for the DSRE task.
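The sentence-level attention used to build the bag representation can be sketched in a few lines: each encoded sentence is scored against a relation query vector and the bag vector is the softmax-weighted sum, so noisy sentences receive low weight. The sketch assumes pre-computed sentence encodings and omits the Transformer encoder itself; the dimensions and names are illustrative.

```python
import torch
import torch.nn.functional as F

def bag_representation(sentence_vecs, relation_query):
    """Sentence-level attention over a bag of sentence encodings.

    sentence_vecs: (n_sentences, dim) outputs of the sentence encoder
    relation_query: (dim,) query vector for the candidate relation
    Returns the attention-weighted bag vector; noisy sentences get low weight.
    """
    scores = sentence_vecs @ relation_query          # (n_sentences,)
    weights = F.softmax(scores, dim=0)
    return weights @ sentence_vecs                   # (dim,)

bag = torch.randn(4, 256)          # 4 sentences mentioning the same entity pair
query = torch.randn(256)
print(bag_representation(bag, query).shape)  # torch.Size([256])
```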
Abstract: The performance of a deep neural network is heavily dependent on its architecture, and various neural architecture search strategies have been developed for automated network architecture design. Recently, evolutionary neural architecture search (ENAS) has received increasing attention due to the attractive global optimization capability of evolutionary algorithms. However, ENAS suffers from extremely high computation costs because a large number of performance evaluations is usually required in evolutionary optimization, and training deep neural networks is itself computationally very intensive. To address this issue, this paper proposes a new evolutionary framework for fast ENAS based on directed acyclic graphs, in which parents are randomly sampled and trained on each mini-batch of training data. In addition, a node inheritance strategy is adopted to generate offspring individuals, whose fitness is directly evaluated without training. To enhance the feature processing capability of the evolved neural networks, we also encode a channel attention mechanism in the search space. We evaluate the proposed algorithm on widely used datasets, in comparison with 26 state-of-the-art peer algorithms. Our experimental results show that the proposed algorithm is not only computationally much more efficient, but also highly competitive in learning performance.
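For reference, a squeeze-and-excitation style channel attention module of the kind the abstract says is encoded in the search space can be sketched as below; this is a generic implementation under that assumption, not the paper's exact module or search-space encoding.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global-average-pool the
    feature map, pass it through a small bottleneck MLP, and rescale channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W)
        scale = self.fc(x.mean(dim=(2, 3)))     # (B, C) per-channel weights
        return x * scale[:, :, None, None]

x = torch.randn(2, 32, 16, 16)
print(ChannelAttention(32)(x).shape)  # torch.Size([2, 32, 16, 16])
```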