Member, IEEE
Abstract:Meta-learning methods typically follow a two-loop framework, where each loop potentially suffers from notorious overfitting, hindering rapid adaptation and generalization to new tasks. Existing schemes solve it by enhancing the mutual-exclusivity or diversity of training samples, but these data manipulation strategies are data-dependent and insufficiently flexible. This work alleviates overfitting in meta-learning from the perspective of gradient regularization and proposes a data-independent \textbf{M}eta-\textbf{G}radient \textbf{Aug}mentation (\textbf{MGAug}) method. The key idea is to first break the rote memories by network pruning to address memorization overfitting in the inner loop, and then the gradients of pruned sub-networks naturally form the high-quality augmentation of the meta-gradient to alleviate learner overfitting in the outer loop. Specifically, we explore three pruning strategies, including \textit{random width pruning}, \textit{random parameter pruning}, and a newly proposed \textit{catfish pruning} that measures a Meta-Memorization Carrying Amount (MMCA) score for each parameter and prunes high-score ones to break rote memories as much as possible. The proposed MGAug is theoretically guaranteed by the generalization bound from the PAC-Bayes framework. In addition, we extend a lightweight version, called MGAug-MaxUp, as a trade-off between performance gains and resource overhead. Extensive experiments on multiple few-shot learning benchmarks validate MGAug's effectiveness and significant improvement over various meta-baselines. The code is publicly available at \url{https://github.com/xxLifeLover/Meta-Gradient-Augmentation}.
Abstract:Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs). Due to the complex non-linear characteristic of samples, the objective of those activation functions is to project samples from their original feature space to a linear separable feature space. This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs, i.e., whether there exists a part of features arriving at the linear separable feature space in the intermediate layers, that does not require further non-linear variation but an affine transformation instead. To validate the above hypothesis, we explore the problem of linear feature disentanglement for neural networks in this paper. Specifically, we devise a learnable mask module to distinguish between linear and non-linear features. Through our designed experiments we found that some features reach the linearly separable space earlier than the others and can be detached partly from the NNs. The explored method also provides a readily feasible pruning strategy which barely affects the performance of the original model. We conduct our experiments on four datasets and present promising results.
Abstract:Series photo selection (SPS) is an important branch of the image aesthetics quality assessment, which focuses on finding the best one from a series of nearly identical photos. While a great progress has been observed, most of the existing SPS approaches concentrate solely on extracting features from the original image, neglecting that multiple views, e.g, saturation level, color histogram and depth of field of the image, will be of benefit to successfully reflecting the subtle aesthetic changes. Taken multi-view into consideration, we leverage a graph neural network to construct the relationships between multi-view features. Besides, multiple views are aggregated with an adaptive-weight self-attention module to verify the significance of each view. Finally, a siamese network is proposed to select the best one from a series of nearly identical photos. Experimental results demonstrate that our model accomplish the highest success rates compared with competitive methods.
Abstract:Predicting the future price trends of stocks is a challenging yet intriguing problem given its critical role to help investors make profitable decisions. In this paper, we present a collaborative temporal-relational modeling framework for end-to-end stock trend prediction. The temporal dynamics of stocks is firstly captured with an attention-based recurrent neural network. Then, different from existing studies relying on the pairwise correlations between stocks, we argue that stocks are naturally connected as a collective group, and introduce the hypergraph structures to jointly characterize the stock group-wise relationships of industry-belonging and fund-holding. A novel hypergraph tri-attention network (HGTAN) is proposed to augment the hypergraph convolutional networks with a hierarchical organization of intra-hyperedge, inter-hyperedge, and inter-hypergraph attention modules. In this manner, HGTAN adaptively determines the importance of nodes, hyperedges, and hypergraphs during the information propagation among stocks, so that the potential synergies between stock movements can be fully exploited. Extensive experiments on real-world data demonstrate the effectiveness of our approach. Also, the results of investment simulation show that our approach can achieve a more desirable risk-adjusted return. The data and codes of our work have been released at https://github.com/lixiaojieff/HGTAN.
Abstract:Natural language video localization (NLVL), which aims to locate a target moment from a video that semantically corresponds to a text query, is a novel and challenging task. Toward this end, in this paper, we present a comprehensive survey of the NLVL algorithms, where we first propose the pipeline of NLVL, and then categorize them into supervised and weakly-supervised methods, following by the analysis of the strengths and weaknesses of each kind of methods. Subsequently, we present the dataset, evaluation protocols and the general performance analysis. Finally, the possible perspectives are obtained by summarizing the existing methods.
Abstract:Breast cancer classification remains a challenging task due to inter-class ambiguity and intra-class variability. Existing deep learning-based methods try to confront this challenge by utilizing complex nonlinear projections. However, these methods typically extract global features from entire images, neglecting the fact that the subtle detail information can be crucial in extracting discriminative features. In this study, we propose a novel method named Attention Model Enhanced Network (AMEN), which is formulated in a multi-branch fashion with pixel-wised attention model and classification submodular. Specifically, the feature learning part in AMEN can generate pixel-wised attention map, while the classification submodular are utilized to classify the samples. To focus more on subtle detail information, the sample image is enhanced by the pixel-wised attention map generated from former branch. Furthermore, boosting strategy are adopted to fuse classification results from different branches for better performance. Experiments conducted on three benchmark datasets demonstrate the superiority of the proposed method under various scenarios.
Abstract:With the development of medical imaging technology and machine learning, computer-assisted diagnosis which can provide impressive reference to pathologists, attracts extensive research interests. The exponential growth of medical images and uninterpretability of traditional classification models have hindered the applications of computer-assisted diagnosis. To address these issues, we propose a novel method for Learning Binary Semantic Embedding (LBSE). Based on the efficient and effective embedding, classification and retrieval are performed to provide interpretable computer-assisted diagnosis for histology images. Furthermore, double supervision, bit uncorrelation and balance constraint, asymmetric strategy and discrete optimization are seamlessly integrated in the proposed method for learning binary embedding. Experiments conducted on three benchmark datasets validate the superiority of LBSE under various scenarios.
Abstract:Due to the compelling efficiency in retrieval and storage, similarity-preserving hashing has been widely applied to approximate nearest neighbor search in large-scale image retrieval. However, existing methods have poor performance in retrieval using an extremely short-length hash code due to weak ability of classification and poor distribution of hash bit. To address this issue, in this study, we propose a novel reinforcing short-length hashing (RSLH). In this proposed RSLH, mutual reconstruction between the hash representation and semantic labels is performed to preserve the semantic information. Furthermore, to enhance the accuracy of hash representation, a pairwise similarity matrix is designed to make a balance between accuracy and training expenditure on memory. In addition, a parameter boosting strategy is integrated to reinforce the precision with hash bits fusion. Extensive experiments on three large-scale image benchmarks demonstrate the superior performance of RSLH under various short-length hashing scenarios.
Abstract:Label information is widely used in hashing methods because of its effectiveness of improving the precision. The existing hashing methods always use two different projections to represent the mutual regression between hash codes and class labels. In contrast to the existing methods, we propose a novel learning-based hashing method termed stable supervised discrete hashing with mutual linear regression (S2DHMLR) in this study, where only one stable projection is used to describe the linear correlation between hash codes and corresponding labels. To the best of our knowledge, this strategy has not been used for hashing previously. In addition, we further use a boosting strategy to improve the final performance of the proposed method without adding extra constraints and with little extra expenditure in terms of time and space. Extensive experiments conducted on three image benchmarks demonstrate the superior performance of the proposed method.
Abstract:Content-based near-duplicate video detection (NDVD) is essential for effective search and retrieval, and robust video fingerprinting is a good solution for NDVD. Most existing video fingerprinting methods use a single feature or concatenating different features to generate video fingerprints, and show a good performance under single-mode modifications such as noise addition and blurring. However, when they suffer combined modifications, the performance is degraded to a certain extent because such features cannot characterize the video content completely. By contrast, the assistance and consensus among different features can improve the performance of video fingerprinting. Therefore, in the present study, we mine the assistance and consensus among different features based on tensor model, and present a new comprehensive feature to fully use them in the proposed video fingerprinting framework. We also analyze what the comprehensive feature really is for representing the original video. In this framework, the video is initially set as a high-order tensor that consists of different features, and the video tensor is decomposed via the Tucker model with a solution that determines the number of components. Subsequently, the comprehensive feature is generated by the low-order tensor obtained from tensor decomposition. Finally, the video fingerprint is computed using this feature. A matching strategy used for narrowing the search is also proposed based on the core tensor. The robust video fingerprinting framework is resistant not only to single-mode modifications, but also to the combination of them.