Abstract:The focus of this paper is on the problem of image retrieval with attribute manipulation. Our proposed work is able to manipulate the desired attributes of the query image while maintaining its other attributes. For example, the collar attribute of the query image can be changed from round to v-neck to retrieve similar images from a large dataset. A key challenge in e-commerce is that images have multiple attributes where users would like to manipulate and it is important to estimate discriminative feature representations for each of these attributes. The proposed FashionSearchNet-v2 architecture is able to learn attribute specific representations by leveraging on its weakly-supervised localization module, which ignores the unrelated features of attributes in the feature space, thus improving the similarity learning. The network is jointly trained with the combination of attribute classification and triplet ranking loss to estimate local representations. These local representations are then merged into a single global representation based on the instructed attribute manipulation where desired images can be retrieved with a distance metric. The proposed method also provides explainability for its retrieval process to help provide additional information on the attention of the network. Experiments performed on several datasets that are rich in terms of the number of attributes show that FashionSearchNet-v2 outperforms the other state-of-the-art attribute manipulation techniques. Different than our earlier work (FashionSearchNet), we propose several improvements in the learning procedure and show that the proposed FashionSearchNet-v2 can be generalized to different domains other than fashion.
Abstract:In this paper, we introduce a dictionary learning based approach applied to the problem of real-time reconstruction of MR image sequences that are highly undersampled in k-space. Unlike traditional dictionary learning, our method integrates both global and patch-wise (local) sparsity information and incorporates some priori information into the reconstruction process. Moreover, we use a Dependent Hierarchical Beta-process as the prior for the group-based dictionary learning, which adaptively infers the dictionary size and the sparsity of each patch; and also ensures that similar patches are manifested in terms of similar dictionary atoms. An efficient numerical algorithm based on the alternating direction method of multipliers (ADMM) is also presented. Through extensive experimental results we show that our proposed method achieves superior reconstruction quality, compared to the other state-of-the- art DL-based methods.
Abstract:The Convolutional Neural Network (CNN) has achieved great success in image classification. The classification model can also be utilized at image or patch level for many other applications, such as object detection and segmentation. In this paper, we propose a whole-image CNN regression model, by removing the full connection layer and training the network with continuous feature maps. This is a generic regression framework that fits many applications. We demonstrate this method through two tasks: simultaneous face detection & segmentation, and scene saliency prediction. The result is comparable with other models in the respective fields, using only a small scale network. Since the regression model is trained on corresponding image / feature map pairs, there are no requirements on uniform input size as opposed to the classification model. Our framework avoids classifier design, a process that may introduce too much manual intervention in model development. Yet, it is highly correlated to the classification network and offers some in-deep review of CNN structures.
Abstract:It has been recently shown that incorporating priori knowledge significantly improves the performance of basic compressive sensing based approaches. We have managed to successfully exploit this idea for recovering a matrix as a summation of a Low-rank and a Sparse component from compressive measurements. When applied to the problem of construction of 4D Cardiac MR image sequences in real-time from highly under-sampled $k-$space data, our proposed method achieves superior reconstruction quality compared to the other state-of-the-art methods.