Abstract:Panoramic Narrative Grounding (PNG) is an emerging visual grounding task that aims to segment visual objects in images based on dense narrative captions. The current state-of-the-art methods first refine the representation of phrase by aggregating the most similar $k$ image pixels, and then match the refined text representations with the pixels of the image feature map to generate segmentation results. However, simply aggregating sampled image features ignores the contextual information, which can lead to phrase-to-pixel mis-match. In this paper, we propose a novel learning framework called Deformable Attention Refined Matching Network (DRMN), whose main idea is to bring deformable attention in the iterative process of feature learning to incorporate essential context information of different scales of pixels. DRMN iteratively re-encodes pixels with the deformable attention network after updating the feature representation of the top-$k$ most similar pixels. As such, DRMN can lead to accurate yet discriminative pixel representations, purify the top-$k$ most similar pixels, and consequently alleviate the phrase-to-pixel mis-match substantially.Experimental results show that our novel design significantly improves the matching results between text phrases and image pixels. Concretely, DRMN achieves new state-of-the-art performance on the PNG benchmark with an average recall improvement 3.5%. The codes are available in: https://github.com/JaMesLiMers/DRMN.
Abstract:In this paper, an adversarial erasing embedding network with the guidance of high-order attributes (AEEN-HOA) is proposed for going further to solve the challenging ZSL/GZSL task. AEEN-HOA consists of two branches, i.e., the upper stream is capable of erasing some initially discovered regions, then the high-order attribute supervision is incorporated to characterize the relationship between the class attributes. Meanwhile, the bottom stream is trained by taking the current background regions to train the same attribute. As far as we know, it is the first time of introducing the erasing operations into the ZSL task. In addition, we first propose a class attribute activation map for the visualization of ZSL output, which shows the relationship between class attribute feature and attention map. Experiments on four standard benchmark datasets demonstrate the superiority of AEEN-HOA framework.
Abstract:Conjugate gradient (CG) methods are a class of important methods for solving linear equations and nonlinear optimization problems. In this paper, we propose a new stochastic CG algorithm with variance reduction and we prove its linear convergence with the Fletcher and Reeves method for strongly convex and smooth functions. We experimentally demonstrate that the CG with variance reduction algorithm converges faster than its counterparts for four learning models, which may be convex, nonconvex or nonsmooth. In addition, its area under the curve performance on six large-scale data sets is comparable to that of the LIBLINEAR solver for the L2-regularized L2-loss but with a significant improvement in computational efficiency
Abstract:The study focused on the machine learning analysis approaches to identify the adulteration of 9 kinds of edible oil qualitatively and answered the following three questions: Is the oil sample adulterant? How does it constitute? What is the main ingredient of the adulteration oil? After extracting the high-performance liquid chromatography (HPLC) data on triglyceride from 370 oil samples, we applied the adaptive boosting with multi-class Hamming loss (AdaBoost.MH) to distinguish the oil adulteration in contrast with the support vector machine (SVM). Further, we regarded the adulterant oil and the pure oil samples as ones with multiple labels and with only one label, respectively. Then multi-label AdaBoost.MH and multi-label learning vector quantization (ML-LVQ) model were built to determine the ingredients and their relative ratio in the adulteration oil. The experimental results on six measures show that ML-LVQ achieves better performance than multi-label AdaBoost.MH.
Abstract:Linear NDCG is used for measuring the performance of the Web content quality assessment in ECML/PKDD Discovery Challenge 2010. In this paper, we will prove that the DCG error equals a new pair-wise loss.