Liang Lin

PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report

Oct 03, 2018

Andrey Ignatov, Radu Timofte, Thang Van Vu, Tung Minh Luu, Trung X Pham, Cao Van Nguyen, Yongwoo Kim, Jae-Seok Choi, Munchurl Kim, Jie Huang (+38 more)

Abstract:This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones. The challenge consisted of two tracks. In the first one, participants were solving the classical image super-resolution problem with a bicubic downscaling factor of 4. The second track was aimed at real-world photo enhancement, and the goal was to map low-quality photos from the iPhone 3GS device to the same photos captured with a DSLR camera. The target metric used in this challenge combined the runtime, PSNR scores and solutions' perceptual results measured in the user study. To ensure the efficiency of the submitted models, we additionally measured their runtime and memory requirements on Android smartphones. The proposed solutions significantly improved baseline results defining the state-of-the-art for image enhancement on smartphones.
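The abstract above names PSNR as one ingredient of the target metric. As a minimal sketch of that one ingredient only (not the challenge's combined scoring formula, which is not given here), PSNR can be computed from the mean squared error between a reference and an enhanced image. The function name `psnr`, the flat-list pixel representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Hypothetical helper for illustration: pixels are given as flat sequences of
    numbers, and max_value is the assumed peak intensity (255 for 8-bit images).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a closer match to the reference; PSNR alone does not capture perceptual quality, which is why the challenge also relied on a user study.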

Cost-effective Object Detection: Active Sample Mining with Switchable Selection Criteria

Sep 16, 2018

Keze Wang, Liang Lin, Xiaopeng Yan, Ziliang Chen, Dongyu Zhang, Lei Zhang

Abstract:Though quite challenging, the training of object detectors using large-scale unlabeled or partially labeled datasets has attracted increasing interest from researchers due to its fundamental importance for applications of neural networks and learning systems. To address this problem, many active learning (AL) methods have been proposed that employ up-to-date detectors to retrieve representative minority samples according to predefined confidence or uncertainty thresholds. However, these AL methods cause the detectors to ignore the remaining majority samples (i.e., those with low uncertainty or high prediction confidence). In this work, by developing a principled active sample mining (ASM) framework, we demonstrate that cost-effectively mining samples from these unlabeled majority data is key to training more powerful object detectors while minimizing user effort. Specifically, our ASM framework involves a selectively switchable sample selection mechanism for determining whether an unlabeled sample should be manually annotated via AL or automatically pseudo-labeled via a novel self-learning process. The proposed process is compatible with mini-batch based training (i.e., using a batch of unlabeled or partially labeled data as a one-time input) for object detection. Extensive experiments on two public benchmarks clearly demonstrate that our ASM framework can achieve performance comparable to that of alternative methods but with significantly fewer annotations.

* Automatically determining whether an unlabeled sample should be manually annotated or pseudo-labeled via a novel self-learning process. Accepted by TNNLS 2018. The source code is available at http://kezewang.com/codes/ASM_ver1.zip
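The switch described in this abstract, between manual annotation and automatic pseudo-labeling, can be illustrated with a deliberately simplified sketch. This is not the authors' ASM criteria: the function name `route_samples`, the use of a single confidence score per sample, and the thresholds `high` and `low` are all assumptions chosen only to show the idea that confident samples are pseudo-labeled while uncertain ones go to a human annotator.

```python
def route_samples(confidences, high=0.9, low=0.5):
    """Split sample indices into pseudo-label, annotate, and defer groups.

    Hypothetical illustration: `confidences` holds one detector confidence per
    unlabeled sample; `high` and `low` are made-up thresholds, not values from
    the paper.
    """
    pseudo, annotate, defer = [], [], []
    for idx, conf in enumerate(confidences):
        if conf >= high:
            pseudo.append(idx)    # confident majority: trust the detector's own label
        elif conf <= low:
            annotate.append(idx)  # uncertain minority: request a manual annotation
        else:
            defer.append(idx)     # in between: leave unlabeled for a later round
    return pseudo, annotate, defer
```

For example, `route_samples([0.95, 0.3, 0.7])` sends sample 0 to pseudo-labeling, sample 1 to manual annotation, and leaves sample 2 for a later round.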

Toward Characteristic-Preserving Image-based Virtual Try-On Network

Sep 12, 2018

Bochao Wang, Huabin Zheng, Xiaodan Liang, Yimin Chen, Liang Lin, Meng Yang

Abstract:Image-based virtual try-on systems for fitting new in-shop clothes into a person image have attracted increasing research attention, yet remain challenging. A desirable pipeline should not only transform the target clothes into the most fitting shape seamlessly but also preserve the identity of the clothes in the generated image, that is, the key characteristics (e.g. texture, logo, embroidery) that depict the original clothes. However, previous image-conditioned generation works fail to meet these critical requirements for plausible virtual try-on performance since they fail to handle large spatial misalignment between the input image and target clothes. Prior work explicitly tackled spatial deformation using shape context matching, but failed to preserve clothing details due to its coarse-to-fine strategy. In this work, we propose a new fully-learnable Characteristic-Preserving Virtual Try-On Network (CP-VTON) for addressing all real-world challenges in this task. First, CP-VTON learns a thin-plate spline transformation that warps the in-shop clothes to fit the body shape of the target person via a new Geometric Matching Module (GMM) rather than computing correspondences of interest points as prior works did. Second, to alleviate boundary artifacts of warped clothes and make the results more realistic, we employ a Try-On Module that learns a composition mask to integrate the warped clothes and the rendered image to ensure smoothness. Extensive experiments on a fashion dataset demonstrate that our CP-VTON achieves state-of-the-art virtual try-on performance both qualitatively and quantitatively.

* Accepted by ECCV 2018
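The composition mask mentioned in this abstract integrates the warped clothes with the rendered image. A minimal sketch of one common way to do such mask-based composition is a per-pixel convex blend; whether CP-VTON uses exactly this form is an assumption here, as are the function name `compose` and the flat-list image representation with mask values in [0, 1].

```python
def compose(warped, rendered, mask):
    """Blend two images per pixel: mask * warped + (1 - mask) * rendered.

    Hypothetical illustration: all three inputs are flat sequences of floats of
    equal length, and each mask value is assumed to lie in [0, 1].
    """
    if not (len(warped) == len(rendered) == len(mask)):
        raise ValueError("warped, rendered, and mask must have the same length")
    # A mask value of 1 keeps the warped clothes pixel; 0 keeps the rendered pixel.
    return [m * w + (1.0 - m) * r for w, r, m in zip(warped, rendered, mask)]
```

Intermediate mask values mix the two sources, which is what allows a learned mask to smooth the boundary between clothes and body.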

Interpretable Visual Question Answering by Reasoning on Dependency Trees

Sep 06, 2018

Qingxing Cao, Xiaodan Liang, Bailin Li, Liang Lin

Abstract:Collaborative reasoning for understanding each image-question pair is very critical but underexplored for an interpretable visual question answering system. Although very recent works also attempted to use explicit compositional processes to assemble multiple subtasks embedded in the questions, their models heavily rely on annotations or handcrafted rules to obtain valid reasoning processes, leading to either heavy workloads or poor performance on composition reasoning. In this paper, to better align image and language domains in diverse and unrestricted cases, we propose a novel neural network model that performs global reasoning on a dependency tree parsed from the question, and we thus phrase our model as parse-tree-guided reasoning network (PTGRN). This network consists of three collaborative modules: i) an attention module to exploit the local visual evidence for each word parsed from the question, ii) a gated residual composition module to compose the previously mined evidence, and iii) a parse-tree-guided propagation module to pass the mined evidence along the parse tree. Our PTGRN is thus capable of building an interpretable VQA system that gradually derives the image cues following a question-driven parse-tree reasoning route. Experiments on relational datasets demonstrate the superiority of our PTGRN over current state-of-the-art VQA methods, and the visualization results highlight the explainable capability of our reasoning system.

* 14 pages, 10 figures. arXiv admin note: text overlap with arXiv:1804.00105

Unsupervised Image Super-Resolution using Cycle-in-Cycle Generative Adversarial Networks

Sep 03, 2018

Yuan Yuan, Siyuan Liu, Jiawei Zhang, Yongbing Zhang, Chao Dong, Liang Lin

Abstract:We consider the single image super-resolution problem in a more general case where the low-/high-resolution pairs and the down-sampling process are unavailable. Unlike the traditional super-resolution formulation, the low-resolution input is further degraded by noise and blur. This complicated setting makes supervised learning and accurate kernel estimation impossible. To solve this problem, we resort to unsupervised learning without paired data, inspired by the recent successful image-to-image translation applications. With generative adversarial networks (GAN) as the basic component, we propose a Cycle-in-Cycle network structure to tackle the problem within three steps. First, the noisy and blurry input is mapped to a noise-free low-resolution space. Then the intermediate image is up-sampled with a pre-trained deep model. Finally, we fine-tune the two modules in an end-to-end manner to get the high-resolution output. Experiments on NTIRE2018 datasets demonstrate that the proposed unsupervised method achieves results comparable to those of the state-of-the-art supervised models.

* 10 pages (reference included), 6 figures

Attentive Crowd Flow Machines

Sep 01, 2018

Lingbo Liu, Ruimao Zhang, Jiefeng Peng, Guanbin Li, Bowen Du, Liang Lin

Abstract:Traffic flow prediction is crucial for urban traffic management and public safety. Its key challenges lie in how to adaptively integrate the various factors that affect the flow changes. In this paper, we propose a unified neural network module to address this problem, called Attentive Crowd Flow Machine (ACFM), which is able to infer the evolution of the crowd flow by learning dynamic representations of temporally-varying data with an attention mechanism. Specifically, the ACFM is composed of two progressive ConvLSTM units connected with a convolutional layer for spatial weight prediction. The first LSTM takes the sequential flow density representation as input and generates a hidden state at each time-step for attention map inference, while the second LSTM aims at learning the effective spatial-temporal feature expression from attentionally weighted crowd flow features. Based on the ACFM, we further build a deep architecture with the application to citywide crowd flow prediction, which naturally incorporates the sequential and periodic data as well as other external influences. Extensive experiments on two standard benchmarks (i.e., crowd flow in Beijing and New York City) show that the proposed method achieves significant improvements over the state-of-the-art methods.

* ACM MM, full paper

Neural Task Planning with And-Or Graph Representations

Aug 25, 2018

Tianshui Chen, Riquan Chen, Lin Nie, Xiaonan Luo, Xiaobai Liu, Liang Lin

Abstract:This paper focuses on semantic task planning, i.e., predicting a sequence of actions toward accomplishing a specific task under a certain scene, which is a new problem in computer vision research. The primary challenges are how to model task-specific knowledge and how to integrate this knowledge into the learning procedure. In this work, we propose training a recurrent long short-term memory (LSTM) network to address this problem, i.e., taking a scene image (including pre-located objects) and the specified task as input and recurrently predicting action sequences. However, training such a network generally requires large numbers of annotated samples to cover the semantic space (e.g., diverse action decomposition and ordering). To overcome this issue, we introduce a knowledge and-or graph (AOG) for task description, which hierarchically represents a task as atomic actions. With this AOG representation, we can produce many valid samples (i.e., action sequences according to common sense) by training another auxiliary LSTM network with a small set of annotated samples. Furthermore, these generated samples (i.e., task-oriented action sequences) effectively facilitate training of the model for semantic task planning. In our experiments, we create a new dataset that contains diverse daily tasks and extensively evaluate the effectiveness of our approach.

* Submitted to TMM, under minor revision. arXiv admin note: text overlap with arXiv:1707.04677

Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding

Aug 14, 2018

Tianshui Chen, Wenxi Wu, Yuefang Gao, Le Dong, Xiaonan Luo, Liang Lin

Abstract:Object categories inherently form a hierarchy with different levels of concept abstraction, especially for fine-grained categories. For example, birds (Aves) can be categorized according to a four-level hierarchy of order, family, genus, and species. This hierarchy encodes rich correlations among various categories across different levels, which can effectively regularize the semantic space and thus make prediction less ambiguous. However, previous studies of fine-grained image recognition primarily focus on categories of one certain level and usually overlook this correlation information. In this work, we investigate simultaneously predicting categories of different levels in the hierarchy and integrating this structured correlation information into the deep neural network by developing a novel Hierarchical Semantic Embedding (HSE) framework. Specifically, the HSE framework sequentially predicts the category score vector of each level in the hierarchy, from highest to lowest. At each level, it incorporates the predicted score vector of the higher level as prior knowledge to learn finer-grained feature representation. During training, the predicted score vector of the higher level is also employed to regularize label prediction by using it as soft targets of corresponding sub-categories. To evaluate the proposed framework, we organize the 200 bird species of the Caltech-UCSD birds dataset with the four-level category hierarchy and construct a large-scale butterfly dataset that also covers four level categories. Extensive experiments on these two and the newly-released VegFru datasets demonstrate the superiority of our HSE framework over the baseline methods and existing competitors.

* Accepted at ACM MM 2018 as oral presentation

Adaptive Temporal Encoding Network for Video Instance-level Human Parsing

Aug 10, 2018

Qixian Zhou, Xiaodan Liang, Ke Gong, Liang Lin

Abstract:Beyond the existing single-person and multiple-person human parsing tasks in static images, this paper makes the first attempt to investigate a more realistic video instance-level human parsing that simultaneously segments out each person instance and parses each instance into more fine-grained parts (e.g., head, leg, dress). We introduce a novel Adaptive Temporal Encoding Network (ATEN) that alternately performs temporal encoding among key frames and flow-guided feature propagation from other consecutive frames between two key frames. Specifically, ATEN first incorporates a Parsing-RCNN to produce the instance-level parsing result for each key frame, which integrates both the global human parsing and instance-level human segmentation into a unified model. To balance accuracy and efficiency, the flow-guided feature propagation is used to directly parse consecutive frames according to their identified temporal consistency with key frames. On the other hand, ATEN leverages the convolution gated recurrent units (convGRU) to exploit temporal changes over a series of key frames, which are further used to facilitate the frame-level instance-level parsing. By alternately performing direct feature propagation between consistent frames and temporal encoding among key frames, our ATEN achieves a good balance between frame-level accuracy and time efficiency, which is a common crucial problem in video object segmentation research. To demonstrate the superiority of our ATEN, extensive experiments are conducted on the most popular video segmentation benchmark (DAVIS) and a newly collected Video Instance-level Parsing (VIP) dataset, which is the first video instance-level human parsing dataset comprised of 404 sequences and over 20k frames with instance-level and pixel-wise annotations.

* To appear in ACM MM 2018. Code link: https://github.com/HCPLab-SYSU/ATEN. Dataset link: http://sysu-hcp.net/lip

Non-locally Enhanced Encoder-Decoder Network for Single Image De-raining

Aug 04, 2018

Guanbin Li, Xiang He, Wei Zhang, Huiyou Chang, Le Dong, Liang Lin

Abstract:Single image rain streak removal has recently witnessed substantial progress due to the development of deep convolutional neural networks. However, existing deep learning based methods either focus on the entrance and exit of the network by decomposing the input image into high and low frequency information and employing residual learning to reduce the mapping range, or focus on the introduction of a cascaded learning scheme to decompose the task of rain streak removal into multiple stages. These methods treat the convolutional neural network as an encapsulated end-to-end mapping module without examining the rationality and superiority of neural network design. In this paper, we delve into an effective end-to-end neural network structure for stronger feature expression and spatial correlation learning. Specifically, we propose a non-locally enhanced encoder-decoder network framework, which consists of a pooling indices embedded encoder-decoder network to efficiently learn increasingly abstract feature representation for more accurate rain streak modeling while perfectly preserving the image detail. The proposed encoder-decoder framework is composed of a series of non-locally enhanced dense blocks that are designed to not only fully exploit hierarchical features from all the convolutional layers but also capture the long-distance dependencies and structural information well. Extensive experiments on synthetic and real datasets demonstrate that the proposed method can effectively remove rain streaks from rainy images of various densities while preserving the image details well, achieving significant improvements over the recent state-of-the-art methods.

* Accepted to ACM Multimedia 2018
