Abstract: Previous face inverse rendering methods often require synthetic data with ground truth and/or professional equipment like a light stage. However, a model trained on synthetic data or with pre-defined lighting priors typically fails to generalize well to real-world situations, due to the gap between synthetic data/lighting priors and real data. Furthermore, for common users, the professional equipment and skill required make the task expensive and complex. In this paper, we propose a deep learning framework to disentangle face images in the wild into their corresponding albedo, normal, and lighting components. Specifically, a decomposition network is built with a hierarchical subdivision strategy, which takes image pairs captured from arbitrary viewpoints as input. In this way, our approach greatly reduces the burden of data preparation and significantly broadens the applicability of face inverse rendering. Extensive experiments are conducted to demonstrate the efficacy of our design and show its superior performance in face relighting over other state-of-the-art alternatives. Our code is available at \url{https://github.com/AutoHDR/HD-Net.git}.
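To make the kind of decomposition described above concrete, the following is a minimal PyTorch sketch, not the authors' HD-Net: it maps a face image to albedo, normal, and second-order spherical-harmonic lighting. The shared encoder, the per-component decoder heads, and the 27-dimensional lighting vector are all assumptions for illustration.

    # Minimal sketch (not the authors' HD-Net): a decomposition network that maps a
    # face image to albedo, normal, and 2nd-order spherical-harmonic lighting.
    import torch
    import torch.nn as nn

    class FaceDecomposer(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.albedo_head = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
            )
            self.normal_head = nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
            )
            # 9 SH coefficients per RGB channel -> 27 lighting parameters (an assumption).
            self.light_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 27))

        def forward(self, img):
            feat = self.encoder(img)
            return self.albedo_head(feat), self.normal_head(feat), self.light_head(feat)

    x = torch.rand(2, 3, 128, 128)            # an image pair from arbitrary viewpoints
    albedo, normal, lighting = FaceDecomposer()(x)
    print(albedo.shape, normal.shape, lighting.shape)  # (2,3,128,128) (2,3,128,128) (2,27)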
Abstract: Uncalibrated photometric stereo aims to estimate detailed surface normals from images under varying and unknown lighting. Recently, deep learning has brought powerful data priors to this underdetermined problem. This paper presents a new method for deep uncalibrated photometric stereo, which efficiently utilizes the inter-image representation to guide normal estimation. Previous methods use optimization-based neural inverse rendering or a single size-independent pooling layer to deal with multiple inputs, which are inefficient at utilizing information among input images. Given multiple images under different lighting, we consider the intra-image and inter-image variations to be highly correlated. Motivated by the correlated variations, we design an inter-intra image feature fusion module to introduce the inter-image representation into the per-image feature extraction. The extra representation is used to guide the per-image feature extraction and eliminate the ambiguity in normal estimation. We demonstrate the effect of our design on a wide range of samples, especially on dark materials. Our method produces significantly better results than state-of-the-art methods on both synthetic and real data.
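One plausible form of such inter-intra fusion is sketched below; this is an assumption rather than the paper's exact module. An inter-image representation is obtained by max-pooling per-image features across the image/lighting axis and is then broadcast back and fused into each per-image feature.

    # Minimal sketch (an assumption, not the paper's exact module) of inter-intra fusion.
    import torch
    import torch.nn as nn

    class InterIntraFusion(nn.Module):
        def __init__(self, c=64):
            super().__init__()
            self.intra = nn.Conv2d(3, c, 3, padding=1)      # per-image feature extraction
            self.fuse = nn.Conv2d(2 * c, c, 3, padding=1)   # inject the inter-image cue

        def forward(self, imgs):                  # imgs: (N_images, 3, H, W) under N lightings
            f = torch.relu(self.intra(imgs))      # intra-image features, (N, C, H, W)
            inter = f.max(dim=0, keepdim=True)[0] # inter-image representation, (1, C, H, W)
            inter = inter.expand_as(f)            # broadcast it to every input image
            return torch.relu(self.fuse(torch.cat([f, inter], dim=1)))

    fused = InterIntraFusion()(torch.rand(8, 3, 32, 32))    # 8 images of one object
    print(fused.shape)                                      # (8, 64, 32, 32)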
Abstract: Zero-shot classification is a promising paradigm for solving problems where the training classes and test classes are disjoint. Achieving this usually requires experts to externalize their domain knowledge by manually specifying a class-attribute matrix that defines which classes have which attributes. Designing a suitable class-attribute matrix is the key to the subsequent procedure, but this design process is tedious and trial-and-error, with no guidance. This paper proposes a visual explainable active learning approach, with its design and implementation called the semantic navigator, to solve the above problems. This approach promotes human-AI teaming with four actions (ask, explain, recommend, respond) in each interaction loop. The machine asks contrastive questions to guide humans in thinking about attributes. A novel visualization called the semantic map explains the current status of the machine, so analysts can better understand why the machine misclassifies objects. Moreover, the machine recommends the labels of classes for each attribute to ease the labeling burden. Finally, humans can steer the model by modifying the labels interactively, and the machine adjusts its recommendations. The visual explainable active learning approach improves humans' efficiency in building zero-shot classification models interactively, compared with the method without guidance. We justify our results with user studies using standard benchmarks for zero-shot classification.
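To make the class-attribute matrix concrete, here is a toy NumPy sketch with made-up classes, attributes, and scores: an unseen class is predicted by matching the attributes estimated for an image against each class's row of the matrix.

    # Toy sketch of zero-shot prediction via a class-attribute matrix (values are made up).
    import numpy as np

    classes = ["zebra", "whale", "bat"]                  # disjoint from the training classes
    attributes = ["striped", "aquatic", "flies"]
    S = np.array([[1, 0, 0],                             # class-attribute matrix (rows = classes)
                  [0, 1, 0],
                  [0, 0, 1]], dtype=float)

    a_hat = np.array([0.9, 0.1, 0.2])                    # attribute scores predicted for one image
    pred = classes[int(np.argmin(((S - a_hat) ** 2).sum(axis=1)))]
    print(pred)                                          # "zebra": the closest attribute signature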
Abstract: This document introduces the background and usage of the Dunhuang Grottoes Dataset and its benchmark. The documentation first covers the background of the Dunhuang Grottoes, which are widely recognised as a priceless heritage. Given that digital methods are the modern trend for heritage protection and restoration, we follow this trend and release the first public dataset for Dunhuang grotto painting restoration. The rest of the documentation details the painting data generation. To enable a data-driven fashion, this dataset provides a large number of training and testing examples, which is sufficient for a deep learning approach. The detailed usage of the dataset, as well as the benchmark, is described.
Abstract: Video object segmentation aims at accurately segmenting the target object regions across consecutive frames. It is technically challenging to cope with complicated factors (e.g., shape deformation, occlusion, and objects moving out of view). Recent approaches have largely addressed these by using back-and-forth re-identification and bi-directional mask propagation. However, these methods are extremely slow and only support offline inference, which in principle cannot be applied in real time. Motivated by this observation, we propose a new detection-based paradigm for video object segmentation. We propose a unified One-Pass Video Segmentation framework (OVS-Net) for modeling spatio-temporal representations in an end-to-end pipeline, which seamlessly integrates object detection, object segmentation, and object re-identification. The proposed framework lends itself to one-pass inference that effectively and efficiently performs video object segmentation. Moreover, we propose a mask-guided attention module for modeling multi-scale object boundaries and multi-level feature fusion. Experiments on the challenging DAVIS 2017 benchmark demonstrate the effectiveness of the proposed framework, with performance comparable to the state-of-the-art and, to our knowledge, pioneering efficiency towards real-time operation at about 11.5 fps, more than 5 times faster than other state-of-the-art methods.
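As a rough illustration of mask-guided attention, and not OVS-Net's exact module, the sketch below resizes a coarse object mask to the feature resolution and uses it as a spatial gate over the features, with a residual path so ungated information is not lost. The channel width and the 1x1 gating convolution are assumptions.

    # Minimal sketch (an assumption): a coarse mask re-weights a feature map spatially.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskGuidedAttention(nn.Module):
        def __init__(self, c=64):
            super().__init__()
            self.gate = nn.Conv2d(1, c, 1)      # lift the 1-channel mask to feature channels

        def forward(self, feat, mask):          # feat: (B, C, H, W); mask: (B, 1, h, w)
            mask = F.interpolate(mask, size=feat.shape[-2:], mode="bilinear", align_corners=False)
            attn = torch.sigmoid(self.gate(mask))
            return feat * attn + feat           # attended features plus a residual path

    feat = torch.rand(1, 64, 32, 32)
    mask = torch.rand(1, 1, 128, 128)           # coarse mask from detection/propagation
    print(MaskGuidedAttention()(feat, mask).shape)   # (1, 64, 32, 32)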
Abstract: Images captured under low-light conditions often suffer from (partially) poor visibility. Besides unsatisfactory lighting, multiple types of degradation, such as noise and color distortion due to the limited quality of cameras, hide in the dark. In other words, solely turning up the brightness of dark regions will inevitably amplify hidden artifacts. This work builds a simple yet effective network for \textbf{Kin}dling the \textbf{D}arkness (denoted as KinD), which, inspired by Retinex theory, decomposes images into two components. One component (illumination) is responsible for light adjustment, while the other (reflectance) handles degradation removal. In this way, the original space is decoupled into two smaller subspaces, which are expected to be better regularized/learned. It is worth noting that our network is trained with paired images shot under different exposure conditions, instead of using any ground-truth reflectance and illumination information. Extensive experiments are conducted to demonstrate the efficacy of our design and its superiority over state-of-the-art alternatives. Our KinD is robust against severe visual defects and user-friendly in arbitrarily adjusting light levels. In addition, our model takes less than 50ms to process an image at VGA resolution on a 2080Ti GPU. All the above merits make KinD attractive for practical use.
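The Retinex relation behind such a decomposition, image = reflectance * illumination, and the paired-exposure cue (two shots of the same scene should share reflectance) can be sketched as follows. The simple L1 terms and the 0.1 weight are assumptions for illustration, not KinD's exact losses.

    # Minimal sketch of the Retinex relation I = R * L and a paired-exposure consistency term.
    import torch

    def retinex_losses(R_low, L_low, I_low, R_high, L_high, I_high):
        recon = (R_low * L_low - I_low).abs().mean() + (R_high * L_high - I_high).abs().mean()
        shared_reflectance = (R_low - R_high).abs().mean()   # degradation-free component should match
        return recon + 0.1 * shared_reflectance              # 0.1 is an assumed weight

    I_low, I_high = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    R, L = torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64)  # stand-ins for decomposition outputs
    print(retinex_losses(R, L, I_low, R, L, I_high).item())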
Abstract: We present a comprehensive study and evaluation of existing single image deraining algorithms, using a new large-scale benchmark consisting of both synthetic and real-world rainy images. This dataset highlights diverse data sources and image contents, and is divided into three subsets (rain streak, rain drop, rain and mist), each serving different training or evaluation purposes. We further provide a rich variety of criteria for deraining algorithm evaluation, ranging from full-reference metrics, to no-reference metrics, to subjective evaluation and the novel task-driven evaluation. Experiments on the dataset shed light on the comparisons and limitations of state-of-the-art deraining algorithms, and suggest promising future directions.
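As a small illustration of the full-reference side of such an evaluation, the sketch below computes PSNR between a derained result and its clean ground truth; it stands in for the benchmark's actual evaluation code and is only meaningful on the synthetic subsets where ground truth exists.

    # Minimal sketch of a full-reference metric (PSNR) on a derained/clean image pair.
    import numpy as np

    def psnr(pred, gt, peak=1.0):
        mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
        return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

    gt = np.random.rand(128, 128, 3)            # stand-ins for a clean/derained image pair
    pred = np.clip(gt + 0.05 * np.random.randn(128, 128, 3), 0, 1)
    print(round(psnr(pred, gt), 2))             # higher is better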
Abstract: Being accurate, efficient, and compact is essential to a facial landmark detector for practical use. To simultaneously address these three concerns, this paper investigates a neat model with promising detection accuracy under wild environments (e.g., unconstrained pose, expression, lighting, and occlusion conditions) and super real-time speed on a mobile device. More concretely, we customize an end-to-end single-stage network associated with acceleration techniques. During the training phase, for each sample, rotation information is estimated for geometrically regularizing landmark localization; it is then NOT involved in the testing phase. A novel loss is designed to, besides incorporating the geometric regularization, mitigate the issue of data imbalance by adjusting the weights of samples according to different states, such as large pose, extreme lighting, and occlusion, in the training set. Extensive experiments are conducted to demonstrate the efficacy of our design and reveal its superior performance over state-of-the-art alternatives on widely adopted challenging benchmarks, i.e., 300W (including iBUG, LFPW, AFW, HELEN, and XM2VTS) and AFLW. Our model can be merely 2.1 MB in size and reach over 140 fps per face on a mobile phone (Qualcomm ARM 845 processor) with high precision, making it attractive for large-scale or real-time applications. We have made our practical system based on the PFLD 0.25X model publicly available at \url{http://sites.google.com/view/xjguo/fld} to encourage comparisons and improvements from the community.
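The sketch below is written in the spirit of the described loss rather than PFLD's exact formulation: the per-sample landmark error is re-weighted by the estimated rotation deviation (the geometric term) and by flags marking hard states such as large pose, extreme lighting, and occlusion. The specific weighting form and state penalties are assumptions.

    # Sketch (not PFLD's exact loss): geometry- and state-weighted landmark regression error.
    import torch

    def weighted_landmark_loss(pred, gt, angles, state_flags, state_weights):
        # pred, gt: (B, N, 2); angles: (B, 3) yaw/pitch/roll deviations in radians;
        # state_flags: (B, S) in {0,1}; state_weights: (S,) penalties per hard state.
        geom = (1.0 - torch.cos(angles)).sum(dim=1)               # geometric regularization term
        state = 1.0 + (state_flags * state_weights).sum(dim=1)    # up-weight under-represented states
        err = ((pred - gt) ** 2).sum(dim=-1).mean(dim=-1)         # mean squared landmark distance
        return ((1.0 + geom) * state * err).mean()

    pred, gt = torch.rand(4, 68, 2), torch.rand(4, 68, 2)
    angles = torch.zeros(4, 3)
    flags = torch.tensor([[1., 0., 0.]] * 4)                      # e.g., large-pose samples
    print(weighted_landmark_loss(pred, gt, angles, flags, torch.tensor([2., 2., 2.])).item())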
Abstract: The rain effect in images is typically annoying for many multimedia and computer vision tasks. For removing the rain effect from a single image, deep learning techniques have been attracting considerable attention. This paper designs a novel multi-task learning architecture, in an end-to-end manner, to reduce the mapping range from input to output and boost performance. Concretely, a decomposition net is built to split rain images into clean background and rain layers. Different from previous architectures, our model consists of, besides a component representing the desired clean image, an extra component for the rain layer. During the training phase, we further employ a composition structure to reproduce the input from the separated clean image and rain information, improving the quality of the decomposition. Experiments on both synthetic and real images are conducted to reveal the high-quality recovery achieved by our design and show its superiority over other state-of-the-art methods. Furthermore, our design is also applicable to other layer decomposition tasks such as dust removal. More importantly, our method requires only about 50ms, significantly faster than the competitors, to process a test image at VGA resolution on a GTX 1080 GPU, making it attractive for practical use.
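The composition structure mentioned above can be sketched as a simple constraint that the separated layers reproduce the rainy input; the additive layer model B + R = I and the L1 penalty used here are assumptions for illustration, not necessarily the paper's exact formulation.

    # Minimal sketch of a composition constraint on the decomposed layers.
    import torch

    def composition_loss(background, rain_layer, rainy_input):
        return (background + rain_layer - rainy_input).abs().mean()

    I = torch.rand(1, 3, 64, 64)                                # rainy image
    B, R = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)   # decomposition-net outputs
    print(composition_loss(B, R, I).item())                     # driven towards 0 during training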
Abstract: In this paper, we propose the first model able to generate visually grounded questions with diverse types for a single image. Visual question generation is an emerging topic which aims to ask questions in natural language based on visual input. To the best of our knowledge, there is a lack of automatic methods for generating meaningful questions of various types for the same visual input. To circumvent this problem, we propose a model that automatically generates visually grounded questions with varying types. Our model takes as input both images and the captions generated by a dense caption model, samples the most probable question types, and generates the questions in sequence. Experimental results on two real-world datasets show that our model outperforms the strongest baseline in terms of both correctness and diversity by a wide margin.
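The sketch below illustrates the type-sampling step only and is an assumption, not the paper's model: image and dense-caption features are fused, a distribution over question types is predicted, a type is sampled, and its embedding could then seed a question decoder. The type set, feature dimensions, and fusion layer are all made up for illustration.

    # Minimal sketch (an assumption) of sampling a question type from fused image/caption features.
    import torch
    import torch.nn as nn

    QUESTION_TYPES = ["what", "where", "who", "why", "how", "when"]   # illustrative type set

    class TypeSampler(nn.Module):
        def __init__(self, img_dim=512, cap_dim=300, hidden=256):
            super().__init__()
            self.fuse = nn.Linear(img_dim + cap_dim, hidden)
            self.type_logits = nn.Linear(hidden, len(QUESTION_TYPES))
            self.type_embed = nn.Embedding(len(QUESTION_TYPES), hidden)  # would be fed to a decoder

        def forward(self, img_feat, cap_feat):
            h = torch.relu(self.fuse(torch.cat([img_feat, cap_feat], dim=-1)))
            probs = torch.softmax(self.type_logits(h), dim=-1)
            t = torch.multinomial(probs, num_samples=1).squeeze(-1)     # sampled question type
            return t, self.type_embed(t)

    t, emb = TypeSampler()(torch.rand(2, 512), torch.rand(2, 300))
    print([QUESTION_TYPES[i] for i in t.tolist()], emb.shape)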