Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tiesong Zhao

Structural Similarity in Deep Features: Image Quality Assessment Robust to Geometrically Disparate Reference

Dec 27, 2024

Keke Zhang, Weiling Chen, Tiesong Zhao, Zhou Wang

Abstract:Image Quality Assessment (IQA) with references plays an important role in optimizing and evaluating computer vision tasks. Traditional methods assume that all pixels of the reference and test images are fully aligned. Such Aligned-Reference IQA (AR-IQA) approaches fail to address many real-world problems with various geometric deformations between the two images. Although significant effort has been made to attack Geometrically-Disparate-Reference IQA (GDR-IQA) problem, it has been addressed in a task-dependent fashion, for example, by dedicated designs for image super-resolution and retargeting, or by assuming the geometric distortions to be small that can be countered by translation-robust filters or by explicit image registrations. Here we rethink this problem and propose a unified, non-training-based Deep Structural Similarity (DeepSSIM) approach to address the above problems in a single framework, which assesses structural similarity of deep features in a simple but efficient way and uses an attention calibration strategy to alleviate attention deviation. The proposed method, without application-specific design, achieves state-of-the-art performance on AR-IQA datasets and meanwhile shows strong robustness to various GDR-IQA test cases. Interestingly, our test also shows the effectiveness of DeepSSIM as an optimization tool for training image super-resolution, enhancement and restoration, implying an even wider generalizability. \footnote{Source code will be made public after the review is completed.

Via

Access Paper or Ask Questions

SIAVC: Semi-Supervised Framework for Industrial Accident Video Classification

May 23, 2024

Zuoyong Li, Qinghua Lin, Haoyi Fan, Tiesong Zhao, David Zhang

Abstract:Semi-supervised learning suffers from the imbalance of labeled and unlabeled training data in the video surveillance scenario. In this paper, we propose a new semi-supervised learning method called SIAVC for industrial accident video classification. Specifically, we design a video augmentation module called the Super Augmentation Block (SAB). SAB adds Gaussian noise and randomly masks video frames according to historical loss on the unlabeled data for model optimization. Then, we propose a Video Cross-set Augmentation Module (VCAM) to generate diverse pseudo-label samples from the high-confidence unlabeled samples, which alleviates the mismatch of sampling experience and provides high-quality training data. Additionally, we construct a new industrial accident surveillance video dataset with frame-level annotation, namely ECA9, to evaluate our proposed method. Compared with the state-of-the-art semi-supervised learning based methods, SIAVC demonstrates outstanding video classification performance, achieving 88.76\% and 89.13\% accuracy on ECA9 and Fire Detection datasets, respectively. The source code and the constructed dataset ECA9 will be released in \url{https://github.com/AlchemyEmperor/SIAVC}.

Via

Access Paper or Ask Questions

Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images

Mar 02, 2024

Shufan Pei, Junhong Lin, Wenxi Liu, Tiesong Zhao, Chia-Wen Lin

Abstract:In addition to low light, night images suffer degradation from light effects (e.g., glare, floodlight, etc). However, existing nighttime visibility enhancement methods generally focus on low-light regions, which neglects, or even amplifies the light effects. To address this issue, we propose an Adaptive Multi-scale Fusion network (AMFusion) with infrared and visible images, which designs fusion rules according to different illumination regions. First, we separately fuse spatial and semantic features from infrared and visible images, where the former are used for the adjustment of light distribution and the latter are used for the improvement of detection accuracy. Thereby, we obtain an image free of low light and light effects, which improves the performance of nighttime object detection. Second, we utilize detection features extracted by a pre-trained backbone that guide the fusion of semantic features. Hereby, we design a Detection-guided Semantic Fusion Module (DSFM) to bridge the domain gap between detection and semantic features. Third, we propose a new illumination loss to constrain fusion image with normal light intensity. Experimental results demonstrate the superiority of AMFusion with better visual quality and detection accuracy. The source code will be released after the peer review process.

Via

Access Paper or Ask Questions

Non-Intrusive Electric Load Monitoring Approach Based on Current Feature Visualization for Smart Energy Management

Aug 08, 2023

Yiwen Xu, Dengfeng Liu, Liangtao Huang, Zhiquan Lin, Tiesong Zhao, Sam Kwong

Abstract:The state-of-the-art smart city has been calling for an economic but efficient energy management over large-scale network, especially for the electric power system. It is a critical issue to monitor, analyze and control electric loads of all users in system. In this paper, we employ the popular computer vision techniques of AI to design a non-invasive load monitoring method for smart electric energy management. First of all, we utilize both signal transforms (including wavelet transform and discrete Fourier transform) and Gramian Angular Field (GAF) methods to map one-dimensional current signals onto two-dimensional color feature images. Second, we propose to recognize all electric loads from color feature images using a U-shape deep neural network with multi-scale feature extraction and attention mechanism. Third, we design our method as a cloud-based, non-invasive monitoring of all users, thereby saving energy cost during electric power system control. Experimental results on both public and our private datasets have demonstrated our method achieves superior performances than its peers, and thus supports efficient energy management over large-scale Internet of Things (IoT).

Via

Access Paper or Ask Questions

Geometry-based spherical JND modeling for 360$^\circ$ display

Mar 07, 2023

Hongan Wei, Jiaqi Liu, Bo Chen, Liqun Lin, Weiling Chen, Tiesong Zhao

Abstract:360$^\circ$ videos have received widespread attention due to its realistic and immersive experiences for users. To date, how to accurately model the user perceptions on 360$^\circ$ display is still a challenging issue. In this paper, we exploit the visual characteristics of 360$^\circ$ projection and display and extend the popular just noticeable difference (JND) model to spherical JND (SJND). First, we propose a quantitative 2D-JND model by jointly considering spatial contrast sensitivity, luminance adaptation and texture masking effect. In particular, our model introduces an entropy-based region classification and utilizes different parameters for different types of regions for better modeling performance. Second, we extend our 2D-JND model to SJND by jointly exploiting latitude projection and field of view during 360$^\circ$ display. With this operation, SJND reflects both the characteristics of human vision system and the 360$^\circ$ display. Third, our SJND model is more consistent with user perceptions during subjective test and also shows more tolerance in distortions with fewer bit rates during 360$^\circ$ video compression. To further examine the effectiveness of our SJND model, we embed it in Versatile Video Coding (VVC) compression. Compared with the state-of-the-arts, our SJND-VVC framework significantly reduced the bit rate with negligible loss in visual quality.

Via

Access Paper or Ask Questions

Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment

Jan 03, 2023

Liqun Lin, Yang Zheng, Weiling Chen, Chengdong Lan, Tiesong Zhao

Abstract:Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.

Via

Access Paper or Ask Questions

LMQFormer: A Laplace-Prior-Guided Mask Query Transformer for Lightweight Snow Removal

Oct 12, 2022

Junhong Lin, Nanfeng Jiang, Zhentao Zhang, Weiling Chen, Tiesong Zhao

Figure 1 for LMQFormer: A Laplace-Prior-Guided Mask Query Transformer for Lightweight Snow Removal

Figure 2 for LMQFormer: A Laplace-Prior-Guided Mask Query Transformer for Lightweight Snow Removal

Figure 3 for LMQFormer: A Laplace-Prior-Guided Mask Query Transformer for Lightweight Snow Removal

Figure 4 for LMQFormer: A Laplace-Prior-Guided Mask Query Transformer for Lightweight Snow Removal

Abstract:Snow removal aims to locate snow areas and recover clean images without repairing traces. Unlike the regularity and semitransparency of rain, snow with various patterns and degradations seriously occludes the background. As a result, the state-of-the-art snow removal methods usually retains a large parameter size. In this paper, we propose a lightweight but high-efficient snow removal network called Laplace Mask Query Transformer (LMQFormer). Firstly, we present a Laplace-VQVAE to generate a coarse mask as prior knowledge of snow. Instead of using the mask in dataset, we aim at reducing both the information entropy of snow and the computational cost of recovery. Secondly, we design a Mask Query Transformer (MQFormer) to remove snow with the coarse mask, where we use two parallel encoders and a hybrid decoder to learn extensive snow features under lightweight requirements. Thirdly, we develop a Duplicated Mask Query Attention (DMQA) that converts the coarse mask into a specific number of queries, which constraint the attention areas of MQFormer with reduced parameters. Experimental results in popular datasets have demonstrated the efficiency of our proposed model, which achieves the state-of-the-art snow removal quality with significantly reduced parameters and the lowest running time.

* 9 pages, 11 figures

Via

Access Paper or Ask Questions

UIF: An Objective Quality Assessment for Underwater Image Enhancement

May 19, 2022

Yannan Zheng, Weiling Chen, Rongfu Lin, Tiesong Zhao

Figure 1 for UIF: An Objective Quality Assessment for Underwater Image Enhancement

Figure 2 for UIF: An Objective Quality Assessment for Underwater Image Enhancement

Figure 3 for UIF: An Objective Quality Assessment for Underwater Image Enhancement

Figure 4 for UIF: An Objective Quality Assessment for Underwater Image Enhancement

Abstract:Due to complex and volatile lighting environment, underwater imaging can be readily impaired by light scattering, warping, and noises. To improve the visual quality, Underwater Image Enhancement (UIE) techniques have been widely studied. Recent efforts have also been contributed to evaluate and compare the UIE performances with subjective and objective methods. However, the subjective evaluation is time-consuming and uneconomic for all images, while existing objective methods have limited capabilities for the newly-developed UIE approaches based on deep learning. To fill this gap, we propose an Underwater Image Fidelity (UIF) metric for objective evaluation of enhanced underwater images. By exploiting the statistical features of these images, we present to extract naturalness-related, sharpness-related, and structure-related features. Among them, the naturalness-related and sharpness-related features evaluate visual improvement of enhanced images; the structure-related feature indicates structural similarity between images before and after UIE. Then, we employ support vector regression to fuse the above three features into a final UIF metric. In addition, we have also established a large-scale UIE database with subjective scores, namely Underwater Image Enhancement Database (UIED), which is utilized as a benchmark to compare all objective metrics. Experimental results confirm that the proposed UIF outperforms a variety of underwater and general-purpose image quality metrics.

* This paper was submitted to ACMMM 2021

Via

Access Paper or Ask Questions

Deep Quality Assessment of Compressed Videos: A Subjective and Objective Study

May 07, 2022

Liqun Lin, Zheng Wang, Jiachen He, Weiling Chen, Yiwen Xu, Tiesong Zhao

Figure 1 for Deep Quality Assessment of Compressed Videos: A Subjective and Objective Study

Figure 2 for Deep Quality Assessment of Compressed Videos: A Subjective and Objective Study

Figure 3 for Deep Quality Assessment of Compressed Videos: A Subjective and Objective Study

Figure 4 for Deep Quality Assessment of Compressed Videos: A Subjective and Objective Study

Abstract:In the video coding process, the perceived quality of a compressed video is evaluated by full-reference quality evaluation metrics. However, it is difficult to obtain reference videos with perfect quality. To solve this problem, it is critical to design no-reference compressed video quality assessment algorithms, which assists in measuring the quality of experience on the server side and resource allocation on the network side. Convolutional Neural Network (CNN) has shown its advantage in Video Quality Assessment (VQA) with promising successes in recent years. A large-scale quality database is very important for learning accurate and powerful compressed video quality metrics. In this work, a semi-automatic labeling method is adopted to build a large-scale compressed video quality database, which allows us to label a large number of compressed videos with manageable human workload. The resulting Compressed Video quality database with Semi-Automatic Ratings (CVSAR), so far the largest of compressed video quality database. We train a no-reference compressed video quality assessment model with a 3D CNN for SpatioTemporal Feature Extraction and Evaluation (STFEE). Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics and achieves promising generalization performance in cross-database tests. The CVSAR database and STFEE model will be made publicly available to facilitate reproducible research.

* 11 pages, 8 figures

Via

Access Paper or Ask Questions

Multi-View Video Coding with GAN Latent Learning

May 07, 2022

Chengdong Lan, Cheng Luo, Hao Yan, Tiesong Zhao, Sam Kwong

Figure 1 for Multi-View Video Coding with GAN Latent Learning

Figure 2 for Multi-View Video Coding with GAN Latent Learning

Figure 3 for Multi-View Video Coding with GAN Latent Learning

Figure 4 for Multi-View Video Coding with GAN Latent Learning

Abstract:The introduction of multiple viewpoints inevitably increases the bitrates to store and transmit video scenes. To reduce the compressed bitrates, researchers have developed to skip intermediate viewpoints during compression and delivery, and finally reconstruct them with Side Information (SI). Generally, the depth maps can be utilized to construct SI; however, it shows inferior performance with inaccurate reconstruction or high bitrates. In this paper, we propose a multi-view video coding based on SI of Generative Adversarial Network (GAN). At the encoder, we construct a spatio-temporal Epipolar Plane Image (EPI) and further utilize convolutional network to extract the latent code of GAN as SI; while at the decoder side, we combine the SI and adjacent viewpoints to reconstruct intermediate views by the generator of GAN. In particular, we set a joint encoder constraint of reconstruction cost and SI entropy, in order to achieve an optimal tradeoff between reconstruction quality and bitrate overhead. Experiments show a significantly improved Rate-Distortion (RD) performance compared with the state-of-the-art methods.

Via

Access Paper or Ask Questions