Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhiying Jiang

DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution

Mar 03, 2025

Xingyuan Li, Zirui Wang, Yang Zou, Zhixin Chen, Jun Ma, Zhiying Jiang, Long Ma, Jinyuan Liu

Abstract:Infrared imaging is essential for autonomous driving and robotic operations as a supportive modality due to its reliable performance in challenging environments. Despite its popularity, the limitations of infrared cameras, such as low spatial resolution and complex degradations, consistently challenge imaging quality and subsequent visual tasks. Hence, infrared image super-resolution (IISR) has been developed to address this challenge. While recent developments in diffusion models have greatly advanced this field, current methods to solve it either ignore the unique modal characteristics of infrared imaging or overlook the machine perception requirements. To bridge these gaps, we propose DifIISR, an infrared image super-resolution diffusion model optimized for visual quality and perceptual performance. Our approach achieves task-based guidance for diffusion by injecting gradients derived from visual and perceptual priors into the noise during the reverse process. Specifically, we introduce an infrared thermal spectrum distribution regulation to preserve visual fidelity, ensuring that the reconstructed infrared images closely align with high-resolution images by matching their frequency components. Subsequently, we incorporate various visual foundational models as the perceptual guidance for downstream visual tasks, infusing generalizable perceptual features beneficial for detection and segmentation. As a result, our approach gains superior visual results while attaining State-Of-The-Art downstream task performance. Code is available at https://github.com/zirui0625/DifIISR

* This paper was accepted by CVPR 2025

Via

Access Paper or Ask Questions

Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption

Jan 18, 2025

Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, Xin Fan, Risheng Liu

Abstract:Infrared-visible image fusion (IVIF) is a critical task in computer vision, aimed at integrating the unique features of both infrared and visible spectra into a unified representation. Since 2018, the field has entered the deep learning era, with an increasing variety of approaches introducing a range of networks and loss functions to enhance visual performance. However, challenges such as data compatibility, perception accuracy, and efficiency remain. Unfortunately, there is a lack of recent comprehensive surveys that address this rapidly expanding domain. This paper fills that gap by providing a thorough survey covering a broad range of topics. We introduce a multi-dimensional framework to elucidate common learning-based IVIF methods, from visual enhancement strategies to data compatibility and task adaptability. We also present a detailed analysis of these approaches, accompanied by a lookup table clarifying their core ideas. Furthermore, we summarize performance comparisons, both quantitatively and qualitatively, focusing on registration, fusion, and subsequent high-level tasks. Beyond technical analysis, we discuss potential future directions and open issues in this area. For further details, visit our GitHub repository: https://github.com/RollingPlain/IVIF_ZOO.

Via

Access Paper or Ask Questions

SeFENet: Robust Deep Homography Estimation via Semantic-Driven Feature Enhancement

Dec 09, 2024

Zeru Shi, Zengxi Zhang, Zhiying Jiang, Ruizhe An, Jinyuan Liu

Figure 1 for SeFENet: Robust Deep Homography Estimation via Semantic-Driven Feature Enhancement

Figure 2 for SeFENet: Robust Deep Homography Estimation via Semantic-Driven Feature Enhancement

Figure 3 for SeFENet: Robust Deep Homography Estimation via Semantic-Driven Feature Enhancement

Figure 4 for SeFENet: Robust Deep Homography Estimation via Semantic-Driven Feature Enhancement

Abstract:Images captured in harsh environments often exhibit blurred details, reduced contrast, and color distortion, which hinder feature detection and matching, thereby affecting the accuracy and robustness of homography estimation. While visual enhancement can improve contrast and clarity, it may introduce visual-tolerant artifacts that obscure the structural integrity of images. Considering the resilience of semantic information against environmental interference, we propose a semantic-driven feature enhancement network for robust homography estimation, dubbed SeFENet. Concretely, we first introduce an innovative hierarchical scale-aware module to expand the receptive field by aggregating multi-scale information, thereby effectively extracting image features under diverse harsh conditions. Subsequently, we propose a semantic-guided constraint module combined with a high-level perceptual framework to achieve degradation-tolerant with semantic feature. A meta-learning-based training strategy is introduced to mitigate the disparity between semantic and structural features. By internal-external alternating optimization, the proposed network achieves implicit semantic-wise feature enhancement, thereby improving the robustness of homography estimation in adverse environments by strengthening the local feature comprehension and context information extraction. Experimental results under both normal and harsh conditions demonstrate that SeFENet significantly outperforms SOTA methods, reducing point match error by at least 41\% on the large-scale datasets.

Via

Access Paper or Ask Questions

HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning

Nov 29, 2024

Zengxi Zhang, Zhiying Jiang, Long Ma, Jinyuan Liu, Xin Fan, Risheng Liu

Abstract:Underwater images are often affected by light refraction and absorption, reducing visibility and interfering with subsequent applications. Existing underwater image enhancement methods primarily focus on improving visual quality while overlooking practical implications. To strike a balance between visual quality and application, we propose a heuristic invertible network for underwater perception enhancement, dubbed HUPE, which enhances visual quality and demonstrates flexibility in handling other downstream tasks. Specifically, we introduced an information-preserving reversible transformation with embedded Fourier transform to establish a bidirectional mapping between underwater images and their clear images. Additionally, a heuristic prior is incorporated into the enhancement process to better capture scene information. To further bridge the feature gap between vision-based enhancement images and application-oriented images, a semantic collaborative learning module is applied in the joint optimization process of the visual enhancement task and the downstream task, which guides the proposed enhancement model to extract more task-oriented semantic features while obtaining visually pleasing images. Extensive experiments, both quantitative and qualitative, demonstrate the superiority of our HUPE over state-of-the-art methods. The source code is available at https://github.com/ZengxiZhang/HUPE.

* 22 pages, 21 figures

Via

Access Paper or Ask Questions

Towards Robust Image Stitching: An Adaptive Resistance Learning against Compatible Attacks

Feb 25, 2024

Zhiying Jiang, Xingyuan Li, Jinyuan Liu, Xin Fan, Risheng Liu

Abstract:Image stitching seamlessly integrates images captured from varying perspectives into a single wide field-of-view image. Such integration not only broadens the captured scene but also augments holistic perception in computer vision applications. Given a pair of captured images, subtle perturbations and distortions which go unnoticed by the human visual system tend to attack the correspondence matching, impairing the performance of image stitching algorithms. In light of this challenge, this paper presents the first attempt to improve the robustness of image stitching against adversarial attacks. Specifically, we introduce a stitching-oriented attack~(SoA), tailored to amplify the alignment loss within overlapping regions, thereby targeting the feature matching procedure. To establish an attack resistant model, we delve into the robustness of stitching architecture and develop an adaptive adversarial training~(AAT) to balance attack resistance with stitching precision. In this way, we relieve the gap between the routine adversarial training and benign models, ensuring resilience without quality compromise. Comprehensive evaluation across real-world and synthetic datasets validate the deterioration of SoA on stitching performance. Furthermore, AAT emerges as a more robust solution against adversarial perturbations, delivering superior stitching results. Code is available at:https://github.com/Jzy2017/TRIS.

* Accepted by AAAI2024

Via

Access Paper or Ask Questions

From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion

Dec 31, 2023

Xingyuan Li, Yang Zou, Jinyuan Liu, Zhiying Jiang, Long Ma, Xin Fan, Risheng Liu

Figure 1 for From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion

Figure 2 for From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion

Figure 3 for From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion

Figure 4 for From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion

Abstract:With the rapid progression of deep learning technologies, multi-modality image fusion has become increasingly prevalent in object detection tasks. Despite its popularity, the inherent disparities in how different sources depict scene content make fusion a challenging problem. Current fusion methodologies identify shared characteristics between the two modalities and integrate them within this shared domain using either iterative optimization or deep learning architectures, which often neglect the intricate semantic relationships between modalities, resulting in a superficial understanding of inter-modal connections and, consequently, suboptimal fusion outcomes. To address this, we introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images. This method capitalizes on the complementary characteristics of diverse modalities, bolstering both the accuracy and robustness of object detection. The codebook is utilized to enhance a streamlined and concise depiction of the fused intra- and inter-domain dynamics, fine-tuned for optimal performance in detection tasks. We present a bilevel optimization strategy that establishes a nexus between the joint problem of fusion and detection, optimizing both processes concurrently. Furthermore, we introduce the first dataset of paired infrared and visible images accompanied by text prompts, paving the way for future research. Extensive experiments on several datasets demonstrate that our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.

* 10 pages, 12 figures, 3 tables, conference

Via

Access Paper or Ask Questions

Dual Adversarial Resilience for Collaborating Robust Underwater Image Enhancement and Perception

Sep 03, 2023

Zengxi Zhang, Zhiying Jiang, Zeru Shi, Jinyuan Liu, Risheng Liu

Figure 1 for Dual Adversarial Resilience for Collaborating Robust Underwater Image Enhancement and Perception

Figure 2 for Dual Adversarial Resilience for Collaborating Robust Underwater Image Enhancement and Perception

Figure 3 for Dual Adversarial Resilience for Collaborating Robust Underwater Image Enhancement and Perception

Figure 4 for Dual Adversarial Resilience for Collaborating Robust Underwater Image Enhancement and Perception

Abstract:Due to the uneven scattering and absorption of different light wavelengths in aquatic environments, underwater images suffer from low visibility and clear color deviations. With the advancement of autonomous underwater vehicles, extensive research has been conducted on learning-based underwater enhancement algorithms. These works can generate visually pleasing enhanced images and mitigate the adverse effects of degraded images on subsequent perception tasks. However, learning-based methods are susceptible to the inherent fragility of adversarial attacks, causing significant disruption in results. In this work, we introduce a collaborative adversarial resilience network, dubbed CARNet, for underwater image enhancement and subsequent detection tasks. Concretely, we first introduce an invertible network with strong perturbation-perceptual abilities to isolate attacks from underwater images, preventing interference with image enhancement and perceptual tasks. Furthermore, we propose a synchronized attack training strategy with both visual-driven and perception-driven attacks enabling the network to discern and remove various types of attacks. Additionally, we incorporate an attack pattern discriminator to heighten the robustness of the network against different attacks. Extensive experiments demonstrate that the proposed method outputs visually appealing enhancement images and perform averagely 6.71% higher detection mAP than state-of-the-art methods.

* 9 pages, 9 figures

Via

Access Paper or Ask Questions

Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction

Sep 03, 2023

Xiaoke Shang, Gehui Li, Zhiying Jiang, Shaomin Zhang, Nai Ding, Jinyuan Liu

Abstract:The correction of exposure-related issues is a pivotal component in enhancing the quality of images, offering substantial implications for various computer vision tasks. Historically, most methodologies have predominantly utilized spatial domain recovery, offering limited consideration to the potentialities of the frequency domain. Additionally, there has been a lack of a unified perspective towards low-light enhancement, exposure correction, and multi-exposure fusion, complicating and impeding the optimization of image processing. In response to these challenges, this paper proposes a novel methodology that leverages the frequency domain to improve and unify the handling of exposure correction tasks. Our method introduces Holistic Frequency Attention and Dynamic Frequency Feed-Forward Network, which replace conventional correlation computation in the spatial-domain. They form a foundational building block that facilitates a U-shaped Holistic Dynamic Frequency Transformer as a filter to extract global information and dynamically select important frequency bands for image restoration. Complementing this, we employ a Laplacian pyramid to decompose images into distinct frequency bands, followed by multiple restorers, each tuned to recover specific frequency-band information. The pyramid fusion allows a more detailed and nuanced image restoration process. Ultimately, our structure unifies the three tasks of low-light enhancement, exposure correction, and multi-exposure fusion, enabling comprehensive treatment of all classical exposure errors. Benchmarking on mainstream datasets for these tasks, our proposed method achieves state-of-the-art results, paving the way for more sophisticated and unified solutions in exposure correction.

Via

Access Paper or Ask Questions

Fearless Luminance Adaptation: A Macro-Micro-Hierarchical Transformer for Exposure Correction

Sep 02, 2023

Gehui Li, Jinyuan Liu, Long Ma, Zhiying Jiang, Xin Fan, Risheng Liu

Figure 1 for Fearless Luminance Adaptation: A Macro-Micro-Hierarchical Transformer for Exposure Correction

Figure 2 for Fearless Luminance Adaptation: A Macro-Micro-Hierarchical Transformer for Exposure Correction

Figure 3 for Fearless Luminance Adaptation: A Macro-Micro-Hierarchical Transformer for Exposure Correction

Figure 4 for Fearless Luminance Adaptation: A Macro-Micro-Hierarchical Transformer for Exposure Correction

Abstract:Photographs taken with less-than-ideal exposure settings often display poor visual quality. Since the correction procedures vary significantly, it is difficult for a single neural network to handle all exposure problems. Moreover, the inherent limitations of convolutions, hinder the models ability to restore faithful color or details on extremely over-/under- exposed regions. To overcome these limitations, we propose a Macro-Micro-Hierarchical transformer, which consists of a macro attention to capture long-range dependencies, a micro attention to extract local features, and a hierarchical structure for coarse-to-fine correction. In specific, the complementary macro-micro attention designs enhance locality while allowing global interactions. The hierarchical structure enables the network to correct exposure errors of different scales layer by layer. Furthermore, we propose a contrast constraint and couple it seamlessly in the loss function, where the corrected image is pulled towards the positive sample and pushed away from the dynamically generated negative samples. Thus the remaining color distortion and loss of detail can be removed. We also extend our method as an image enhancer for low-light face recognition and low-light semantic segmentation. Experiments demonstrate that our approach obtains more attractive results than state-of-the-art methods quantitatively and qualitatively.

* Accepted by ACM MM 2023

Via

Access Paper or Ask Questions

Approximating Human-Like Few-shot Learning with GPT-based Compression

Aug 14, 2023

Cynthia Huang, Yuqing Xie, Zhiying Jiang, Jimmy Lin, Ming Li

Abstract:In this work, we conceptualize the learning process as information compression. We seek to equip generative pre-trained models with human-like learning capabilities that enable data compression during inference. We present a novel approach that utilizes the Generative Pre-trained Transformer (GPT) to approximate Kolmogorov complexity, with the aim of estimating the optimal Information Distance for few-shot learning. We first propose using GPT as a prior for lossless text compression, achieving a noteworthy compression ratio. Experiment with LLAMA2-7B backbone achieves a compression ratio of 15.5 on enwik9. We justify the pre-training objective of GPT models by demonstrating its equivalence to the compression length, and, consequently, its ability to approximate the information distance for texts. Leveraging the approximated information distance, our method allows the direct application of GPT models in quantitative text similarity measurements. Experiment results show that our method overall achieves superior performance compared to embedding and prompt baselines on challenging NLP tasks, including semantic similarity, zero and one-shot text classification, and zero-shot text ranking.

Via

Access Paper or Ask Questions