Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shu-Jie Chen

SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

Jul 11, 2024

Runmin Zhang, Jun Ma, Si-Yuan Cao, Lun Luo, Beinan Yu, Shu-Jie Chen, Junwei Li, Hui-Liang Shen

Figure 1 for SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

Figure 2 for SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

Figure 3 for SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

Figure 4 for SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

Abstract:We propose a novel unsupervised cross-modal homography estimation framework based on intra-modal Self-supervised learning, Correlation, and consistent feature map Projection, namely SCPNet. The concept of intra-modal self-supervised learning is first presented to facilitate the unsupervised cross-modal homography estimation. The correlation-based homography estimation network and the consistent feature map projection are combined to form the learnable architecture of SCPNet, boosting the unsupervised learning framework. SCPNet is the first to achieve effective unsupervised homography estimation on the satellite-map image pair cross-modal dataset, GoogleMap, under [-32,+32] offset on a 128x128 image, leading the supervised approach MHN by 14.0% of mean average corner error (MACE). We further conduct extensive experiments on several cross-modal/spectral and manually-made inconsistent datasets, on which SCPNet achieves the state-of-the-art (SOTA) performance among unsupervised approaches, and owns 49.0%, 25.2%, 36.4%, and 10.7% lower MACEs than the supervised approach MHN. Source code is available at https://github.com/RM-Zhang/SCPNet.

* Accepted by ECCV 2024

Via

Access Paper or Ask Questions

SGDFormer: One-stage Transformer-based Architecture for Cross-Spectral Stereo Image Guided Denoising

Mar 30, 2024

Runmin Zhang, Zhu Yu, Zehua Sheng, Jiacheng Ying, Si-Yuan Cao, Shu-Jie Chen, Bailin Yang, Junwei Li, Hui-Liang Shen

Figure 1 for SGDFormer: One-stage Transformer-based Architecture for Cross-Spectral Stereo Image Guided Denoising

Figure 2 for SGDFormer: One-stage Transformer-based Architecture for Cross-Spectral Stereo Image Guided Denoising

Figure 3 for SGDFormer: One-stage Transformer-based Architecture for Cross-Spectral Stereo Image Guided Denoising

Figure 4 for SGDFormer: One-stage Transformer-based Architecture for Cross-Spectral Stereo Image Guided Denoising

Abstract:Cross-spectral image guided denoising has shown its great potential in recovering clean images with rich details, such as using the near-infrared image to guide the denoising process of the visible one. To obtain such image pairs, a feasible and economical way is to employ a stereo system, which is widely used on mobile devices. Current works attempt to generate an aligned guidance image to handle the disparity between two images. However, due to occlusion, spectral differences and noise degradation, the aligned guidance image generally exists ghosting and artifacts, leading to an unsatisfactory denoised result. To address this issue, we propose a one-stage transformer-based architecture, named SGDFormer, for cross-spectral Stereo image Guided Denoising. The architecture integrates the correspondence modeling and feature fusion of stereo images into a unified network. Our transformer block contains a noise-robust cross-attention (NRCA) module and a spatially variant feature fusion (SVFF) module. The NRCA module captures the long-range correspondence of two images in a coarse-to-fine manner to alleviate the interference of noise. The SVFF module further enhances salient structures and suppresses harmful artifacts through dynamically selecting useful information. Thanks to the above design, our SGDFormer can restore artifact-free images with fine structures, and achieves state-of-the-art performance on various datasets. Additionally, our SGDFormer can be extended to handle other unaligned cross-model guided restoration tasks such as guided depth super-resolution.

Via

Access Paper or Ask Questions

PCNet: A Structure Similarity Enhancement Method for Multispectral and Multimodal Image Registration

Jun 09, 2021

Si-Yuan Cao, Hui-Liang Shen, Lun Luo, Shu-Jie Chen, Chunguang Li

Figure 1 for PCNet: A Structure Similarity Enhancement Method for Multispectral and Multimodal Image Registration

Figure 2 for PCNet: A Structure Similarity Enhancement Method for Multispectral and Multimodal Image Registration

Figure 3 for PCNet: A Structure Similarity Enhancement Method for Multispectral and Multimodal Image Registration

Figure 4 for PCNet: A Structure Similarity Enhancement Method for Multispectral and Multimodal Image Registration

Abstract:Multispectral and multimodal image processing is important in the community of computer vision and computational photography. As the acquired multispectral and multimodal data are generally misaligned due to the alternation or movement of the image device, the image registration procedure is necessary. The registration of multispectral or multimodal image is challenging due to the non-linear intensity and gradient variation. To cope with this challenge, we propose the phase congruency network (PCNet), which is able to enhance the structure similarity and alleviate the non-linear intensity and gradient variation. The images can then be aligned using the similarity enhanced features produced by the network. PCNet is constructed under the guidance of the phase congruency prior. The network contains three trainable layers accompany with the modified learnable Gabor kernels according to the phase congruency theory. Thanks to the prior knowledge, PCNet is extremely light-weight and can be trained on quite a small amount of multispectral data. PCNet can be viewed to be fully convolutional and hence can take input of arbitrary sizes. Once trained, PCNet is applicable on a variety of multispectral and multimodal data such as RGB/NIR and flash/no-flash images without additional further tuning. Experimental results validate that PCNet outperforms current state-of-the-art registration algorithms, including the deep-learning based ones that have the number of parameters hundreds times compared to PCNet. Thanks to the similarity enhancement training, PCNet outperforms the original phase congruency algorithm with two-thirds less feature channels.

* 13 pages, 14 figures

Via

Access Paper or Ask Questions

Normalized Total Gradient: A New Measure for Multispectral Image Registration

Feb 15, 2017

Shu-Jie Chen, Hui-Liang Shen

Figure 1 for Normalized Total Gradient: A New Measure for Multispectral Image Registration

Figure 2 for Normalized Total Gradient: A New Measure for Multispectral Image Registration

Figure 3 for Normalized Total Gradient: A New Measure for Multispectral Image Registration

Figure 4 for Normalized Total Gradient: A New Measure for Multispectral Image Registration

Abstract:Image registration is a fundamental issue in multispectral image processing. In filter wheel based multispectral imaging systems, the non-coplanar placement of the filters always causes the misalignment of multiple channel images. The selective characteristic of spectral response in multispectral imaging raises two challenges to image registration. First, the intensity levels of a local region may be different in individual channel images. Second, the local intensity may vary rapidly in some channel images while keeps stationary in others. Conventional multimodal measures, such as mutual information, correlation coefficient, and correlation ratio, can register images with different regional intensity levels, but will fail in the circumstance of severe local intensity variation. In this paper, a new measure, namely normalized total gradient (NTG), is proposed for multispectral image registration. The NTG is applied on the difference between two channel images. This measure is based on the key assumption (observation) that the gradient of difference image between two aligned channel images is sparser than that between two misaligned ones. A registration framework, which incorporates image pyramid and global/local optimization, is further introduced for rigid transform. Experimental results validate that the proposed method is effective for multispectral image registration and performs better than conventional methods.

* 12 pages, 11 figures

Via

Access Paper or Ask Questions