Title: MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching

Jan 20, 2025

Yepeng Liu, Zhichao Sun, Baosheng Yu, Yitian Zhao, Bo Du, Yongchao Xu, Jun Cheng

Abstract: Many keypoint detection and description methods have been proposed for image matching or registration. While these methods demonstrate promising performance for single-modality image matching, they often struggle with multimodal data because descriptors trained on single-modality data tend to lack robustness against the non-linear variations present in multimodal data. Extending such methods to multimodal image matching often requires well-aligned multimodal data to learn modality-invariant descriptors. However, acquiring such data is often costly and impractical in many real-world scenarios. To address this challenge, we propose a modality-invariant feature learning network (MIFNet) that computes modality-invariant features for keypoint descriptions in multimodal image matching using only single-modality training data. Specifically, we propose a novel latent feature aggregation module and a cumulative hybrid aggregation module to enhance the base keypoint descriptors trained on single-modality data by leveraging pre-trained features from Stable Diffusion models. We validate our method with recent keypoint detection and description methods on three multimodal retinal image datasets (CF-FA, CF-OCT, EMA-OCTA) and two remote sensing datasets (Optical-SAR and Optical-NIR). Extensive experiments demonstrate that the proposed MIFNet learns modality-invariant features for multimodal image matching without accessing the target modality and has good zero-shot generalization ability. The source code will be made publicly available.
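For readers unfamiliar with how keypoint descriptors are used in image matching, the following is a minimal sketch of a generic matching step: mutual nearest-neighbor search over cosine similarity. It is not the MIFNet method and is not taken from the paper; the function names and the toy descriptors are hypothetical, chosen only to illustrate why descriptors from two modalities need to be comparable.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity_matrix(desc_a, desc_b):
    """Return sim[i][j] = cosine similarity between desc_a[i] and desc_b[j]."""
    a = [l2_normalize(d) for d in desc_a]
    b = [l2_normalize(d) for d in desc_b]
    return [[sum(x * y for x, y in zip(da, db)) for db in b] for da in a]


def mutual_nearest_neighbors(desc_a, desc_b):
    """Keep pairs (i, j) where j is i's best match and i is j's best match.

    Hypothetical helper for illustration only; real pipelines usually add
    ratio tests or geometric verification on top of this step.
    """
    if not desc_a or not desc_b:
        return []
    sim = cosine_similarity_matrix(desc_a, desc_b)
    # Best candidate in image B for every keypoint in image A.
    best_b_for_a = [max(range(len(desc_b)), key=lambda j: row[j]) for row in sim]
    # Best candidate in image A for every keypoint in image B.
    best_a_for_b = [
        max(range(len(desc_a)), key=lambda i: sim[i][j])
        for j in range(len(desc_b))
    ]
    return [(i, j) for i, j in enumerate(best_b_for_a) if best_a_for_b[j] == i]


# Toy 2-D descriptors standing in for two images of different modalities.
descriptors_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
descriptors_b = [[0.0, 2.0], [3.0, 0.0]]
matches = mutual_nearest_neighbors(descriptors_a, descriptors_b)
```

A matcher of this kind only succeeds when corresponding keypoints produce similar descriptors in both images, which is the property that modality-invariant features are meant to provide.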
