Title: Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation

Jun 25, 2024

Xuming Zhang, Naoto Yokoya, Xingfa Gu, Qingjiu Tian, Lorenzo Bruzzone

Abstract: Hyperspectral image (HSI) classification has recently reached a performance bottleneck. Multimodal data fusion is emerging as a promising way to overcome it by providing rich complementary information from a supplementary modality (X-modality). However, achieving comprehensive cross-modal interaction and fusion that generalizes across different sensing modalities is challenging because the modalities differ in imaging sensors, resolution, and content. In this study, we propose a Local-to-Global Cross-modal Attention-aware Fusion (LoGoCAF) framework for HSI-X classification that jointly considers efficiency, accuracy, and generalizability. LoGoCAF adopts a pixel-to-pixel two-branch semantic segmentation architecture to learn information from the HSI and X modalities. The pipeline of LoGoCAF consists of a local-to-global encoder and a lightweight multilayer perceptron (MLP) decoder. In the encoder, convolutions encode local, high-resolution fine details in the shallow layers, while transformers integrate global, low-resolution coarse features in the deeper layers. The MLP decoder aggregates information from the encoder for feature fusion and prediction. In particular, two cross-modality modules, the feature enhancement module (FEM) and the feature interaction and fusion module (FIFM), are introduced in each encoder stage. The FEM enhances complementary information by combining features from the other modality across direction-aware, position-sensitive, and channel-wise dimensions. With the enhanced features, the FIFM promotes cross-modality information interaction and fusion for the final semantic prediction. Extensive experiments demonstrate that LoGoCAF achieves superior performance and generalizes well. The code will be made publicly available.
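The abstract gives no implementation details, so the following is only a rough illustration of the general idea behind cross-modal, channel-wise feature enhancement followed by fusion. It is not the authors' FEM or FIFM. The function names (`channel_gate`, `cross_enhance`, `fuse`), the sigmoid gate over a global average, and the averaging fusion are all assumptions chosen for simplicity, and the direction-aware and position-sensitive dimensions are not modeled at all.

```python
import math

# Hypothetical sketch: features are plain nested lists shaped [C][H][W].
# None of these names or design choices come from the paper.

def channel_gate(feature):
    """Return one sigmoid gate per channel, computed from the channel's global average."""
    gates = []
    for channel in feature:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        gates.append(1.0 / (1.0 + math.exp(-mean)))
    return gates

def cross_enhance(own, other):
    """Add the other modality's features to `own`, reweighted per channel
    by a gate derived from the other modality (assumed enhancement rule)."""
    gates = channel_gate(other)
    return [
        [[o + g * x for o, x in zip(own_row, other_row)]
         for own_row, other_row in zip(own_ch, other_ch)]
        for own_ch, other_ch, g in zip(own, other, gates)
    ]

def fuse(hsi, x):
    """Enhance each branch with the other, then average the two branches
    element-wise (assumed fusion rule)."""
    hsi_enh = cross_enhance(hsi, x)
    x_enh = cross_enhance(x, hsi)
    return [
        [[(a + b) / 2.0 for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(hsi_enh, x_enh)
    ]

if __name__ == "__main__":
    hsi = [[[1.0, 1.0]]]  # one channel, 1 x 2 spatial grid
    x = [[[0.0, 0.0]]]    # supplementary modality with the same shape
    print(fuse(hsi, x))
```

In a two-branch encoder of the kind the abstract describes, a step like this would presumably run once per encoder stage, with each branch passing its enhanced features on to the next stage and the fused features going to the decoder. That wiring is likewise an assumption here.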
