Title: Hierarchical Cross-modal Transformer for RGB-D Salient Object Detection

Feb 16, 2023

Hao Chen, Feihong Shen

Abstract: Most existing RGB-D salient object detection (SOD) methods follow the CNN-based paradigm, which cannot model long-range dependencies across space and modalities because of the inherent locality of CNNs. Here we propose the Hierarchical Cross-modal Transformer (HCT), a new multi-modal transformer, to tackle this problem. Unlike previous multi-modal transformers that directly connect all patches from the two modalities, we explore the cross-modal complementarity hierarchically to respect the modality gap and the spatial discrepancy in unaligned regions. Specifically, we use intra-modal self-attention to explore complementary global contexts, and measure spatially aligned inter-modal attention locally to capture cross-modal correlations. In addition, we present a Feature Pyramid module for Transformer (FPT) to boost informative cross-scale integration, as well as a consistency-complementarity module to disentangle the multi-modal integration path and improve the fusion adaptivity. Comprehensive experiments on a large variety of public datasets verify the efficacy of our designs and the consistent improvement over state-of-the-art models.

* 10 pages, 10 figures
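
The hierarchical idea described in the abstract (global self-attention within each modality, followed by cross-modal attention restricted to spatially aligned local regions) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the window size, the residual connection, and the omission of learned query/key/value projections and multiple heads are all assumptions made only to keep the example short.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_self_attention(tokens):
    """Intra-modal step (assumed form): tokens has shape (N, C) and every
    patch attends to every other patch of the SAME modality."""
    scale = tokens.shape[-1] ** -0.5
    attn = softmax(tokens @ tokens.T * scale, axis=-1)  # (N, N)
    return attn @ tokens


def local_cross_attention(query_map, context_map, window=3):
    """Inter-modal step (assumed form): both maps have shape (H, W, C) and are
    spatially aligned. Each query location attends only to context locations
    inside a window x window neighbourhood centred on the same position."""
    H, W, C = query_map.shape
    r = window // 2
    scale = C ** -0.5
    out = np.zeros((H, W, C), dtype=float)
    for i in range(H):
        for j in range(W):
            i0, i1 = max(0, i - r), min(H, i + r + 1)
            j0, j1 = max(0, j - r), min(W, j + r + 1)
            ctx = context_map[i0:i1, j0:j1].reshape(-1, C)  # (K, C)
            weights = softmax(ctx @ query_map[i, j] * scale)  # (K,)
            out[i, j] = weights @ ctx
    return out


def hierarchical_cross_modal_block(rgb, depth, window=3):
    """Hypothetical block: global context per modality first, then local
    cross-modal exchange added as a residual (the residual is an assumption)."""
    H, W, C = rgb.shape
    rgb_g = global_self_attention(rgb.reshape(-1, C)).reshape(H, W, C)
    depth_g = global_self_attention(depth.reshape(-1, C)).reshape(H, W, C)
    rgb_out = rgb_g + local_cross_attention(rgb_g, depth_g, window)
    depth_out = depth_g + local_cross_attention(depth_g, rgb_g, window)
    return rgb_out, depth_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.normal(size=(8, 8, 16))    # toy RGB feature map
    depth = rng.normal(size=(8, 8, 16))  # toy depth feature map
    rgb_out, depth_out = hierarchical_cross_modal_block(rgb, depth)
    print(rgb_out.shape, depth_out.shape)
```

In this sketch the cross-modal cost grows with the window size rather than with the full number of patches, which mirrors the abstract's point that the two modalities are not connected patch-to-patch globally. With `window=1` each location exchanges information only with the exactly aligned location in the other modality.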
