Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Comprehensive Multi-Modal Interactions for Referring Image Segmentation

Apr 21, 2021

Kanishk Jain, Vineet Gandhi

Figure 1 for Comprehensive Multi-Modal Interactions for Referring Image Segmentation

Figure 2 for Comprehensive Multi-Modal Interactions for Referring Image Segmentation

Figure 3 for Comprehensive Multi-Modal Interactions for Referring Image Segmentation

Figure 4 for Comprehensive Multi-Modal Interactions for Referring Image Segmentation

Share this with someone who'll enjoy it:

Abstract:We investigate Referring Image Segmentation (RIS), which outputs a segmentation map corresponding to the given natural language description. To solve RIS efficiently, we need to understand each word's relationship with other words, each region in the image to other regions, and cross-modal alignment between linguistic and visual domains. Recent methods model these three types of interactions sequentially. We argue that such a modular approach limits these methods' performance, and joint simultaneous reasoning can help resolve ambiguities. To this end, we propose a Joint Reasoning (JRM) module and a novel Cross-Modal Multi-Level Fusion (CMMLF) module for tackling this task. JRM effectively models the referent's multi-modal context by jointly reasoning over visual and linguistic modalities (performing word-word, image region-region, word-region interactions in a single module). CMMLF module further refines the segmentation masks by exchanging contextual information across visual hierarchy through linguistic features acting as a bridge. We present thorough ablation studies and validate our approach's performance on four benchmark datasets, and show that the proposed method outperforms the existing state-of-the-art methods on all four datasets by significant margins.

View paper on

Share this with someone who'll enjoy it:

Title:Comprehensive Multi-Modal Interactions for Referring Image Segmentation

Paper and Code