Title: SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection

Aug 29, 2024

Rohit Venkata Sai Dulam, Chandra Kambhamettu

Abstract: Salient Object Detection (SOD) has traditionally relied on feature refinement modules that utilize the features of an ImageNet pre-trained backbone. However, this approach limits the possibility of pre-training the entire network because of the distinct nature of SOD and image classification. Additionally, the architecture of these backbones, originally built for image classification, is sub-optimal for a dense prediction task like SOD. To address these issues, we propose a novel encoder-decoder-style neural network called SODAWideNet++ that is designed explicitly for SOD. Inspired by the vision transformer's ability to attain a global receptive field from the initial stages, we introduce the Attention Guided Long Range Feature Extraction (AGLRFE) module, which combines large dilated convolutions and self-attention. Specifically, we use attention features to guide long-range information extracted by multiple dilated convolutions, thus taking advantage of the inductive biases of a convolution operation and the input dependency brought by self-attention. In contrast to the current paradigm of ImageNet pre-training, we modify 118K annotated images from the COCO semantic segmentation dataset by binarizing the annotations to pre-train the proposed model end-to-end. Further, we supervise the background predictions along with the foreground to push our model to generate accurate saliency predictions. SODAWideNet++ performs competitively on five different datasets while containing only 35% of the trainable parameters of state-of-the-art models. The code and pre-computed saliency maps are provided at https://github.com/VimsLab/SODAWideNetPlusPlus.
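The abstract mentions two data-side ideas: binarizing COCO segmentation annotations for pre-training, and supervising background predictions alongside the foreground. The following is a minimal sketch of how these could look, not the authors' implementation. It assumes that class ID 0 marks background, that the model emits separate foreground and background probability maps, and that binary cross-entropy is the loss; none of these details are stated in the abstract, and all function names are hypothetical.

```python
import math

def binarize_mask(class_mask, background_id=0):
    """Collapse a per-pixel class-ID mask into a binary mask.

    Assumption: background_id (0 here) marks non-object pixels.
    """
    return [[0 if pixel == background_id else 1 for pixel in row]
            for row in class_mask]

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two same-shaped 2D lists."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count

def foreground_background_loss(fg_pred, bg_pred, fg_target):
    """Sum of foreground and background losses.

    Assumption: the background target is the inverse of the foreground
    target, and both terms are weighted equally.
    """
    bg_target = [[1 - t for t in row] for row in fg_target]
    return bce(fg_pred, fg_target) + bce(bg_pred, bg_target)

# Toy 2x3 semantic mask with two object classes (17 and 3).
class_mask = [[0, 0, 17],
              [0, 3, 17]]
binary = binarize_mask(class_mask)

# Hypothetical predicted probability maps.
fg_pred = [[0.1, 0.2, 0.9], [0.1, 0.8, 0.9]]
bg_pred = [[0.9, 0.8, 0.1], [0.9, 0.2, 0.1]]
loss = foreground_background_loss(fg_pred, bg_pred, binary)
```

In this sketch, every object class collapses to the value 1, so any annotated object counts as foreground for pre-training purposes; whether the paper filters classes or instances further is not stated in the abstract.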

* Accepted at ICPR 2024
