Yingxia Jiao

Guidance and Teaching Network for Video Salient Object Detection

Jun 06, 2021

Yingxia Jiao, Xiao Wang, Yu-Cheng Chou, Shouyuan Yang, Ge-Peng Ji, Rong Zhu, Ge Gao

Figures 1 to 4 for Guidance and Teaching Network for Video Salient Object Detection

Abstract: Owing to the difficulty of mining spatial-temporal cues, existing approaches for video salient object detection (VSOD) are limited in understanding complex and noisy scenarios, and often fail to infer prominent objects. To alleviate these shortcomings, we propose a simple yet efficient architecture, termed Guidance and Teaching Network (GTNet), which independently distils effective spatial and temporal cues with implicit guidance at the feature level and explicit teaching at the decision level. Specifically, we (a) introduce a temporal modulator that implicitly bridges features from the motion branch into the appearance branch, fusing cross-modal features collaboratively, and (b) utilise a motion-guided mask to propagate explicit cues during feature aggregation. This novel learning strategy achieves satisfactory results by decoupling the complex spatial-temporal cues and mapping informative cues across different modalities. Extensive experiments on three challenging benchmarks show that the proposed method runs at ~28 fps on a single TITAN Xp GPU and performs competitively against 14 cutting-edge baselines.
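The abstract names GTNet's two mechanisms but gives no formulas. The following is a minimal, dependency-free sketch of one plausible reading, not the paper's actual design. It assumes that the "temporal modulator" gates each appearance channel with a weight derived from the matching motion channel (a channel-attention style choice), and that the "motion-guided mask" re-weights every spatial position of the aggregated features. The function names `temporal_modulator` and `motion_guided_aggregation`, the sigmoid gate, and the residual `(1 + weight)` form are all hypothetical. Feature maps are plain nested lists of shape `[channels][flattened positions]`.

```python
import math
from typing import List

# Hypothetical layout: [channels][flattened spatial positions]
Feature = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def temporal_modulator(appearance: Feature, motion: Feature) -> Feature:
    """Hypothetical feature-level (implicit) guidance.

    Assumption: each appearance channel is scaled by a gate computed from the
    global average of the matching motion channel. The residual (1 + gate)
    form keeps the original appearance signal intact.
    """
    if len(appearance) != len(motion):
        raise ValueError("appearance and motion must have the same channel count")
    out: Feature = []
    for a_c, m_c in zip(appearance, motion):
        gate = sigmoid(sum(m_c) / len(m_c))  # per-channel weight in (0, 1)
        out.append([a * (1.0 + gate) for a in a_c])
    return out


def motion_guided_aggregation(features: Feature, mask: List[float]) -> Feature:
    """Hypothetical decision-level (explicit) teaching.

    Assumption: a saliency mask predicted from motion (values in [0, 1])
    emphasises moving regions at every channel; (1 + mask) again keeps a
    residual path so static regions are damped relative to moving ones,
    not erased.
    """
    out: Feature = []
    for f_c in features:
        if len(f_c) != len(mask):
            raise ValueError("mask must match the spatial size of each channel")
        out.append([f * (1.0 + m) for f, m in zip(f_c, mask)])
    return out


if __name__ == "__main__":
    appearance = [[1.0, 2.0], [0.5, 0.5]]
    motion = [[0.0, 0.0], [10.0, 10.0]]
    fused = temporal_modulator(appearance, motion)
    refined = motion_guided_aggregation(fused, mask=[1.0, 0.0])
    print(fused)
    print(refined)
```

In this toy run the first motion channel averages to 0, so its gate is exactly 0.5 and the appearance channel becomes `[1.5, 3.0]`. The mask `[1.0, 0.0]` then doubles the first position and leaves the second unchanged. The residual form was chosen so that neither step can zero out appearance evidence, which seems consistent with the abstract's emphasis on guiding rather than replacing the appearance branch.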

* Accepted at IEEE ICIP 2021
