The use of RGB-D information for salient object detection has been explored in recent years. However, relatively few efforts have been devoted to modeling salient object detection in real-world human activity scenes with RGB-D. In this work, we fill the gap by making the following contributions to RGB-D salient object detection. First, we carefully collect a new salient person (SIP) dataset, which consists of 1K high-resolution images covering diverse real-world scenes with various viewpoints, poses, occlusions, illuminations, and backgrounds. Second, we conduct a large-scale, and so far the most comprehensive, benchmark comparing contemporary methods, which has long been missing in this area and can serve as a baseline for future research. We systematically summarize 31 popular models and evaluate 17 state-of-the-art methods over seven datasets containing about 91K images in total. Third, we propose a simple baseline architecture, called the Deep Depth-Depurator Network (D3Net). It consists of a depth depurator unit and a feature learning module, which perform initial filtering of low-quality depth maps and cross-modal feature learning, respectively. These components form a nested structure and are elaborately designed to be learned jointly. D3Net exceeds the performance of all prior contenders across the five metrics considered, and thus serves as a strong baseline to advance the research frontier. We also demonstrate that D3Net can efficiently extract salient person masks from real scenes, enabling an effective background-changing book cover application at 20 fps on a single GPU. All the saliency maps, our new SIP dataset, the baseline model, and the evaluation tools are made publicly available at https://github.com/DengPingFan/D3NetBenchmark.
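To make the depth-depurator idea concrete, the following is a minimal PyTorch sketch of the abstract's two-component design: a small classifier scores the quality of the input depth map and gates whether the depth stream contributes to the fused prediction, while two backbone streams perform the feature learning. This is our own simplification under stated assumptions, not the authors' released implementation; the backbone modules, the 0.5 gating threshold, and the fusion layer are all hypothetical.

```python
# Hypothetical simplification of D3Net's depurator + feature-learning design.
# Not the released D3Net code: backbones, threshold, and fusion are assumptions.
import torch
import torch.nn as nn


class DepthDepurator(nn.Module):
    """Scores depth-map quality and filters out low-quality depth inputs."""

    def __init__(self):
        super().__init__()
        self.score = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1), nn.Sigmoid(),  # quality score in (0, 1)
        )

    def forward(self, depth):
        q = self.score(depth)              # (B, 1) quality score per image
        # Hard gate: keep or drop the depth map. Non-differentiable as written;
        # a real model would train the scorer separately or use a soft gate.
        gate = (q > 0.5).float().view(-1, 1, 1, 1)
        return depth * gate, q


class D3NetSketch(nn.Module):
    """Two-stream feature learner fused after depth depuration."""

    def __init__(self, rgb_backbone, depth_backbone):
        super().__init__()
        self.depurator = DepthDepurator()
        self.rgb_stream = rgb_backbone     # assumed: (B, 3, H, W) -> (B, 1, H, W)
        self.depth_stream = depth_backbone # assumed: (B, 1, H, W) -> (B, 1, H, W)
        self.fuse = nn.Conv2d(2, 1, 1)     # merge the two saliency maps

    def forward(self, rgb, depth):
        depth, quality = self.depurator(depth)
        s_rgb = self.rgb_stream(rgb)
        s_depth = self.depth_stream(depth)
        return self.fuse(torch.cat([s_rgb, s_depth], dim=1)), quality


if __name__ == "__main__":
    # Shape check with trivial single-conv "backbones" (placeholders only).
    net = D3NetSketch(nn.Conv2d(3, 1, 3, padding=1), nn.Conv2d(1, 1, 3, padding=1))
    sal, q = net(torch.rand(2, 3, 224, 224), torch.rand(2, 1, 224, 224))
    print(sal.shape, q.shape)  # torch.Size([2, 1, 224, 224]) torch.Size([2, 1])
```

The gating step reflects the abstract's claim that the depurator performs initial filtering of low-quality depth maps before cross-modal feature learning; when the gate closes, the depth stream receives a zeroed map and the fused prediction effectively falls back to the RGB stream.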