Title: Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes

Mar 25, 2022

Zengjie Song, Yuxi Wang, Junsong Fan, Tieniu Tan, Zhaoxiang Zhang

Abstract: Sound source localization in visual scenes aims to localize objects emitting the sound in a given image. Recent works showing impressive localization performance typically rely on the contrastive learning framework. However, the random sampling of negatives, as commonly adopted in these methods, can result in misalignment between audio and visual features and thus induce ambiguity in localization. In this paper, instead of following previous literature, we propose Self-Supervised Predictive Learning (SSPL), a negative-free method for sound localization via explicit positive mining. Specifically, we first devise a three-stream network to associate the sound source with two augmented views of one corresponding video frame, leading to semantically coherent similarities between audio and visual features. Second, we introduce a novel predictive coding module for audio-visual feature alignment. This module helps SSPL focus on target objects in a progressive manner and effectively lowers the positive-pair learning difficulty. Experiments show that SSPL outperforms the state-of-the-art approach on two standard sound localization benchmarks. In particular, SSPL achieves significant improvements of 8.6% cIoU and 3.4% AUC on SoundNet-Flickr compared to the previous best. Code is available at: https://github.com/zjsong/SSPL.

* Camera-ready, CVPR 2022. Code: https://github.com/zjsong/SSPL
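
The "negative-free" idea in the abstract, learning only from positive pairs formed by an audio clip and two augmented views of its video frame, can be illustrated with a toy loss. This is a minimal sketch under stated assumptions, not the SSPL objective: the abstract does not specify the loss, so the cosine-similarity form, the choice of three pairs, and the names `cosine_similarity` and `negative_free_loss` are all hypothetical. The predictive coding module is not modeled here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def negative_free_loss(audio, view1, view2):
    """Mean negative cosine similarity over positive pairs only.

    Assumption (not from the paper): the three streams are compared
    pairwise, and no negative samples enter the loss at all.
    """
    pairs = [(audio, view1), (audio, view2), (view1, view2)]
    return -sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Toy embeddings standing in for the outputs of the three streams.
aligned = negative_free_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
misaligned = negative_free_loss([1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
print(aligned, misaligned)
```

The loss is lowest when all three embeddings point in the same direction. Because nothing pushes embeddings apart, a loss of this kind can be minimized trivially by mapping every input to the same vector, so practical negative-free methods add a safeguard against such collapse; the abstract does not say which one SSPL uses.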
