Title: Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Oct 12, 2020

Di Hu, Rui Qian, Minyue Jiang, Xiao Tan, Shilei Wen, Errui Ding, Weiyao Lin, Dejing Dou

Abstract: Discriminatively localizing sounding objects in cocktail-party scenes, i.e., scenes with mixed sounds, is commonplace for humans but still challenging for machines. In this paper, we propose a two-stage learning framework for self-supervised, class-aware sounding object localization. First, we learn robust object representations by aggregating the candidate sound localization results in single-source scenes. Then, class-aware object localization maps are generated in cocktail-party scenarios by referring to the pre-learned object knowledge, and the sounding objects are selected by matching the audio and visual object category distributions, with audiovisual consistency serving as the self-supervised signal. Experimental results on both realistic and synthesized cocktail-party videos demonstrate that our model is superior in filtering out silent objects and pointing out the locations of sounding objects of different classes. Code is available at https://github.com/DTaoo/Discriminative-Sounding-Objects-Localization.

* To appear in NeurIPS 2020. Previous Title: Learning to Discriminatively Localize Sounding Objects in a Cocktail-party Scenario
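
The matching step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). It assumes that a visual category distribution is obtained by average-pooling each class-aware localization map and applying a softmax, that the audio branch outputs per-class logits, and that the two distributions are compared with a KL divergence. All function names, the pooling choice, the divergence, and the threshold are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def visual_category_distribution(class_maps):
    """Average-pool each class-aware map (a list of rows), then softmax.

    Assumption: global average pooling stands in for whatever
    aggregation the paper actually uses.
    """
    pooled = [
        sum(sum(row) for row in cmap) / (len(cmap) * len(cmap[0]))
        for cmap in class_maps
    ]
    return softmax(pooled)

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q); used here as an assumed audiovisual consistency loss."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def select_sounding_classes(audio_dist, threshold=0.5):
    """Keep the classes the audio branch considers active (hypothetical rule)."""
    return [k for k, prob in enumerate(audio_dist) if prob >= threshold]

# Toy scene with two object classes on a 2x2 grid (made-up values).
class_maps = [
    [[0.9, 0.8], [0.7, 0.9]],  # class 0: strong visual response
    [[0.1, 0.0], [0.2, 0.1]],  # class 1: weak visual response
]
audio_logits = [2.0, -1.0]  # the audio branch favors class 0

visual_dist = visual_category_distribution(class_maps)
audio_dist = softmax(audio_logits)
matching_loss = kl_divergence(visual_dist, audio_dist)
sounding = select_sounding_classes(audio_dist)
```

In this toy setup the loss shrinks as the visual and audio category distributions agree, which is the sense in which audiovisual consistency can act as a training signal without labels. Classes that are visible but absent from the audio distribution are the ones a rule like `select_sounding_classes` would filter out as silent.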
