Title: Weakly-supervised Audio-visual Sound Source Detection and Separation

Mar 25, 2021

Tanzila Rahman, Leonid Sigal


Abstract: Learning how to localize and separate individual object sounds in the audio channel of a video is a difficult task. Current state-of-the-art methods predict audio masks from artificially mixed spectrograms, an approach known as the Mix-and-Separate framework. We propose an audio-visual co-segmentation approach in which the network learns both what individual objects look like and what they sound like, from videos labeled only with object labels. Unlike other recent visually-guided audio source separation frameworks, our architecture can be learned in an end-to-end manner and requires no additional supervision or bounding box proposals. Specifically, we introduce weakly-supervised object segmentation in the context of sound separation. We also formulate spectrogram mask prediction using a set of learned mask bases, which are combined using coefficients conditioned on the output of object segmentation, a design that facilitates separation. Extensive experiments on the MUSIC dataset show that our proposed approach outperforms state-of-the-art methods on visually guided sound source separation and sound denoising.
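The mask-basis idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `combine_mask_bases`, the array shapes, the sigmoid squashing, and the use of random arrays in place of learned bases and segmentation-conditioned coefficients are all assumptions made for illustration. Summing two magnitude spectrograms is also a simplification of the Mix-and-Separate setup.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def combine_mask_bases(bases, coeffs):
    """Combine K mask bases into one spectrogram mask (hypothetical helper).

    bases:  (K, F, T) array, stand-in for learned mask bases
    coeffs: (K,) array, stand-in for coefficients that the paper
            conditions on the object segmentation output
    Returns an (F, T) mask with values strictly between 0 and 1.
    """
    logits = np.tensordot(coeffs, bases, axes=1)  # weighted sum over K
    return sigmoid(logits)  # sigmoid is an assumption, not from the paper

rng = np.random.default_rng(0)
K, F, T = 4, 8, 5  # number of bases, frequency bins, time frames

# Mix-and-Separate (simplified): mix two known sources artificially,
# so each source can serve as a training target for separation.
spec_a = rng.random((F, T))
spec_b = rng.random((F, T))
mixture = spec_a + spec_b

# Random placeholders; in a trained model these would be learned/predicted.
bases = rng.standard_normal((K, F, T))
coeffs_a = rng.standard_normal(K)

mask_a = combine_mask_bases(bases, coeffs_a)
estimate_a = mask_a * mixture  # masked mixture approximates source A

# Ratio-mask target that a training loss could compare mask_a against.
target_a = spec_a / mixture
```

In this sketch, only the `K` coefficients change per object while the bases are shared, which is one plausible reading of why such a design could make separation easier to learn.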

* IEEE International Conference on Multimedia and Expo (ICME) 2021 * 4 figures, 6 pages
