Title: Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention

Jun 17, 2021

Efthymios Tzinis, Scott Wisdom, Tal Remez, John R. Hershey

Figures 1-4 for Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention

Abstract: We introduce a state-of-the-art audio-visual on-screen sound separation system that learns to separate sounds and associate them with on-screen objects by watching in-the-wild videos. We identify limitations of previous work on audio-visual on-screen sound separation, including the simplicity and coarse resolution of its spatio-temporal attention and the poor convergence of its audio separation model. Our proposed model addresses these issues with cross-modal and self-attention modules that capture audio-visual dependencies at a finer temporal resolution, and through unsupervised pre-training of the audio separation model. These improvements allow the model to generalize to a much wider set of unseen videos. For evaluation and semi-supervised training, we collected human annotations of on-screen audio from a large database of in-the-wild videos (YFCC100M). Our results show marked improvements in on-screen separation performance, under more general conditions than previous methods.
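
The abstract names two attention steps: cross-modal attention, in which audio features attend to visual features, and self-attention that refines the fused features over time. As a rough illustration only, here is a minimal, dependency-free sketch of scaled dot-product attention arranged that way. It assumes audio and video frame features already share one embedding size D and omits the learned query/key/value projections, multiple heads, and normalization a real model would use. The function names (`attention`, `audio_visual_block`) are hypothetical and are not taken from the paper's code.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = feature dims


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    # Numerically stable softmax applied independently to each row.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    scale = 1.0 / math.sqrt(d)
    weights = softmax_rows([[s * scale for s in row] for row in scores])
    return matmul(weights, values)


def audio_visual_block(audio: Matrix, video: Matrix) -> Matrix:
    # Hypothetical block: cross-modal attention, then self-attention over time.
    # Cross-modal step: each audio frame attends over all video frames.
    attended = attention(audio, video, video)
    fused = [[a + v for a, v in zip(ra, rv)] for ra, rv in zip(audio, attended)]
    # Self-attention step: fused frames attend to each other across time.
    refined = attention(fused, fused, fused)
    # Residual connection keeps the fused features alongside the refinement.
    return [[f + r for f, r in zip(rf, rr)] for rf, rr in zip(fused, refined)]


if __name__ == "__main__":
    audio = [[0.1, 0.2], [0.0, 0.4], [0.3, -0.2], [0.5, 0.1]]  # 4 audio frames
    video = [[0.3, -0.1], [0.0, 0.5]]  # 2 video frames
    out = audio_visual_block(audio, video)
    print(len(out), len(out[0]))  # 4 frames, 2 dims
```

Because the audio frames act as queries, the output keeps the audio's time resolution (4 frames here) even when the video has fewer frames. This is one plausible reading of "finer resolution over time"; the abstract does not specify the exact layout.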
