Title: Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams

Jun 21, 2021

Yuanbo Hou, Zhesong Yu, Xia Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Dick Botteldooren

[Figures 1-4 for Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams]

Abstract: Many previous audio-visual voice-related works focus on speech, ignoring the singing voice in the growing number of musical video streams on the Internet. For processing diverse musical video data, voice activity detection is a necessary step. This paper attempts to detect the speech and singing voices of target performers in musical video streams using audio-visual information. To integrate information from the audio and visual modalities, a multi-branch network is proposed to learn audio and image representations, and the representations are fused by attention based on semantic similarity to shape the acoustic representations through the probability of anchor vocalization. Experiments show that the proposed audio-visual multi-branch network far outperforms the audio-only model in challenging acoustic environments, indicating that cross-modal information fusion based on semantic correlation is sensible and successful.
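
To make the idea of similarity-based fusion concrete, here is a minimal sketch of generic scaled dot-product cross-attention between per-frame audio and visual embeddings. It is not the authors' network: the function names (`softmax`, `cross_modal_fusion`), the shared embedding size, and the residual addition are all assumptions made for illustration, and the abstract does not specify any of these details.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_modal_fusion(audio, visual):
    """Fuse audio and visual embeddings with scaled dot-product attention.

    Hypothetical helper, assumed for illustration only.
    audio:  (T_a, D) array of per-frame audio embeddings (queries).
    visual: (T_v, D) array of per-frame visual embeddings (keys and values).
    Returns the fused audio representation (T_a, D) and the attention
    weights (T_a, T_v).
    """
    dim = audio.shape[-1]
    # Similarity between every audio frame and every visual frame.
    scores = audio @ visual.T / np.sqrt(dim)
    # Each audio frame gets a distribution over visual frames.
    weights = softmax(scores, axis=-1)
    # Visual context gathered for each audio frame.
    attended = weights @ visual
    # Assumed residual combination: visual context shapes the audio features.
    fused = audio + attended
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(5, 8))   # 5 audio frames, 8-dim embeddings
    visual = rng.normal(size=(7, 8))  # 7 video frames, 8-dim embeddings
    fused, weights = cross_modal_fusion(audio, visual)
    print(fused.shape, weights.shape)
```

In this sketch the audio frames act as queries, so the output keeps the audio time resolution, which suits frame-level voice activity decisions. A gating variant that scales the audio features by a per-frame vocalization probability would be closer to the wording of the abstract, but its exact form is not given there.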

* Accepted by INTERSPEECH 2021
