Title: UniCon: Unified Context Network for Robust Active Speaker Detection

Aug 05, 2021

Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan, Xilin Chen

Abstract: We introduce the Unified Context Network (UniCon), a new, efficient framework for robust active speaker detection (ASD). Traditional ASD methods usually operate on each candidate's pre-cropped face track separately and do not sufficiently consider the relationships among the candidates. This potentially limits performance, especially in challenging scenarios such as low-resolution faces or multiple candidates. Our solution is a novel, unified framework that jointly models multiple types of contextual information: spatial context to indicate the position and scale of each candidate's face, relational context to capture the visual relationships among the candidates and contrast their audio-visual affinities with each other, and temporal context to aggregate long-term information and smooth out local uncertainties. Based on this information, our model optimizes all candidates in a unified process for robust and reliable ASD. A thorough ablation study is performed on several challenging ASD benchmarks under different settings. In particular, our method outperforms the state of the art by a large margin of about 15% absolute mean Average Precision (mAP) on two challenging subsets: one with three candidate speakers, and the other with faces smaller than 64 pixels. Overall, UniCon achieves 92.0% mAP on the AVA-ActiveSpeaker validation set, surpassing 90% for the first time on this challenging dataset at the time of submission. Project website: https://unicon-asd.github.io/.

* 10 pages, 6 figures; to appear at ACM Multimedia 2021
