Title: The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective

Dec 20, 2023

Wenqi Jia, Miao Liu, Hao Jiang, Ishwarya Ananthabhotla, James M. Rehg, Vamsi Krishna Ithapu, Ruohan Gao

Abstract: In recent years, research on egocentric videos has offered a unique perspective for studying conversational interactions, where both visual and audio signals play a crucial role. While most prior work focuses on learning about behaviors that directly involve the camera wearer, we introduce the Ego-Exocentric Conversational Graph Prediction problem, marking the first attempt to infer exocentric conversational interactions from egocentric videos. We propose a unified multi-modal, multi-task framework, Audio-Visual Conversational Attention (Av-CONV), for the joint prediction of conversational behaviors (speaking and listening) for both the camera wearer and all other social partners present in the egocentric video. Specifically, we customize the self-attention mechanism to model the representations across time, across subjects, and across modalities. To validate our method, we conduct experiments on a challenging egocentric video dataset that includes first-person perspective, multi-speaker, and multi-conversation scenarios. Our results demonstrate the superior performance of our method compared to a series of baselines. We also present detailed ablation studies to assess the contribution of each component in our model. Project page: https://vjwq.github.io/AV-CONV/.
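To make the prediction target concrete, the following is a minimal sketch of how a conversational graph over the camera wearer and the other social partners could be represented. It is not the authors' implementation: the `Edge` class, the `build_graph` function, the `speaking_to`/`listening_to` attribute names, the score dictionaries, and the 0.5 threshold are all hypothetical choices made for illustration. Edges that involve the wearer correspond to the egocentric part of the graph, and edges between two partners to the exocentric part.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Edge:
    """Directed edge with two binary attributes (names are assumptions)."""
    source: str
    target: str
    speaking_to: bool
    listening_to: bool


def build_graph(people, speaking_scores, listening_scores, threshold=0.5):
    """Threshold per-pair scores into one directed edge per ordered pair.

    The score dictionaries map (source, target) to a value in [0, 1];
    missing pairs are treated as 0.0. The threshold is an assumption.
    """
    edges = []
    for src, dst in permutations(people, 2):
        edges.append(
            Edge(
                source=src,
                target=dst,
                speaking_to=speaking_scores.get((src, dst), 0.0) >= threshold,
                listening_to=listening_scores.get((src, dst), 0.0) >= threshold,
            )
        )
    return edges


# Hypothetical scores for a single time step with one wearer and two partners.
people = ["wearer", "partner_1", "partner_2"]
speaking = {("wearer", "partner_1"): 0.9, ("partner_1", "partner_2"): 0.7}
listening = {("partner_1", "wearer"): 0.8, ("partner_2", "partner_1"): 0.6}

graph = build_graph(people, speaking, listening)

# Edges with the wearer at either end are egocentric; the rest are exocentric.
egocentric = [e for e in graph if "wearer" in (e.source, e.target)]
exocentric = [e for e in graph if "wearer" not in (e.source, e.target)]
```

In this toy setup a model would produce the per-pair scores from audio and visual features; here they are filled in by hand only to show the output structure.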