Title: Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation

May 06, 2023

Bolin Lai, Fiona Ryan, Wenqi Jia, Miao Liu, James M. Rehg

Abstract: Egocentric gaze anticipation serves as a key building block for the emerging capability of Augmented Reality. Notably, gaze behavior is driven by both visual cues and audio signals during daily activities. Motivated by this observation, we introduce the first model that leverages both the video and audio modalities for egocentric gaze anticipation. Specifically, we propose a Contrastive Spatial-Temporal Separable (CSTS) fusion approach that adopts two modules to separately capture audio-visual correlations in spatial and temporal dimensions, and applies a contrastive loss on the re-weighted audio-visual features from fusion modules for representation learning. We conduct extensive ablation studies and thorough analysis using two egocentric video datasets, Ego4D and Aria, to validate our model design. We also demonstrate improvements over prior state-of-the-art methods. Moreover, we provide visualizations to show the gaze anticipation results and offer additional insights into audio-visual representation learning.

* 16 pages
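
The abstract mentions a contrastive loss applied to audio-visual features. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE loss over paired video and audio embeddings. It is not the authors' CSTS implementation: the function names, the use of NumPy, the (batch, dim) feature shape, and the temperature of 0.07 are all assumptions made for this example.

```python
import numpy as np


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def audio_visual_contrastive_loss(video_feats, audio_feats, temperature=0.07):
    """Generic symmetric InfoNCE over paired (video, audio) embeddings.

    Hypothetical helper, not taken from the paper. Both inputs have shape
    (batch, dim); row i of each is assumed to come from the same clip and
    is treated as the positive pair. All other rows act as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    v = video_feats / np.linalg.norm(video_feats, axis=1, keepdims=True)
    a = audio_feats / np.linalg.norm(audio_feats, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the (assumed) temperature.
    logits = (v @ a.T) / temperature
    idx = np.arange(len(v))

    # Video-to-audio: each row should pick out its own column.
    loss_v2a = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Audio-to-video: each column should pick out its own row.
    loss_a2v = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_v2a + loss_a2v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.normal(size=(8, 16))
    audio_aligned = video + 0.01 * rng.normal(size=(8, 16))
    audio_shuffled = audio_aligned[::-1]  # every pair is now mismatched

    print("aligned pairs :", audio_visual_contrastive_loss(video, audio_aligned))
    print("shuffled pairs:", audio_visual_contrastive_loss(video, audio_shuffled))
```

In this toy setup the loss for aligned pairs should come out well below the loss for shuffled pairs, which is the behavior a contrastive objective is meant to encourage. How the paper re-weights features before the loss is not specified in the abstract and is not modeled here.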
