We present a simple yet effective self-supervised framework for audio-visual representation learning that localizes sound sources in videos. To understand what enables the model to learn useful representations, we systematically investigate the effects of data augmentations and reveal that (1) the composition of data augmentations plays a critical role, {\em i.e.}~explicitly encouraging the audio-visual representations to be invariant to various transformations~({\em transformation invariance}); (2) enforcing geometric consistency substantially improves the quality of the learned representations, {\em i.e.}~the detected sound source should follow the same transformation applied to the input video frames~({\em transformation equivariance}). Extensive experiments demonstrate that our model significantly outperforms previous methods on two sound localization benchmarks, namely Flickr-SoundNet and VGG-Sound. Additionally, we evaluate on audio retrieval and cross-modal retrieval tasks. In both cases, our self-supervised models demonstrate superior retrieval performance, and are even competitive with the supervised approach on audio retrieval. This reveals that the proposed framework learns strong multi-modal representations that benefit sound localization and generalize to further applications. \textit{All code will be made available}.
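Schematically, writing $f(v, a)$ for the audio-visual localization response on a video frame $v$ and audio $a$ (the symbols here are illustrative rather than the exact formulation), the two principles can be sketched as
\[
\mathcal{L}_{\text{inv}} = d\big(f(t(v), a),\; f(v, a)\big), \qquad
\mathcal{L}_{\text{equiv}} = d\big(f(g(v), a),\; g(f(v, a))\big),
\]
where $t$ denotes an appearance augmentation, $g$ a geometric transformation, and $d$ a distance between responses. Minimizing $\mathcal{L}_{\text{inv}}$ encourages the representations to ignore appearance changes, while minimizing $\mathcal{L}_{\text{equiv}}$ requires the predicted sound-source map to undergo the same geometric transformation as the input frame.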