Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Temporal Synchronicity

Nov 14, 2021

Pritam Sarkar, Ali Etemad

Figure 1 for Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Temporal Synchronicity

Figure 2 for Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Temporal Synchronicity

Figure 3 for Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Temporal Synchronicity

Figure 4 for Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Temporal Synchronicity

Share this with someone who'll enjoy it:

Abstract:We present CrissCross, a self-supervised framework for learning audio-visual representations. A novel notion is introduced in our framework whereby in addition to learning the intra-modal and standard 'synchronous' cross-modal relations, CrissCross also learns 'asynchronous' cross-modal relationships. We show that by relaxing the temporal synchronicity between the audio and visual modalities, the network learns strong time-invariant representations. Our experiments show that strong augmentations for both audio and visual modalities with relaxation of cross-modal temporal synchronicity optimize performance. To pretrain our proposed framework, we use 3 different datasets with varying sizes, Kinetics-Sound, Kinetics-400, and AudioSet. The learned representations are evaluated on a number of downstream tasks namely action recognition, sound classification, and retrieval. CrissCross shows state-of-the-art performances on action recognition (UCF101 and HMDB51) and sound classification (ESC50). The codes and pretrained models will be made publicly available.

* 18 pages

View paper on

Share this with someone who'll enjoy it:

Title:Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Temporal Synchronicity

Paper and Code