Title: Sequential Contrastive Audio-Visual Learning

Jul 08, 2024

Ioannis Tsiamas, Santiago Pascual, Chunghsin Yeh, Joan Serrà



Abstract: Contrastive learning has emerged as a powerful technique in audio-visual representation learning, leveraging the natural co-occurrence of audio and visual modalities in extensive web-scale video datasets to achieve significant advancements. However, conventional contrastive audio-visual learning methodologies often rely on aggregated representations derived through temporal aggregation, which neglects the intrinsic sequential nature of the data. This oversight raises concerns regarding the ability of standard approaches to capture and utilize fine-grained information within sequences, information that is vital for distinguishing between semantically similar yet distinct examples. In response to this limitation, we propose sequential contrastive audio-visual learning (SCAV), which contrasts examples based on their non-aggregated representation space using sequential distances. Retrieval experiments with the VGGSound and Music datasets demonstrate the effectiveness of SCAV, showing 2-3x relative improvements against traditional aggregation-based contrastive learning and other methods from the literature. We also show that models trained with SCAV exhibit a high degree of flexibility regarding the metric employed for retrieval, allowing them to operate on a spectrum of efficiency-accuracy trade-offs, potentially making them applicable in multiple scenarios, from small- to large-scale retrieval.
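To make the contrast between aggregated and sequential comparison concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not state which sequential distance SCAV uses, so the mean per-frame Euclidean distance here is an assumption chosen for illustration, as are the function names, the temperature of 0.1, and the requirement that audio and video sequences have equal length. The toy batch pairs a sequence with its time-reversed copy: both have the same temporal mean, so a mean-pooled similarity cannot separate them, while a frame-by-frame distance can.

```python
import numpy as np

def aggregated_similarity(audio, video):
    """Cosine similarity between mean-pooled sequences.

    audio, video: arrays of shape (batch, time, dim).
    Returns a (batch, batch) similarity matrix.
    """
    a = audio.mean(axis=1)  # temporal aggregation discards frame order
    v = video.mean(axis=1)
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    v = v / np.linalg.norm(v, axis=-1, keepdims=True)
    return a @ v.T

def sequential_similarity(audio, video):
    """Negative mean per-frame Euclidean distance for every pair.

    Assumed stand-in for a "sequential distance"; requires equal
    sequence lengths. Returns a (batch, batch) similarity matrix.
    """
    diff = audio[:, None, :, :] - video[None, :, :, :]  # (B, B, T, D)
    dist = np.linalg.norm(diff, axis=-1).mean(axis=-1)  # (B, B)
    return -dist

def symmetric_info_nce(sim, temperature=0.1):
    """Symmetric InfoNCE loss; matching pairs lie on the diagonal."""
    logits = sim / temperature

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Toy batch: example 1 is example 0 played backwards in time.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 4))            # (time, dim)
audio = np.stack([base, base[::-1]])      # (2, 8, 4)
video = audio.copy()                      # perfectly matched "video" features

agg_sim = aggregated_similarity(audio, video)
seq_sim = sequential_similarity(audio, video)

print("aggregated loss:", symmetric_info_nce(agg_sim))
print("sequential loss:", symmetric_info_nce(seq_sim))
```

In this toy setup every entry of the aggregated similarity matrix is (numerically) the same, so the aggregated loss sits at chance level for a batch of two, whereas the sequential similarity ranks each matching pair above the reversed one and yields a lower loss.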
