Abstract: Self-supervised sound source localization is usually challenged by the inconsistency between modalities. In recent studies, contrastive-learning-based strategies have shown promise in establishing a consistent correspondence between audio and the sound sources in visual scenes. However, the heterogeneity of the two modalities' features has received insufficient attention, which limits further improvement of this scheme and motivates our work. In this study, an Induction Network is proposed to bridge the modality gap more effectively. By decoupling the gradients of the visual and audio modalities, discriminative visual representations of sound sources can be learned with the designed Induction Vector in a bootstrap manner, which also allows the audio modality to be aligned with the visual modality consistently. In addition to a visually weighted contrastive loss, an adaptive threshold selection strategy is introduced to enhance the robustness of the Induction Network. Extensive experiments on the SoundNet-Flickr and VGG-Sound Source datasets demonstrate superior performance compared with other state-of-the-art methods in a variety of challenging scenarios. The code is available at https://github.com/Tahy1/AVIN
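The sketch below illustrates, under our own assumptions rather than the authors' released implementation, the two ideas the abstract names: decoupling the gradients of the visual and audio branches by stopping gradients through an induction vector distilled from the visual features, and aligning the audio embedding to it with an InfoNCE-style contrastive loss weighted by visual similarity. The function and tensor shapes here are illustrative placeholders; the reference code is at https://github.com/Tahy1/AVIN.

```python
# Minimal sketch (not the authors' code): gradient-decoupled induction vector
# plus a visually weighted contrastive alignment loss.
import torch
import torch.nn.functional as F

def induction_contrastive_loss(visual_feat, audio_feat, tau=0.07):
    """visual_feat: (B, D, H, W) spatial visual features
       audio_feat:  (B, D)       global audio embedding"""
    B, D, H, W = visual_feat.shape

    v = F.normalize(visual_feat.flatten(2), dim=1)   # (B, D, HW)
    a = F.normalize(audio_feat, dim=1)               # (B, D)

    # Audio-to-visual similarity map for every (audio i, image j) pair.
    sim = torch.einsum('id,jdk->ijk', a, v)          # (B, B, HW)

    # Induction vector: pool visual features over the regions most similar
    # to the paired audio, then detach so alignment gradients do not flow
    # back into the visual encoder (the gradient decoupling mentioned above).
    pos_map = sim[torch.arange(B), torch.arange(B)]  # (B, HW)
    weights = torch.softmax(pos_map / tau, dim=-1)   # spatial weights
    induction = torch.einsum('bk,bdk->bd', weights, v).detach()

    # Contrastive objective: each audio embedding should match its own
    # induction vector and repel those of the other samples in the batch.
    logits = a @ induction.t() / tau                 # (B, B)
    labels = torch.arange(B, device=logits.device)
    return F.cross_entropy(logits, labels)

# Example usage with random tensors standing in for encoder outputs.
if __name__ == "__main__":
    loss = induction_contrastive_loss(torch.randn(4, 512, 14, 14),
                                      torch.randn(4, 512))
    print(loss.item())
```

Detaching the induction vector is what makes the alignment bootstrap-like in this sketch: the visual branch shapes the target, while only the audio branch receives gradients from the alignment term. The adaptive threshold selection described in the abstract is not modeled here.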