Title: TACO: Training-free Sound Prompted Segmentation via Deep Audio-visual CO-factorization

Dec 02, 2024

Hugo Malard, Michel Olvera, Stephane Lathuiliere, Slim Essid

Abstract: Large-scale pre-trained audio and image models demonstrate an unprecedented degree of generalization, making them suitable for a wide range of applications. Here, we tackle the specific task of sound-prompted segmentation, aiming to segment image regions corresponding to objects heard in an audio signal. Most existing approaches tackle this problem by fine-tuning pre-trained models or by training additional modules specifically for the task. We adopt a different strategy: we introduce a training-free approach that leverages Non-negative Matrix Factorization (NMF) to co-factorize audio and visual features from pre-trained models to reveal shared interpretable concepts. These concepts are passed to an open-vocabulary segmentation model for precise segmentation maps. By using frozen pre-trained models, our method achieves high generalization and establishes state-of-the-art performance in unsupervised sound-prompted segmentation, significantly surpassing previous unsupervised methods.
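
To make the co-factorization idea concrete, here is a minimal sketch of one way to factorize two feature matrices with a shared NMF dictionary. It is not the authors' implementation: the function name `co_factorize`, the choice of stacking both matrices and running standard multiplicative updates, and the assumption that audio and visual features are non-negative and share the same embedding dimension are all illustrative assumptions not stated in the abstract.

```python
import numpy as np

def co_factorize(audio_feats, visual_feats, n_components=3, n_iter=200,
                 eps=1e-9, seed=0):
    """Hypothetical joint NMF: audio ~= W_a @ H and visual ~= W_v @ H.

    Assumes both inputs are non-negative arrays of shape (n_items, d) with
    the same feature dimension d (for example, post-ReLU embeddings
    projected into a common space). H holds the shared "concepts".
    """
    rng = np.random.default_rng(seed)
    # Stacking the two matrices forces them to share one dictionary H.
    X = np.vstack([audio_feats, visual_feats])
    n, d = X.shape
    W = rng.random((n, n_components)) + eps
    H = rng.random((n_components, d)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius-norm NMF objective;
        # they keep W and H non-negative at every step.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    n_audio = audio_feats.shape[0]
    return W[:n_audio], W[n_audio:], H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    audio = rng.random((6, 2)) @ rng.random((2, 8))    # 6 audio frames
    visual = rng.random((10, 2)) @ rng.random((2, 8))  # 10 image patches
    W_a, W_v, H = co_factorize(audio, visual, n_components=2)
    print(W_a.shape, W_v.shape, H.shape)
```

Under these assumptions, each column of `W_v` can be read as a per-patch activation map for one concept, and the matching column of `W_a` indicates how strongly that concept is present in the audio. How the paper selects concepts and passes them to the open-vocabulary segmentation model is not described in the abstract and is not sketched here.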
