This paper addresses the problem of self-supervised general-purpose audio representation learning. We explore the use of Joint-Embedding Predictive Architectures (JEPA) for this task, in which an input mel-spectrogram is split into two parts (context and target), neural representations are computed for each, and the network is trained to predict the target representations from the context representations. We investigate several design choices within this framework and study their influence through extensive experiments, evaluating our models on a variety of audio classification benchmarks spanning environmental sound, speech, and music downstream tasks. We focus in particular on which parts of the input are used as context and target, and show experimentally that this choice significantly impacts the quality of the learned representations. Notably, some design choices that are effective in the image domain lead to poor performance on audio, highlighting major differences between the two modalities.
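To make the training scheme described above concrete, the sketch below shows a minimal JEPA-style step on mel-spectrogram patches. It is an illustrative assumption rather than the paper's implementation: the module names (`SmallTransformer`, `jepa_loss`), the random context/target split, the use of a repeated mean-context query in the predictor, and the EMA momentum value are all placeholders.

```python
# Illustrative JEPA training step on mel-spectrogram patches (a sketch, not the
# paper's actual architecture, masking strategy, or hyper-parameters).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallTransformer(nn.Module):
    """Tiny Transformer over a sequence of patch embeddings (placeholder encoder)."""

    def __init__(self, in_dim: int, dim: int = 256, depth: int = 4, heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(in_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.blocks(self.proj(x))  # (batch, seq, dim)


def jepa_loss(context_encoder, target_encoder, predictor,
              patches: torch.Tensor, ctx_idx: torch.Tensor, tgt_idx: torch.Tensor):
    """Predict the target encoder's representations of the target patches
    from the context patches alone, and regress onto them."""
    ctx_repr = context_encoder(patches[:, ctx_idx])       # (B, n_ctx, dim)
    with torch.no_grad():                                  # targets come from a frozen/EMA encoder
        tgt_repr = target_encoder(patches[:, tgt_idx])     # (B, n_tgt, dim)
    # Simplification: real JEPA predictors use positional queries for the target
    # locations; here we just repeat the mean context vector as the query.
    queries = ctx_repr.mean(dim=1, keepdim=True).expand(-1, tgt_idx.numel(), -1)
    pred = predictor(torch.cat([ctx_repr, queries], dim=1))[:, -tgt_idx.numel():]
    return F.smooth_l1_loss(pred, tgt_repr)


if __name__ == "__main__":
    B, n_patches, patch_dim, dim = 2, 64, 128, 256
    context_encoder = SmallTransformer(patch_dim, dim)
    target_encoder = copy.deepcopy(context_encoder)        # EMA copy, no gradients
    for p in target_encoder.parameters():
        p.requires_grad_(False)
    predictor = SmallTransformer(dim, dim, depth=2)

    patches = torch.randn(B, n_patches, patch_dim)         # flattened mel-spectrogram patches
    perm = torch.randperm(n_patches)
    ctx_idx, tgt_idx = perm[:48], perm[48:]                 # which patches serve as context vs. target

    loss = jepa_loss(context_encoder, target_encoder, predictor, patches, ctx_idx, tgt_idx)
    loss.backward()

    # Update the target encoder as an exponential moving average of the context
    # encoder (momentum value chosen arbitrarily for illustration).
    with torch.no_grad():
        for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
            p_t.mul_(0.996).add_(p_c, alpha=0.004)
```

The key design choice studied in the paper, which part of the input serves as context and which as target, corresponds here to how `ctx_idx` and `tgt_idx` are constructed; the random split above is only one of many possible strategies.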