While the transformer has emerged as the preeminent neural architecture, several independent lines of research have sought to address its limitations. Recurrent approaches have also seen renewed interest, including the extended long short-term memory (xLSTM) architecture, which revitalizes the original LSTM design. However, while xLSTMs have shown performance competitive with transformers, their viability for learning self-supervised, general-purpose audio representations has not yet been evaluated. This work proposes Audio xLSTM (AxLSTM), an approach for learning audio representations from masked spectrogram patches in a self-supervised setting. Pretrained on the AudioSet dataset, the proposed AxLSTM models outperform comparable self-supervised audio spectrogram transformer (SSAST) baselines by up to 20% in relative performance across a set of ten diverse downstream tasks while having up to 45% fewer parameters.
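To make the masked-patch pretraining setup concrete, the minimal sketch below shows how a log-mel spectrogram could be split into non-overlapping patches and a random subset hidden from the encoder, which is then trained to predict the masked content from the visible patches. The patch size, mask ratio, and spectrogram shape here are illustrative assumptions, not the configuration used for AxLSTM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder log-mel spectrogram: (time_frames, mel_bins); values are random
# stand-ins for a real feature extractor's output.
spec = rng.standard_normal((1024, 128)).astype(np.float32)

patch_t, patch_f = 16, 16   # assumed patch size along time and frequency
mask_ratio = 0.5            # assumed fraction of patches hidden from the encoder

# Split the spectrogram into non-overlapping patches and flatten each patch.
n_t, n_f = spec.shape[0] // patch_t, spec.shape[1] // patch_f
patches = (
    spec[: n_t * patch_t, : n_f * patch_f]
    .reshape(n_t, patch_t, n_f, patch_f)
    .transpose(0, 2, 1, 3)
    .reshape(n_t * n_f, patch_t * patch_f)
)

# Randomly choose the patches to mask; the self-supervised objective asks the
# model to reconstruct these from the visible ones.
n_masked = int(mask_ratio * patches.shape[0])
masked_idx = rng.choice(patches.shape[0], size=n_masked, replace=False)
visible_idx = np.setdiff1d(np.arange(patches.shape[0]), masked_idx)

print(patches.shape, len(visible_idx), len(masked_idx))
```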