Title: A vector quantized masked autoencoder for audiovisual speech emotion recognition

May 05, 2023

Samir Sadok, Simon Leglaive, Renaud Séguier

Abstract: While fully-supervised models have been shown to be effective for audiovisual speech emotion recognition (SER), the limited availability of labeled data remains a major challenge in the field. To address this issue, self-supervised learning approaches, such as masked autoencoders (MAEs), have gained popularity as potential solutions. In this paper, we propose the VQ-MAE-AV model, a vector quantized MAE specifically designed for audiovisual speech self-supervised representation learning. Unlike existing multimodal MAEs that rely on the processing of the raw audiovisual speech data, the proposed method employs a self-supervised paradigm based on discrete audio and visual speech representations learned by two pre-trained vector quantized variational autoencoders. Experimental results show that the proposed approach, which is pre-trained on the VoxCeleb2 database and fine-tuned on standard emotional audiovisual speech datasets, outperforms the state-of-the-art audiovisual SER methods.
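
To make the idea of masking discrete representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy codebook standing in for a pre-trained VQ-VAE, and the names `quantize`, `mask_tokens`, `MASK_ID`, and the 0.5 masking ratio are all hypothetical choices made for illustration. It shows only two steps: mapping continuous frames to codebook indices, and hiding a random subset of those indices. A real masked autoencoder would then be trained to predict the hidden indices, which is not shown here.

```python
import random

MASK_ID = -1  # hypothetical sentinel marking a masked token


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in vectors
    ]


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of tokens with MASK_ID.

    Returns the masked sequence and the sorted list of masked positions.
    """
    n_mask = int(round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = MASK_ID
    return masked, positions


# Toy codebook and toy "frames"; a real system would use learned VQ-VAE codebooks.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8], [1.1, 0.9]]

tokens = quantize(frames, codebook)
masked, positions = mask_tokens(tokens, 0.5, random.Random(0))
```

In the audiovisual setting described in the abstract, the audio stream and the visual stream would each get their own token sequence from their own quantizer; how the two are masked and combined is specific to the paper and is not reproduced in this sketch.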

* 14 pages, 4 figures, https://samsad35.github.io/VQ-MAE-AudioVisual/
