Speech emotion recognition (SER) has made significant strides with the advent of powerful self-supervised learning (SSL) models. However, the generalization of these models to diverse languages and emotional expressions remains a challenge. We propose a large-scale benchmark to evaluate the robustness and adaptability of state-of-the-art SER models in both in-domain and out-of-domain settings. Our benchmark includes a diverse set of multilingual datasets, focusing on less commonly used corpora to assess generalization to unseen data. We employ logit adjustment to account for the varying class distributions across corpora and establish a single dataset cluster for systematic evaluation. Surprisingly, we find that the Whisper model, primarily designed for automatic speech recognition, outperforms dedicated SSL models in cross-lingual SER. Our results highlight the need for more robust and generalizable SER models, and our benchmark serves as a valuable resource to drive future research in this direction.
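As context for the class-imbalance correction mentioned above, the following is a minimal sketch of standard post-hoc logit adjustment (subtracting scaled log class priors from the logits before taking the argmax); the function name `adjust_logits`, the temperature `tau`, and the NumPy interface are illustrative assumptions, not the benchmark's actual implementation.

```python
import numpy as np

# Minimal sketch of post-hoc logit adjustment under standard assumptions;
# the interface and temperature value are illustrative, not the paper's code.
def adjust_logits(logits, class_counts, tau=1.0):
    """Subtract scaled log class priors from model logits.

    logits:       array of shape (n_samples, n_classes)
    class_counts: per-class training-set frequencies, length n_classes
    tau:          scaling temperature for the prior correction
    """
    priors = np.asarray(class_counts, dtype=float)
    priors = priors / priors.sum()            # empirical class priors
    return logits - tau * np.log(priors)      # broadcasts over samples

# Example: predictions under the prior-corrected decision rule.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.3, 1.2, 0.9]])
counts = [700, 200, 100]                      # imbalanced emotion classes
print(adjust_logits(logits, counts).argmax(axis=1))
```

The correction counteracts the bias toward majority emotion classes, so that models trained on corpora with very different label distributions can be compared under a common, prior-corrected decision rule.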