This paper proposes an efficient approach to noisy speech emotion recognition (NSER). Conventional NSER approaches have proven effective in mitigating the impact of artificial noise sources, such as white Gaussian noise, but they struggle with the non-stationary noises of real-world environments because of their complexity and uncertainty. To overcome this limitation, we introduce a new NSER method that adopts an automatic speech recognition (ASR) model as a noise-robust feature extractor to suppress non-vocal information in noisy speech. We first obtain intermediate-layer representations from the ASR model as features of the emotional speech and then apply them to the downstream NSER task. Our experimental results show that the proposed method 1) achieves better NSER performance than conventional noise reduction methods, 2) outperforms self-supervised learning approaches, and 3) even outperforms text-based approaches that use ASR transcriptions or the ground-truth transcription of the noisy speech.
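
To make the pipeline concrete, the sketch below illustrates the general idea of using intermediate ASR encoder layers as frozen features for a downstream emotion classifier. It is a minimal illustration, not the authors' exact implementation: the choice of Whisper as the ASR backbone, the layer index, the mean pooling, and the classifier head are all assumptions made here for demonstration.

```python
# Minimal sketch (assumptions: Whisper backbone, layer index, pooling, head).
import torch
import torch.nn as nn
from transformers import WhisperFeatureExtractor, WhisperModel

ASR_NAME = "openai/whisper-base"   # assumed ASR backbone, not the paper's model
LAYER = 4                          # assumed intermediate encoder layer
NUM_EMOTIONS = 4                   # e.g., angry / happy / neutral / sad

feature_extractor = WhisperFeatureExtractor.from_pretrained(ASR_NAME)
asr = WhisperModel.from_pretrained(ASR_NAME).eval()

@torch.no_grad()
def asr_features(waveform_16k: torch.Tensor) -> torch.Tensor:
    """Return a fixed-size utterance embedding from an intermediate ASR layer."""
    inputs = feature_extractor(
        waveform_16k.numpy(), sampling_rate=16000, return_tensors="pt"
    )
    # Run only the encoder and keep all hidden states.
    enc = asr.encoder(inputs.input_features, output_hidden_states=True)
    hidden = enc.hidden_states[LAYER]   # (batch, frames, dim)
    return hidden.mean(dim=1)           # temporal mean pooling

# Lightweight classifier trained on the frozen ASR features.
classifier = nn.Sequential(
    nn.Linear(asr.config.d_model, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_EMOTIONS),
)

noisy_speech = torch.randn(16000 * 3)   # placeholder 3-second waveform
logits = classifier(asr_features(noisy_speech))
```

In this sketch the ASR model is kept frozen and only the small classification head is trained, which reflects the idea of reusing the ASR model's noise-robust representations rather than training an emotion recognizer from raw noisy audio.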