Alaa Nfissi

Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition

Jun 01, 2024

Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara

Abstract: Speech emotion recognition (SER) has gained significant attention due to its many application areas, such as mental health, education, and human-computer interaction. However, the accuracy of SER systems is hindered by high-dimensional feature sets that may contain irrelevant and redundant information. To overcome this challenge, this study proposes an iterative feature boosting approach for SER that emphasizes feature relevance and explainability to enhance machine learning model performance. Our approach involves meticulous feature selection and analysis to build efficient SER systems. To address this problem through model explainability, we employ a feature evaluation loop with Shapley values to iteratively refine feature sets. This process balances model performance and transparency, enabling a comprehensive understanding of the model's predictions. The proposed approach offers several advantages, including the identification and removal of irrelevant and redundant features, leading to a more effective model. It also promotes explainability, making the model's predictions easier to understand and revealing the features that matter most for determining emotion. The effectiveness of the proposed method is validated on four SER benchmarks: the Toronto emotional speech set (TESS), the Berlin Database of Emotional Speech (EMO-DB), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset, on which it outperforms state-of-the-art methods. These results highlight the potential of the proposed technique for developing accurate and explainable SER systems. To the best of our knowledge, this is the first work to incorporate model explainability into an SER framework.

* Applied Intelligence (2024)
* Published in: Springer Nature International Journal of Applied Intelligence (2024)
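
The feature evaluation loop described in the abstract can be illustrated with a small sketch. The abstract does not say how the Shapley values are computed or when the loop stops, so everything here is an assumption: the sketch computes exact Shapley values over feature subsets, drops the lowest-scoring feature while its value is not above a threshold, and uses a hypothetical `toy_score` function with made-up feature names in place of the validation accuracy of a real SER model.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a set function value(frozenset)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that f joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def refine_features(features, value, threshold=0.0):
    """Drop the lowest-scoring feature until the worst one exceeds threshold."""
    kept = list(features)
    while len(kept) > 1:
        phi = shapley_values(kept, value)
        worst = min(kept, key=phi.get)
        if phi[worst] > threshold:
            break
        kept.remove(worst)
    return kept


# Hypothetical per-feature gains; a real system would train and validate
# a classifier on each feature subset instead of using this lookup table.
GAINS = {"mfcc_mean": 0.30, "pitch_mean": 0.10, "noise_a": -0.05, "noise_b": 0.0}


def toy_score(subset):
    """Stand-in for the validation accuracy obtained with a feature subset."""
    return 0.5 + sum(GAINS[f] for f in sorted(subset))


selected = refine_features(list(GAINS), toy_score)
print(selected)
```

Exact enumeration needs a number of score evaluations that grows exponentially with the feature count, so it is only practical for a handful of features; larger feature sets would need a sampling-based approximation.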

Iterative Feature Boosting for Explainable Speech Emotion Recognition

May 31, 2024

Alaa Nfissi, Wassim Bouachir, Nizar Bouguila, Brian Mishara

Abstract: In speech emotion recognition (SER), using predefined features without considering their practical importance may lead to high-dimensional datasets that include redundant and irrelevant information. Consequently, high-dimensional learning often decreases model accuracy while increasing computational complexity. Our work underlines the importance of carefully considering and analyzing features in order to build efficient SER systems. We present a new supervised SER method based on an efficient feature engineering approach. We pay particular attention to the explainability of results to evaluate feature relevance and refine feature sets. This is performed iteratively through a feature evaluation loop, using Shapley values to boost feature selection and improve overall framework performance. Our approach thus balances model performance and transparency. The proposed method outperforms human-level performance (HLP) and state-of-the-art machine learning methods in emotion recognition on the TESS dataset.

* 2023 International Conference on Machine Learning and Applications (ICMLA), Jacksonville, FL, USA, 2023, pp. 543-549
* Published in: 2023 International Conference on Machine Learning and Applications (ICMLA)

Speech Emotion Diarization: Which Emotion Appears When?

Jun 22, 2023

Yingzhi Wang, Mirco Ravanelli, Alaa Nfissi, Alya Yacoubi

Abstract: Speech Emotion Recognition (SER) typically relies on utterance-level solutions. However, emotions conveyed through speech should be considered as discrete speech events with definite temporal boundaries, rather than attributes of the entire utterance. To reflect the fine-grained nature of speech emotions, we propose a new task: Speech Emotion Diarization (SED). Just as Speaker Diarization answers the question of "Who speaks when?", Speech Emotion Diarization answers the question of "Which emotion appears when?". To facilitate performance evaluation and establish a common benchmark for researchers, we introduce the Zaion Emotion Dataset (ZED), an openly accessible speech emotion dataset that includes non-acted emotions recorded in real-life conditions, along with manually annotated boundaries of emotion segments within the utterance. We provide competitive baselines and open-source the code and the pre-trained models.
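
To make "Which emotion appears when?" concrete, the sketch below turns a sequence of per-frame emotion labels into segments with start and end times. The abstract does not describe the output format of the baselines, so the frame-level labels, the 20 ms frame shift, and the function name `frames_to_segments` are all assumptions made for illustration.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_shift=0.02):
    """Merge runs of identical frame labels into (label, start_s, end_s) segments.

    frame_shift is the assumed time step between frames, in seconds.
    """
    segments = []
    index = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        start = round(index * frame_shift, 3)
        end = round((index + length) * frame_shift, 3)
        segments.append((label, start, end))
        index += length
    return segments


# Hypothetical 3-second utterance: neutral speech with an angry span in the middle.
frames = ["neutral"] * 50 + ["angry"] * 75 + ["neutral"] * 25
for label, start, end in frames_to_segments(frames):
    print(f"{label}: {start:.2f}s to {end:.2f}s")
```

An utterance-level system would assign this whole clip a single label, whereas the segment list keeps the boundaries of the angry span.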
