Sneha Das

Exploring Local Interpretable Model-Agnostic Explanations for Speech Emotion Recognition with Distribution-Shift

Apr 07, 2025

Maja J. Hjuler, Line H. Clemmensen, Sneha Das

Abstract: We introduce EmoLIME, a version of local interpretable model-agnostic explanations (LIME) for black-box Speech Emotion Recognition (SER) models. To the best of our knowledge, this is the first attempt to apply LIME in SER. EmoLIME generates high-level interpretable explanations and identifies which specific frequency ranges are most influential in determining emotional states. The approach aids in interpreting complex, high-dimensional embeddings such as those generated by end-to-end speech models. We evaluate EmoLIME qualitatively, quantitatively, and statistically across three emotional speech datasets, using classifiers trained on both hand-crafted acoustic features and Wav2Vec 2.0 embeddings. We find that EmoLIME exhibits stronger robustness across different models than across datasets with distribution shifts, highlighting its potential for more consistent explanations in SER tasks within a dataset.

* Published in the proceedings of ICASSP 2025

Is it the model or the metric -- On robustness measures of deeplearning models

Dec 13, 2024

Zhijin Lyu, Yutong Jin, Sneha Das

Abstract: Determining the robustness of deep learning models is an established and ongoing challenge within automated decision-making systems. With the advent and success of techniques that enable advanced deep learning (DL), these models are being used in widespread applications, including high-stakes ones like healthcare, education, and border control. Therefore, it is critical to understand the limitations of these models and predict their regions of failure, in order to create the necessary guardrails for their successful and safe deployment. In this work, we revisit robustness, specifically investigating the sufficiency of robust accuracy (RA) within the context of deepfake detection. We present robust ratio (RR) as a complementary metric that can quantify the changes to the normalized or probability outcomes under input perturbation. We present a comparison of RA and RR and demonstrate that, despite similar RA between models, the models show varying RR under different tolerance (perturbation) levels.

* Extended abstract at Northern Lights Deep Learning (NLDL) Conference 2025
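
The contrast the abstract draws, that two models can share the same robust accuracy yet differ in how much their output probabilities move, can be illustrated with a minimal sketch. The abstract does not give the formula for RR, so `probability_stability` below is a hypothetical stand-in (the fraction of inputs whose probability shifts by at most a tolerance), not the paper's metric; the scores and the 0.5 decision threshold are also made up for illustration.

```python
def robust_accuracy(labels, perturbed_probs, threshold=0.5):
    """Fraction of perturbed inputs whose thresholded prediction is still correct."""
    correct = sum(
        int((p >= threshold) == bool(y)) for y, p in zip(labels, perturbed_probs)
    )
    return correct / len(labels)


def probability_stability(clean_probs, perturbed_probs, tolerance):
    """Fraction of inputs whose output probability moves by at most `tolerance`.

    Hypothetical stand-in for a ratio-style robustness measure; NOT the paper's RR.
    """
    stable = sum(
        abs(c - p) <= tolerance for c, p in zip(clean_probs, perturbed_probs)
    )
    return stable / len(clean_probs)


# Made-up binary labels and probabilities (1 = fake, 0 = real).
labels = [1, 1, 0, 0]
clean = [0.95, 0.90, 0.05, 0.10]
model_a = [0.90, 0.85, 0.10, 0.15]  # small probability drift under perturbation
model_b = [0.55, 0.60, 0.45, 0.40]  # large drift, but the same decisions

ra_a = robust_accuracy(labels, model_a)
ra_b = robust_accuracy(labels, model_b)
stab_a = probability_stability(clean, model_a, tolerance=0.1)
stab_b = probability_stability(clean, model_b, tolerance=0.1)
```

In this toy case both models keep every decision correct, so accuracy alone cannot separate them, while the probability-based measure does.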

Examining the Interplay Between Privacy and Fairness for Speech Processing: A Review and Perspective

Sep 05, 2024

Anna Leschanowsky, Sneha Das

Abstract: Speech technology has been increasingly deployed in various areas of daily life including sensitive domains such as healthcare and law enforcement. For these technologies to be effective, they must work reliably for all users while preserving individual privacy. Although tradeoffs between privacy and utility, as well as fairness and utility, have been extensively researched, the specific interplay between privacy and fairness in speech processing remains underexplored. This review and position paper offers an overview of emerging privacy-fairness tradeoffs throughout the entire machine learning lifecycle for speech processing. By drawing on well-established frameworks on fairness and privacy, we examine existing biases and sources of privacy harm that coexist during the development of speech processing models. We then highlight how corresponding privacy-enhancing technologies have the potential to inadvertently increase these biases and how bias mitigation strategies may conversely reduce privacy. By raising open questions, we advocate for a comprehensive evaluation of privacy-fairness tradeoffs for speech technology and the development of privacy-enhancing and fairness-aware algorithms in this domain.

Evaluation of Large Language Models: STEM education and Gender Stereotypes

Jun 14, 2024

Smilla Due, Sneha Das, Marianne Andersen, Berta Plandolit López, Sniff Andersen Nexø, Line Clemmensen

Abstract: Large Language Models (LLMs) have an increasing impact on our lives with use cases such as chatbots, study support, coding support, ideation, writing assistance, and more. Previous studies have revealed linguistic biases in pronouns used to describe professions or adjectives used to describe men vs women. These issues have to some degree been addressed in updated LLM versions, at least to pass existing tests. However, biases may still be present in the models, and repeated use of gender-stereotypical language may reinforce the underlying assumptions; these biases are therefore important to examine further. This paper investigates gender biases in LLMs in relation to educational choices through an open-ended, true-to-use-case experimental design and a quantitative analysis. We investigate the biases in the context of four different cultures, languages, and educational systems (English/US/UK, Danish/DK, Catalan/ES, and Hindi/IN) for ages ranging from 10 to 16 years, corresponding to important educational transition points in the different countries. We find that there are significant and large differences in the ratio of STEM to non-STEM suggested education paths provided by ChatGPT when using typical girl vs boy names to prompt lists of suggested things to become. There are generally fewer STEM suggestions in the Danish, Spanish, and Indian contexts compared to the English. We also find subtle differences in the suggested professions, which we categorise and report.

Exploratory Evaluation of Speech Content Masking

Jan 08, 2024

Jennifer Williams, Karla Pizzi, Paul-Gauthier Noe, Sneha Das

Abstract: Most recent speech privacy efforts have focused on anonymizing acoustic speaker attributes, but there has not been as much research into protecting information from speech content. We introduce a toy problem that explores an emerging type of privacy called "content masking", which conceals selected words and phrases in speech. In our efforts to define this problem space, we evaluate an introductory baseline masking technique based on modifying sequences of discrete phone representations (phone codes) produced from a pre-trained vector-quantized variational autoencoder (VQ-VAE) and re-synthesized using WaveRNN. We investigate three different masking locations and three types of masking strategies: noise substitution, word deletion, and phone sequence reversal. Our work attempts to characterize how masking affects two downstream tasks: automatic speech recognition (ASR) and automatic speaker verification (ASV). We observe how the different mask types and locations impact these downstream tasks and discuss how these issues may influence privacy goals.

* Accepted to ITG Speech Conference 2023
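
The three masking strategies named in the abstract can be pictured as simple operations on a list of discrete phone codes. The sketch below is only an illustration under stated assumptions: the integer codes, the span boundaries, and the placeholder `noise_code` are all hypothetical, and the paper's actual noise substitution and VQ-VAE/WaveRNN pipeline are not reproduced here.

```python
def mask_span(codes, start, end, strategy, noise_code=-1):
    """Return a copy of `codes` with codes[start:end] masked by one strategy.

    `noise_code` is a hypothetical placeholder standing in for a noise token.
    """
    span = codes[start:end]
    if strategy == "noise":
        # Substitute every code in the span with the placeholder noise code.
        masked = [noise_code] * len(span)
    elif strategy == "delete":
        # Drop the span entirely, shortening the sequence.
        masked = []
    elif strategy == "reverse":
        # Keep the same codes but reverse their order within the span.
        masked = span[::-1]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return codes[:start] + masked + codes[end:]


# Made-up phone-code sequence; positions 2..4 stand for the word to conceal.
codes = [12, 7, 7, 40, 3, 19, 19, 5]
noised = mask_span(codes, 2, 5, "noise")
deleted = mask_span(codes, 2, 5, "delete")
reversed_span = mask_span(codes, 2, 5, "reverse")
```

Varying `start` and `end` corresponds to choosing a different masking location within the utterance.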

On Crowdsourcing-design with Comparison Category Rating for Evaluating Speech Enhancement Algorithms

Jun 02, 2023

Angélica S. Z. Suárez, Clément Laroche, Line H. Clemmensen, Sneha Das

Abstract: Speech enhancement techniques improve the quality or the intelligibility of an audio signal by removing unwanted noise. They are used as preprocessing in numerous applications such as speech recognition, hearing aids, broadcasting, and telephony. The evaluation of such algorithms often relies on reference-based objective metrics that are shown to correlate poorly with human perception. In order to evaluate audio quality as perceived by human observers, it is thus fundamental to resort to subjective quality assessment. In this paper, a user evaluation based on crowdsourcing (subjective) and the Comparison Category Rating (CCR) method is compared against the DNSMOS, ViSQOL and 3QUEST (objective) metrics. The overall quality scores of three speech enhancement algorithms from real time communications (RTC) are used in the comparison using the P.808 toolkit. Results indicate that while the CCR scale allows participants to identify differences between processed and unprocessed audio samples, two groups of preferences emerge: some users rate positively by focusing on noise suppression processing, while others rate negatively by focusing mainly on speech quality. We further present results on the parameters, size considerations, and speaker variations that are critical and should be considered when designing the CCR-based crowdsourcing evaluation.

* Published at ICASSP 2023

Pre-processing Blood-Volume-Pulse for In-the-wild Applications

Apr 27, 2023

Laurits Fromberg, Sneha Das, Line Katrine Harder Clemmensen

Abstract: Blood-volume-pulse (BVP) is a biosignal commonly used in applications for non-invasive affect recognition and wearable technology. However, its susceptibility to noise limits its application in real-life settings. This paper revisits BVP processing and proposes standard practices for feature extraction from empirical observations of BVP. We propose a method for improving the use of features in the presence of noise and compare it to a standard signal processing approach of a 4th order Butterworth bandpass filter with cut-off frequencies of 1 Hz and 8 Hz. Our method achieves better results for most time features as well as for a subset of the frequency features. We find that all but one time feature and around half of the frequency features perform better when the noisy parts are known (best case). When the noisy parts are unknown and estimated using a metric of skewness, the proposed method in general performs better than or similar to the Butterworth bandpass filter, but both methods also fail for a subset of features. Our results can be used to select BVP features that are meaningful under different SNR conditions.

* Submitted to Eusipco 2023
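
The abstract mentions estimating noisy parts of the signal with a skewness metric but does not specify the windowing or any decision threshold. As a rough sketch under those caveats, the snippet below only computes the sample skewness of consecutive non-overlapping windows; the window length and the toy samples are hypothetical, and how the values are turned into a noisy/clean decision is left open because the abstract does not say.

```python
def skewness(samples):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    if m2 == 0:
        return 0.0  # constant window: define skewness as zero
    return m3 / m2 ** 1.5


def windowed_skewness(signal, window):
    """Skewness of each full, non-overlapping window of `window` samples."""
    return [
        skewness(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, window)
    ]


# Toy signal: one symmetric window followed by one window with a large spike.
signal = [0, 1, 0, -1, 0, 1, 0, -1] + [0, 0, 0, 0, 0, 0, 0, 8]
skews = windowed_skewness(signal, window=8)
```

The symmetric window yields zero skewness while the spiky window yields a clearly positive value, which is the kind of per-segment statistic a quality estimate could be built on.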

Interpretability by design using computer vision for behavioral sensing in child and adolescent psychiatry

Jul 11, 2022

Flavia D. Frumosu, Nicole N. Lønfeldt, A. -R. Cecilie Mora-Jensen, Sneha Das, Nicklas Leander Lund, A. Katrine Pagsberg, Line K. H. Clemmensen

Abstract: Observation is an essential tool for understanding and studying human behavior and mental states. However, coding human behavior is a time-consuming, expensive task, in which reliability can be difficult to achieve and bias is a risk. Machine learning (ML) methods offer ways to improve reliability, decrease cost, and scale up behavioral coding for application in clinical and research settings. Here, we use computer vision to derive behavioral codes or concepts of a gold standard behavioral rating system, offering familiar interpretation for mental health professionals. Features were extracted from videos of clinical diagnostic interviews of children and adolescents with and without obsessive-compulsive disorder. Our computationally-derived ratings were comparable to human expert ratings for negative emotions, activity-level/arousal and anxiety. For the attention and positive affect concepts, our ML ratings performed reasonably. However, results for gaze and vocalization indicate a need for improved data quality or additional data modalities.

* Presented at 2nd Workshop on Interpretable Machine Learning in Healthcare (IMLH) - International Conference on Machine Learning (ICML) 2022

Computational behavior recognition in child and adolescent psychiatry: A statistical and machine learning analysis plan

May 11, 2022

Nicole N. Lønfeldt, Flavia D. Frumosu, A. -R. Cecilie Mora-Jensen, Nicklas Leander Lund, Sneha Das, A. Katrine Pagsberg, Line K. H. Clemmensen

Abstract: Motivation: Behavioral observations are an important resource in the study and evaluation of psychological phenomena, but they are costly, time-consuming, and susceptible to bias. Thus, we aim to automate coding of human behavior for use in psychotherapy and research with the help of artificial intelligence (AI) tools. Here, we present an analysis plan. Methods: Videos of a gold-standard semi-structured diagnostic interview of 25 youth with obsessive-compulsive disorder (OCD) and 12 youth without a psychiatric diagnosis (no-OCD) will be analyzed. Youth were between 8 and 17 years old. Features from the videos will be extracted and used to compute ratings of behavior, which will be compared to ratings of behavior produced by mental health professionals trained to use a specific behavioral coding manual. We will test the effect of OCD diagnosis on the computationally-derived behavior ratings using multivariate analysis of variance (MANOVA). Using the generated features, a binary classification model will be built and used to classify OCD/no-OCD classes. Discussion: Here, we present a pre-defined plan for how data will be pre-processed, analyzed and presented in the publication of results and their interpretation. A challenge for the proposed study is that the AI approach will attempt to derive behavioral ratings based solely on vision, whereas humans use visual, paralinguistic and linguistic cues to rate behavior. Another challenge will be using machine learning models for body and facial movement detection trained primarily on adults and not on children. If the AI tools show promising results, this pre-registered analysis plan may help reduce interpretation bias. Trial registration: ClinicalTrials.gov - H-18010607

* 7 pages, 1 figure

Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study

Apr 25, 2022

Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen

Abstract: Use of speech models for automatic speech processing tasks can improve efficiency in the screening, analysis, diagnosis and treatment in medicine and psychiatry. However, the performance of pre-processing speech tasks like segmentation and diarization can drop considerably on in-the-wild clinical data, specifically when the target dataset comprises atypical speech. In this paper, we study the performance of a pre-trained speech model on a dataset comprising child-clinician conversations in Danish with respect to the classification threshold. Since we do not have access to sufficient labelled data, we propose few-instance threshold adaptation, wherein we employ the first minutes of the speech conversation to obtain the optimum classification threshold. Through our work in this paper, we learned that the model with the default classification threshold performs worse on children from the patient group. Furthermore, the error rates of the model are directly correlated with the severity of diagnosis in the patients. Lastly, our study on few-instance adaptation shows that three minutes of clinician-child conversation is sufficient to obtain the optimum classification threshold.

* 5 pages. Submitted to Interspeech 2022
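
Few-instance threshold adaptation, as described in the abstract, amounts to picking the classification threshold that works best on a short labelled head of the conversation. The sketch below assumes per-frame speech scores in [0, 1], frame-level error count as the selection criterion, and a small hand-picked candidate grid; all three are assumptions for illustration, since the abstract does not state the model's output format or the optimisation criterion.

```python
def frame_errors(scores, labels, threshold):
    """Count frames where the thresholded score disagrees with the label."""
    return sum(int((s >= threshold) != bool(y)) for s, y in zip(scores, labels))


def adapt_threshold(scores, labels, candidates):
    """Return the candidate threshold with the fewest errors on the labelled head."""
    return min(candidates, key=lambda t: frame_errors(scores, labels, t))


# Made-up scores for the first labelled frames (1 = speech, 0 = non-speech).
head_scores = [0.20, 0.35, 0.40, 0.55, 0.30, 0.45, 0.10, 0.50]
head_labels = [0, 1, 1, 1, 0, 1, 0, 1]
candidates = [0.2, 0.3, 0.35, 0.4, 0.5, 0.6]  # hypothetical search grid

best = adapt_threshold(head_scores, head_labels, candidates)
default_errors = frame_errors(head_scores, head_labels, 0.5)
adapted_errors = frame_errors(head_scores, head_labels, best)
```

In this toy example the speech frames score lower than a default threshold of 0.5 expects, so the adapted threshold recovers frames the default would miss; the selected value would then be applied to the rest of the conversation.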
