Brian Eoff

LanSER: Language-Model Supported Speech Emotion Recognition

Sep 07, 2023

Taesik Gong, Josh Belanich, Krishna Somandepalli, Arsha Nagrani, Brian Eoff, Brendan Jou

Abstract: Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making scaling methods to large speech datasets and nuanced emotion taxonomies difficult. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. For inferring weak labels constrained to a taxonomy, we use a textual entailment approach that selects an emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. Our experimental results show that models pre-trained on large datasets with this weak supervision outperform other baseline models on standard SER datasets when fine-tuned, and show improved label efficiency. Despite being pre-trained on labels derived only from text, we show that the resulting representations appear to model the prosodic content of speech.

* INTERSPEECH (2023) 2408-2412
* Presented at INTERSPEECH 2023
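
The label-selection step described in the abstract (pick the taxonomy label with the highest entailment score for a transcript) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the hypothesis template, the function names, and the keyword-based `toy_entailment_score` are all assumptions made here so the example runs without a model. In practice the scorer would be a pre-trained entailment model.

```python
from typing import Callable, Dict, Sequence, Set, Tuple


def infer_weak_label(
    transcript: str,
    taxonomy: Sequence[str],
    entailment_score: Callable[[str, str], float],
    template: str = "The speaker feels {}.",  # hypothetical prompt template
) -> Tuple[str, float]:
    """Return the taxonomy label whose hypothesis scores highest for the transcript."""
    if not taxonomy:
        raise ValueError("taxonomy must contain at least one label")
    # Score one hypothesis per candidate label against the transcript (the premise).
    scored = [
        (label, entailment_score(transcript, template.format(label)))
        for label in taxonomy
    ]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical cue words, used only by the stand-in scorer below.
TOY_CUES: Dict[str, Set[str]] = {
    "joy": {"great", "love", "wonderful"},
    "anger": {"hate", "furious", "unfair"},
    "sadness": {"miss", "lost", "cry"},
}


def toy_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an entailment model: fraction of a label's cue words in the premise."""
    words = set(premise.lower().replace(",", " ").replace(".", " ").split())
    for label, cues in TOY_CUES.items():
        if label in hypothesis.lower():
            return len(words & cues) / len(cues)
    return 0.0


label, score = infer_weak_label(
    "I love this, it is wonderful",
    ["joy", "anger", "sadness"],
    toy_entailment_score,
)
print(label, round(score, 2))
```

Passing the scorer in as a function keeps the selection logic independent of whichever entailment model is used, so the taxonomy or the template can be changed without touching the rest of the pipeline.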

Multitask vocal burst modeling with ResNets and pre-trained paralinguistic Conformers

Jun 24, 2022

Josh Belanich, Krishna Somandepalli, Brian Eoff, Brendan Jou

Abstract: This technical report presents the modeling approaches used in our submission to the ICML Expressive Vocalizations Workshop & Competition multitask track (ExVo-MultiTask). We first applied image classification models of various sizes on mel-spectrogram representations of the vocal bursts, as is standard in sound event detection literature. Results from these models show an increase of 21.24% over the baseline system with respect to the harmonic mean of the task metrics, and comprise our team's main submission to the MultiTask track. We then sought to characterize the headroom in the MultiTask track by applying a large pre-trained Conformer model that previously achieved state-of-the-art results on paralinguistic tasks like speech emotion recognition and mask detection. We additionally investigated the relationship between the sub-tasks of emotional expression, country of origin, and age prediction, and discovered that the best performing models are trained as single-task models, questioning whether the problem truly benefits from a multitask setting.

* To be published in the ICML Expressive Vocalizations Workshop & Competition 2022 (https://www.competitions.hume.ai/exvo2022)
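
As a rough illustration of scoring by the harmonic mean of per-task metrics, the sketch below combines three task scores and computes a percentage change over a baseline. The score values are made up for illustration, the function names are hypothetical, and reading the reported increase as a relative change is an assumption. The abstract does not specify how each task metric is normalized before being combined.

```python
from statistics import harmonic_mean
from typing import Sequence


def multitask_score(task_scores: Sequence[float]) -> float:
    """Combine per-task scores (all positive, higher is better) by harmonic mean."""
    return harmonic_mean(task_scores)


def relative_increase_percent(new: float, baseline: float) -> float:
    """Percentage change of `new` relative to `baseline` (assumed interpretation)."""
    return 100.0 * (new - baseline) / baseline


# Illustrative values only; these are not results from the report.
baseline = multitask_score([0.40, 0.40, 0.25])
candidate = multitask_score([0.50, 0.45, 0.30])
print(f"baseline={baseline:.3f} candidate={candidate:.3f}")
print(f"relative increase: {relative_increase_percent(candidate, baseline):.2f}%")
```

The harmonic mean is dominated by the weakest task, so a model cannot score well overall by excelling at one sub-task while neglecting another.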

DISSECT: Disentangled Simultaneous Explanations via Concept Traversals

May 31, 2021

Asma Ghandeharioun, Been Kim, Chun-Liang Li, Brendan Jou, Brian Eoff, Rosalind W. Picard

Abstract: Explaining deep learning model inferences is a promising avenue for scientific understanding, improving safety, uncovering hidden biases, evaluating fairness, and beyond, as argued by many scholars. One of the principal benefits of counterfactual explanations is allowing users to explore "what-if" scenarios through what does not and cannot exist in the data, a capability that many other forms of explanation, such as heatmaps and influence functions, inherently lack. However, most previous work on generative explainability cannot disentangle important concepts effectively, produces unrealistic examples, or fails to retain relevant information. We propose a novel approach, DISSECT, that jointly trains a generator, a discriminator, and a concept disentangler to overcome such challenges using little supervision. DISSECT generates Concept Traversals (CTs), defined as a sequence of generated examples with increasing degrees of concepts that influence a classifier's decision. By training a generative model from a classifier's signal, DISSECT offers a way to discover a classifier's inherent "notion" of distinct concepts automatically rather than relying on user-predefined concepts. We show that DISSECT produces CTs that (1) disentangle several concepts, (2) are influential to a classifier's decision and are coupled to its reasoning due to joint training, (3) are realistic, (4) preserve relevant information, and (5) are stable across similar inputs. We validate DISSECT on several challenging synthetic and realistic datasets where previous methods fall short of satisfying desirable criteria for interpretability and show that it performs consistently well and better than existing methods. Finally, we present experiments showing applications of DISSECT for detecting potential biases of a classifier and identifying spurious artifacts that impact predictions.

Characterizing Sources of Uncertainty to Proxy Calibration and Disambiguate Annotator and Data Bias

Oct 05, 2019

Asma Ghandeharioun, Brian Eoff, Brendan Jou, Rosalind W. Picard

Abstract: Supporting model interpretability for complex phenomena where annotators can legitimately disagree, such as emotion recognition, is a challenging machine learning task. In this work, we show that explicitly quantifying the uncertainty in such settings has interpretability benefits. We use a simple modification of classical network inference using Monte Carlo dropout to give measures of epistemic and aleatoric uncertainty. We identify a significant correlation between aleatoric uncertainty and human annotator disagreement ($r \approx 0.3$). Additionally, we demonstrate how difficult and subjective training samples can be identified using aleatoric uncertainty and how epistemic uncertainty can reveal data bias that could result in unfair predictions. We identify the total uncertainty as a suitable surrogate for model calibration, i.e., the degree to which we can trust the model's predicted confidence. In addition to explainability benefits, we observe modest performance boosts from incorporating model uncertainty.

* Accepted for presentation at 2019 ICCV Workshop on Interpreting and Explaining Visual Artificial Intelligence Models
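
To make the epistemic/aleatoric split concrete, here is a minimal sketch that decomposes predictive uncertainty from several stochastic forward passes, such as Monte Carlo dropout samples. It uses an entropy-based decomposition, which is one common choice and is an assumption here: the abstract does not state the exact formulation used in the paper. The function names and sample values are hypothetical.

```python
import math
from typing import Dict, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(samples: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Split uncertainty for one input, given T softmax outputs from T stochastic passes.

    total     = entropy of the mean prediction
    aleatoric = mean entropy of the individual predictions
    epistemic = total - aleatoric (disagreement between passes)
    """
    if not samples:
        raise ValueError("need at least one stochastic forward pass")
    n = len(samples)
    num_classes = len(samples[0])
    mean_probs = [sum(s[c] for s in samples) / n for c in range(num_classes)]
    total = entropy(mean_probs)
    aleatoric = sum(entropy(s) for s in samples) / n
    return {"total": total, "aleatoric": aleatoric, "epistemic": total - aleatoric}


# Passes agree but each is unsure: uncertainty is mostly aleatoric.
noisy = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
# Passes are individually confident but disagree: uncertainty is mostly epistemic.
conflicted = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
print(noisy)
print(conflicted)
```

Under this decomposition, inputs with high aleatoric uncertainty would be candidates for the difficult or subjective samples the abstract mentions, while high epistemic uncertainty points to regions the model has seen too little data for.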
