Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Medha Hira

Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning

Jun 13, 2024

Arnav Goel, Medha Hira, Anubha Gupta

Figure 1 for Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning

Figure 2 for Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning

Figure 3 for Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning

Figure 4 for Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning

Abstract:Advent of modern deep learning techniques has given rise to advancements in the field of Speech Emotion Recognition (SER). However, most systems prevalent in the field fail to generalize to speakers not seen during training. This study focuses on handling challenges of multilingual SER, specifically on unseen speakers. We introduce CAMuLeNet, a novel architecture leveraging co-attention based fusion and multitask learning to address this problem. Additionally, we benchmark pretrained encoders of Whisper, HuBERT, Wav2Vec2.0, and WavLM using 10-fold leave-speaker-out cross-validation on five existing multilingual benchmark datasets: IEMOCAP, RAVDESS, CREMA-D, EmoDB and CaFE and, release a novel dataset for SER on the Hindi language (BhavVani). CAMuLeNet shows an average improvement of approximately 8% over all benchmarks on unseen speakers determined by our cross-validation strategy.

* 5 pages, Accepted to INTERSPEECH 2024

Via

Access Paper or Ask Questions

Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning

May 23, 2024

Arnav Goel, Medha Hira, Anubha Gupta

Abstract:The field of prosody transfer in speech synthesis systems is rapidly advancing. This research is focused on evaluating learning methods for adapting pre-trained monolingual text-to-speech (TTS) models to multilingual conditions, i.e., Supervised Fine-Tuning (SFT) and Transfer Learning (TL). This comparison utilizes three distinct metrics: Mean Opinion Score (MOS), Recognition Accuracy (RA), and Mel Cepstral Distortion (MCD). Results demonstrate that, in comparison to SFT, TL leads to significantly enhanced performance, with an average MOS higher by 1.53 points, a 37.5% increase in RA, and approximately a 7.8-point improvement in MCD. These findings are instrumental in helping build TTS models for low-resource languages.

* 7 pages, Accepted to ICLR 2024 - Tiny Track

Via

Access Paper or Ask Questions

CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning

May 23, 2024

Medha Hira, Arnav Goel, Anubha Gupta

Figure 1 for CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning

Figure 2 for CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning

Figure 3 for CrossVoice: Crosslingual Prosody Preserving Cascade-S2ST using Transfer Learning

Abstract:This paper presents CrossVoice, a novel cascade-based Speech-to-Speech Translation (S2ST) system employing advanced ASR, MT, and TTS technologies with cross-lingual prosody preservation through transfer learning. We conducted comprehensive experiments comparing CrossVoice with direct-S2ST systems, showing improved BLEU scores on tasks such as Fisher Es-En, VoxPopuli Fr-En and prosody preservation on benchmark datasets CVSS-T and IndicTTS. With an average mean opinion score of 3.75 out of 4, speech synthesized by CrossVoice closely rivals human speech on the benchmark, highlighting the efficacy of cascade-based systems and transfer learning in multilingual S2ST with prosody transfer.

* 8 pages, Accepted at ICLR 2024 - Tiny Track

Via

Access Paper or Ask Questions

LLMGuard: Guarding Against Unsafe LLM Behavior

Feb 27, 2024

Shubh Goyal, Medha Hira, Shubham Mishra, Sukriti Goyal, Arnav Goel, Niharika Dadu, Kirushikesh DB, Sameep Mehta, Nishtha Madaan

Figure 1 for LLMGuard: Guarding Against Unsafe LLM Behavior

Figure 2 for LLMGuard: Guarding Against Unsafe LLM Behavior

Abstract:Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and can have legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content against specific behaviours or conversation topics. To do this robustly, LLMGuard employs an ensemble of detectors.

* accepted in demonstration track of AAAI-24

Via

Access Paper or Ask Questions

Advancements in Scientific Controllable Text Generation Methods

Jul 08, 2023

Arnav Goel, Medha Hira, Avinash Anand, Siddhesh Bangar, Dr. Rajiv Ratn Shah

Abstract:The previous work on controllable text generation is organized using a new schema we provide in this study. Seven components make up the schema, and each one is crucial to the creation process. To accomplish controlled generation for scientific literature, we describe the various modulation strategies utilised to modulate each of the seven components. We also offer a theoretical study and qualitative examination of these methods. This insight makes possible new architectures based on combinations of these components. Future research will compare these methods empirically to learn more about their strengths and utility.

Via

Access Paper or Ask Questions