Sai Akarsh

VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech

Jun 12, 2024

Ashishkumar Gudmalwar, Nirmesh Shah, Sai Akarsh, Pankaj Wasnik, Rajiv Ratn Shah

Abstract: Despite the significant advancements in Text-to-Speech (TTS) systems, their full utilization in automatic dubbing remains limited. This task necessitates the extraction of voice identity and emotional style from a reference speech in a source language and subsequently transferring them to a target language using cross-lingual TTS techniques. While previous approaches have mainly concentrated on controlling voice identity within the cross-lingual TTS framework, there has been limited work on incorporating emotion and voice identity together. To this end, we introduce an end-to-end Voice Identity and Emotional Style Controllable Cross-Lingual (VECL) TTS system using multilingual speakers and an emotion embedding network. Moreover, we introduce content and style consistency losses to enhance the quality of synthesized speech further. The proposed system achieved an average relative improvement of 8.83% compared to the state-of-the-art (SOTA) methods on a database comprising English and three Indian languages (Hindi, Telugu, and Marathi).

* Accepted at INTERSPEECH 2024

Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation

Mar 07, 2024

Sai Akarsh, Vamshi Raghusimha, Anindita Mondal, Anil Vuppala

Abstract: The language diversity in India's education sector poses a significant challenge, hindering inclusivity. Despite the democratization of knowledge through online educational content, the dominance of English, as the internet's lingua franca, limits accessibility, emphasizing the crucial need for translation into Indian languages. Despite existing Speech-to-Speech Machine Translation (SSMT) technologies, the lack of intonation in these systems gives monotonous translations, leading to a loss of audience interest and disengagement from the content. To address this, our paper introduces a dataset with stress annotations in Indian English and also a Text-to-Speech (TTS) architecture capable of incorporating stress into synthesized speech. This dataset is used for training a stress detection model, which is then used in the SSMT system for detecting stress in the source speech and transferring it into the target language speech. The TTS architecture is based on FastPitch and can modify the variances based on stressed words given. We present an Indian English-to-Hindi SSMT system that can transfer stress and aim to enhance the overall quality and engagement of educational content.
