Sujitha Sathiyamoorthy

Dept of Computer Science & Engineering, Indian Institute of Technology Madras, Chennai, India

A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages

Oct 18, 2024

Sujitha Sathiyamoorthy, N Mohana, Anusha Prakash, Hema A Murthy

Abstract: The performance of a text-to-speech (TTS) synthesis model depends on various factors, of which the quality of the training data is the most important. Large amounts of data have been collected around the globe for many languages, but resources for Indian languages remain scarce. Although there have been many data collection efforts, a common set of data collection protocols is necessary for building TTS systems in Indian languages, primarily so that TTS systems can be developed uniformly across languages. In this paper, we present what we have learned from data collection efforts for Indic languages over the past 15 years. These databases have been used in unit selection synthesis, hidden Markov model (HMM) based, and end-to-end frameworks, and for generating prosodically rich TTS systems. The most significant feature of the collected data is its purity, which enables high-quality TTS systems to be built from comparatively small datasets relative to those used for European and Chinese languages.

* Submitted to ICASSP 2025

Everyday Speech in the Indian Subcontinent

Oct 14, 2024

Utkarsh Pathak, Chandra Sai Krishna Gunda, Sujitha Sathiyamoorthy, Keshav Agarwal, Hema A. Murthy

Abstract: India has 1,369 languages, of which 22 are official, and about 13 different scripts are used to write them. A Common Label Set (CLS) based on phonetics was developed to address the large vocabulary of units required in the end-to-end (E2E) framework for multilingual synthesis. This reduced the footprint of the synthesizer and also enabled fast adaptation to new languages with similar phonotactics, provided that the language scripts belonged to the same family. In this paper, we provide new insights into speech synthesis where the script belongs to one family while the phonotactics come from another. Indian language text is first converted to CLS, and then a synthesizer that matches the phonotactics of the language is used. Quality akin to that of a native speaker is obtained for Sanskrit and Konkani with zero adaptation data, using Kannada and Marathi synthesizers, respectively. This approach also lends itself to seamless code switching across 13 Indian languages and English in a given native speaker's voice.
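
The two-step pipeline in the abstract above (script text to CLS, then a synthesizer chosen by phonotactics) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the label tables are hypothetical and cover only a handful of Devanagari and Kannada characters, the real CLS inventory is much larger, and language-specific rules such as schwa deletion are ignored. The helper names `to_common_labels` and `pick_synthesizer` are illustrative; the `PHONOTACTIC_DONOR` table encodes only the two pairings reported in the abstract (Sanskrit with Kannada, Konkani with Marathi).

```python
"""Toy sketch of script-to-label conversion in the spirit of a Common Label Set.

The tables below are a hypothetical, tiny subset for illustration only;
they are NOT the official CLS inventory.
"""

# Consonants map to one shared label regardless of script.
CONSONANTS = {
    "\u0915": "k", "\u0916": "kh", "\u0917": "g", "\u0928": "n", "\u092E": "m",  # Devanagari
    "\u0C95": "k", "\u0C96": "kh", "\u0C97": "g", "\u0CA8": "n", "\u0CAE": "m",  # Kannada
}
# Independent (standalone) vowel letters.
VOWELS = {
    "\u0905": "a", "\u0906": "aa", "\u0907": "i",  # Devanagari
    "\u0C85": "a", "\u0C86": "aa", "\u0C87": "i",  # Kannada
}
# Dependent vowel signs that follow a consonant.
VOWEL_SIGNS = {
    "\u093E": "aa", "\u093F": "i",  # Devanagari
    "\u0CBE": "aa", "\u0CBF": "i",  # Kannada
}
# Virama suppresses the consonant's inherent vowel.
VIRAMAS = {"\u094D", "\u0CCD"}

# Pairings reported in the abstract: language -> synthesizer with matching phonotactics.
PHONOTACTIC_DONOR = {"sanskrit": "kannada", "konkani": "marathi"}


def to_common_labels(text: str) -> list[str]:
    """Convert Devanagari or Kannada text into a script-independent label sequence."""
    labels: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            labels.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt in VOWEL_SIGNS:
                labels.append(VOWEL_SIGNS[nxt])  # explicit vowel replaces inherent 'a'
                i += 2
                continue
            if nxt in VIRAMAS:
                i += 2  # bare consonant, no vowel emitted
                continue
            labels.append("a")  # inherent vowel (no schwa deletion in this toy)
        elif ch in VOWELS:
            labels.append(VOWELS[ch])
        else:
            raise ValueError(f"unsupported character {ch!r}")
        i += 1
    return labels


def pick_synthesizer(language: str) -> str:
    """Pick a synthesizer whose phonotactics match; default to the language's own."""
    key = language.lower()
    return PHONOTACTIC_DONOR.get(key, key)
```

For example, the Devanagari word नाम and the Kannada word ನಾಮ both map to `['n', 'aa', 'm', 'a']`, so Sanskrit text written in Devanagari can be passed to the synthesizer returned by `pick_synthesizer("sanskrit")`, which is `"kannada"`. Keeping the label step separate from synthesizer selection mirrors the abstract's point that script family and phonotactics can come from different languages.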

* 5 Pages, 1 Figure, Submitted to ICASSP 2025