Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization

Nov 16, 2024

Md. Nazmus Sadat Samin, Jawad Ibn Ahad, Tanjila Ahmed Medha, Fuad Rahman, Mohammad Ruhul Amin, Nabeel Mohammed, Shafin Rahman

Figure 1 for BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization

Figure 2 for BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization

Figure 3 for BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization

Figure 4 for BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization

Share this with someone who'll enjoy it:

Abstract:This study focuses on recognizing Bangladeshi dialects and converting diverse Bengali accents into standardized formal Bengali speech. Dialects, often referred to as regional languages, are distinctive variations of a language spoken in a particular location and are identified by their phonetics, pronunciations, and lexicon. Subtle changes in pronunciation and intonation are also influenced by geographic location, educational attainment, and socioeconomic status. Dialect standardization is needed to ensure effective communication, educational consistency, access to technology, economic opportunities, and the preservation of linguistic resources while respecting cultural diversity. Being the fifth most spoken language with around 55 distinct dialects spoken by 160 million people, addressing Bangla dialects is crucial for developing inclusive communication tools. However, limited research exists due to a lack of comprehensive datasets and the challenges of handling diverse dialects. With the advancement in multilingual Large Language Models (mLLMs), emerging possibilities have been created to address the challenges of dialectal Automated Speech Recognition (ASR) and Machine Translation (MT). This study presents an end-to-end pipeline for converting dialectal Noakhali speech to standard Bangla speech. This investigation includes constructing a large-scale diverse dataset with dialectal speech signals that tailored the fine-tuning process in ASR and LLM for transcribing the dialect speech to dialect text and translating the dialect text to standard Bangla text. Our experiments demonstrated that fine-tuning the Whisper ASR model achieved a CER of 0.8% and WER of 1.5%, while the BanglaT5 model attained a BLEU score of 41.6% for dialect-to-standard text translation.

* Accepted in 2024 IEEE International Conference on Big Data (IEEE BigData)

View paper on

Share this with someone who'll enjoy it:

Title:BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization

Paper and Code