Title: RoDia: A New Dataset for Romanian Dialect Identification from Speech

Sep 12, 2023

Codrut Rotaru, Nicolae-Catalin Ristea, Radu Tudor Ionescu

Figures 1-4 for RoDia: A New Dataset for Romanian Dialect Identification from Speech

Abstract: Dialect identification is a critical task in speech processing and language technology, enhancing various applications such as speech recognition, speaker verification, and many others. While most research studies have been dedicated to dialect identification in widely spoken languages, limited attention has been given to dialect identification in low-resource languages, such as Romanian. To address this research gap, we introduce RoDia, the first dataset for Romanian dialect identification from speech. The RoDia dataset includes a varied compilation of speech samples from five distinct regions of Romania, covering both urban and rural environments, totaling 2 hours of manually annotated speech data. Along with our dataset, we introduce a set of competitive models to be used as baselines for future research. The top scoring model achieves a macro F1 score of 59.83% and a micro F1 score of 62.08%, indicating that the task is challenging. We thus believe that RoDia is a valuable resource that will stimulate research aiming to address the challenges of Romanian dialect identification. We publicly release our dataset and code at https://github.com/codrut2/RoDia.
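The abstract reports both a macro and a micro F1 score. As a rough illustration of how the two differ for a five-class task, the sketch below computes both from scratch. It is not the authors' evaluation code: the function name `f1_scores`, the placeholder class labels `r1` to `r5`, and the toy predictions are all assumptions made for this example, and the numbers it produces are unrelated to the results in the paper.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy data with five hypothetical region labels (not taken from RoDia).
y_true = ["r1", "r1", "r1", "r1", "r2", "r2", "r3", "r4", "r5", "r5"]
y_pred = ["r1", "r1", "r1", "r2", "r2", "r1", "r3", "r5", "r5", "r5"]

macro, micro = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro:.2f}, micro F1 = {micro:.2f}")
```

In this single-label setting the micro F1 equals plain accuracy, while the macro F1 weights each class equally regardless of how many samples it has. A macro score below the micro score, as in the figures quoted in the abstract, is consistent with some classes being recognized less reliably than others.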
