Title: Aksharantar: Towards building open transliteration tools for the next billion users

May 06, 2022

Yash Madhani, Sushane Parthan, Priyanka Bedekar, Ruchi Khapra, Vivek Seshadri, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

Abstract: We introduce Aksharantar, the largest publicly available transliteration dataset for 21 Indic languages, containing 26 million transliteration pairs. We build this dataset by mining transliteration pairs from large monolingual and parallel corpora, as well as by collecting transliterations from human annotators to ensure diversity of words and representation of low-resource languages. We introduce a new, large, diverse testset for Indic language transliteration containing 103k word pairs spanning 19 languages, which enables fine-grained analysis of transliteration models. We train the IndicXlit model on the Aksharantar training set. IndicXlit is a single transformer-based multilingual transliteration model for Roman-to-Indic script conversion supporting 21 Indic languages. It achieves state-of-the-art results on the Dakshina testset and establishes strong baselines on the Aksharantar testset released along with this work. All the datasets and models are publicly available at https://indicnlp.ai4bharat.org/aksharantar. We hope the availability of these large-scale, open resources will spur innovation for Indic language transliteration and downstream applications.

* 19 pages, 17 tables, 1 figure
