Noaima Bari

Language Lexicons for Hindi-English Multilingual Text Processing

Jun 29, 2021

Mohd Zeeshan Ansari, Tanvir Ahmad, Noaima Bari

Figure 1 for Language Lexicons for Hindi-English Multilingual Text Processing

Figure 2 for Language Lexicons for Hindi-English Multilingual Text Processing

Abstract: Language identification in textual documents is the process of automatically detecting the language of a document from its content. Existing language identification techniques presume that a document contains text in one of a fixed set of languages; however, this presumption does not hold for multilingual documents, which contain content in more than one language. Owing to the unavailability of large standard corpora for Hindi-English mixed-lingual language processing tasks, we propose language lexicons, a novel kind of lexical database that supports several multilingual language processing tasks. These lexicons are built by learning classifiers over transliterated Hindi and English vocabulary. Visualization techniques reveal that the designed lexicons possess richer quantitative characteristics than their primary source of collection.
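The abstract does not say which classifier the authors train. The sketch below is a hedged, stdlib-only illustration of the general idea, not the paper's method. It assumes a swapped-in technique: a multinomial Naive Bayes over character n-grams (with Laplace smoothing) that labels romanized words as Hindi (`hi`) or English (`en`). A word-to-language lexicon is then built by classifying every word in a vocabulary. The helper names (`char_ngrams`, `NgramNaiveBayes`, `build_lexicon`) and the toy training words are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(word, n_min=1, n_max=3):
    """Return character n-grams of a word padded with '<' and '>' boundary markers."""
    padded = f"<{word.lower()}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


class NgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-grams (Laplace smoothing)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha   # additive smoothing constant
        self.counts = {}     # label -> Counter of n-gram frequencies
        self.totals = {}     # label -> total number of n-grams seen for that label
        self.priors = {}     # label -> log prior probability
        self.vocab = set()   # all n-grams seen in training

    def fit(self, words, labels):
        label_freq = Counter(labels)
        for word, label in zip(words, labels):
            grams = char_ngrams(word)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        for label, grams in self.counts.items():
            self.totals[label] = sum(grams.values())
            self.priors[label] = math.log(label_freq[label] / len(labels))
        return self

    def predict(self, word):
        grams = char_ngrams(word)
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, grams_count in self.counts.items():
            denom = self.totals[label] + self.alpha * v
            # Log prior plus the smoothed log-likelihood of each n-gram in the word.
            score = self.priors[label] + sum(
                math.log((grams_count[g] + self.alpha) / denom) for g in grams
            )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def build_lexicon(model, vocabulary):
    """Map each vocabulary word to its predicted language label."""
    return {w: model.predict(w) for w in vocabulary}


if __name__ == "__main__":
    # Hypothetical toy training data: transliterated Hindi vs. English words.
    hindi = ["hai", "nahi", "kya", "mera", "tera", "accha", "bahut", "kaise"]
    english = ["the", "what", "is", "good", "very", "how", "this", "with"]
    model = NgramNaiveBayes().fit(hindi + english, ["hi"] * len(hindi) + ["en"] * len(english))
    print(build_lexicon(model, ["kaisa", "nahin", "where", "there"]))
```

A generative character-level model is used here only because it is short and needs no external libraries. Any word-level classifier trained on labeled transliterated vocabulary could take its place, and with a toy training set this small the predicted labels should not be taken as meaningful.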
