M Zeeshan Ansari

A Simple and Efficient Probabilistic Language model for Code-Mixed Text

Jun 29, 2021

M Zeeshan Ansari, Tanvir Ahmad, M M Sufyan Beg, Asma Ikram

Abstract: Conventional natural language processing approaches are poorly suited to social media text because of its colloquial discourse and non-homogeneous characteristics. Language identification in a multilingual document is a prerequisite subtask for several information extraction applications, such as information retrieval, named entity recognition, and relation extraction. The problem is often more challenging in code-mixed documents, where foreign-language words are drawn into the base language while framing the text. Word embeddings are powerful language modeling tools for representing text documents and are useful for measuring similarity between words or documents. We present a simple probabilistic approach for building an efficient word embedding for code-mixed text and demonstrate it on language identification of Hindi-English short text messages scraped from Twitter. We examine its efficacy on the classification task using bidirectional LSTMs and SVMs and observe improved scores over various existing code-mixed embeddings.
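The abstract does not spell out how the embedding is constructed, so the following is only an illustrative sketch of what a count-based probabilistic word representation for code-mixed text could look like, not the authors' method. It assumes each word is represented by a smoothed estimate of P(language | word) computed from word-level language tags. The function names, the `LANGS` tuple, the smoothing constant `alpha`, and the toy tokens are all hypothetical.

```python
from collections import Counter, defaultdict

# Assumed label set for Hindi-English code-mixed text (hypothetical).
LANGS = ("hi", "en")


def fit_word_language_probs(tagged_tokens, alpha=1.0):
    """Estimate P(language | word) from (word, language) pairs.

    Uses add-alpha (Laplace) smoothing so that a word seen only in one
    language still keeps a small probability for the other.
    """
    counts = defaultdict(Counter)
    for word, lang in tagged_tokens:
        counts[word.lower()][lang] += 1

    probs = {}
    for word, c in counts.items():
        total = sum(c[lang] for lang in LANGS) + alpha * len(LANGS)
        probs[word] = tuple((c[lang] + alpha) / total for lang in LANGS)
    return probs


def embed(word, probs):
    """Return the probability vector for a word; unseen words get a uniform vector."""
    uniform = tuple(1.0 / len(LANGS) for _ in LANGS)
    return probs.get(word.lower(), uniform)


def identify(word, probs):
    """Pick the language with the highest estimated probability."""
    vec = embed(word, probs)
    best = max(range(len(LANGS)), key=vec.__getitem__)
    return LANGS[best]


# Toy, hand-tagged tokens (hypothetical data, not from the paper).
toy = [
    ("mujhe", "hi"), ("yeh", "hi"), ("movie", "en"), ("bahut", "hi"),
    ("pasand", "hi"), ("hai", "hi"), ("movie", "en"), ("the", "en"),
    ("is", "en"), ("great", "en"),
]
probs = fit_word_language_probs(toy)

for token in ("movie", "hai", "zindagi"):
    print(token, embed(token, probs), identify(token, probs))
```

In a fuller pipeline, vectors like these would be fed to a downstream classifier such as the bidirectional LSTM or SVM mentioned in the abstract. Here the argmax in `identify` stands in for that classifier to keep the sketch short.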
