Title: Preparing Bengali-English Code-Mixed Corpus for Sentiment Analysis of Indian Languages

Mar 11, 2018

Soumil Mandal, Sainik Kumar Mahata, Dipankar Das

Abstract: Analysis of the informative content and sentiments of social media users has been attempted quite intensively in the recent past. Most systems are usable only for monolingual data and fail or give poor results when used on code-mixed data. To draw attention to this problem and encourage researchers to work on it, we prepared gold standard Bengali-English code-mixed data with language and polarity tags for sentiment analysis purposes. In this paper, we discuss the systems we prepared to collect and filter raw Twitter data. To reduce manual work during annotation, hybrid systems combining rule-based and supervised models were developed for both language and sentiment tagging. The final corpus was annotated by a group of annotators following a few guidelines. The gold standard corpus thus obtained has impressive inter-annotator agreement in terms of Kappa values. Various metrics such as the Code-Mixed Index (CMI) and Code-Mixed Factor (CF), along with various aspects (language and emotion), were also used to qualitatively assess the code-mixed and sentiment properties of the corpus.

* The 13th Workshop on Asian Language Resources (ALR), collocated with LREC 2018
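
The abstract names two quantities that are easy to illustrate: the Code-Mixed Index (CMI) and Kappa agreement. The sketch below is not the authors' code. It assumes the commonly cited utterance-level CMI definition, 100 * (1 - max_lang / (n - u)), where n is the token count, u the number of language-independent tokens, and max_lang the token count of the dominant language. It also assumes Cohen's kappa for two annotators, although the abstract does not say which Kappa variant was used. The tag names `bn`, `en`, and `univ` are hypothetical placeholders.

```python
from collections import Counter


def code_mixed_index(lang_tags, neutral_tags=("univ",)):
    """Utterance-level CMI under the assumed definition.

    lang_tags: one language tag per token (tag names are hypothetical).
    neutral_tags: tags treated as language-independent (the 'u' tokens).
    """
    counts = Counter(t for t in lang_tags if t not in neutral_tags)
    lang_tokens = sum(counts.values())  # this is n - u
    if lang_tokens == 0:
        return 0.0  # no language-specific tokens, so no mixing
    return 100.0 * (1 - max(counts.values()) / lang_tokens)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[l] * count_b[l] for l in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(code_mixed_index(["bn", "bn", "en", "univ"]))
    print(cohen_kappa(["pos", "neg", "pos", "neg"],
                      ["pos", "neg", "neg", "neg"]))
```

A monolingual utterance scores 0 under this definition, and the score rises as the non-dominant language takes a larger share of the tokens. The paper may compute CMI or agreement differently, for example at corpus level or with more than two annotators.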
