Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ramchandra Joshi

Evaluation of Deep Learning Models for Hostility Detection in Hindi Text

Jan 13, 2021

Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi

Figure 1 for Evaluation of Deep Learning Models for Hostility Detection in Hindi Text

Figure 2 for Evaluation of Deep Learning Models for Hostility Detection in Hindi Text

Figure 3 for Evaluation of Deep Learning Models for Hostility Detection in Hindi Text

Figure 4 for Evaluation of Deep Learning Models for Hostility Detection in Hindi Text

Abstract:The social media platform is a convenient medium to express personal thoughts and share useful information. It is fast, concise, and has the ability to reach millions. It is an effective place to archive thoughts, share artistic content, receive feedback, promote products, etc. Despite having numerous advantages these platforms have given a boost to hostile posts. Hate speech and derogatory remarks are being posted for personal satisfaction or political gain. The hostile posts can have a bullying effect rendering the entire platform experience hostile. Therefore detection of hostile posts is important to maintain social media hygiene. The problem is more pronounced languages like Hindi which are low in resources. In this work, we present approaches for hostile text detection in the Hindi language. The proposed approaches are evaluated on the Constraint@AAAI 2021 Hindi hostility detection dataset. The dataset consists of hostile and non-hostile texts collected from social media platforms. The hostile posts are further segregated into overlapping classes of fake, offensive, hate, and defamation. We evaluate a host of deep learning approaches based on CNN, LSTM, and BERT for this multi-label classification problem. The pre-trained Hindi fast text word embeddings by IndicNLP and Facebook are used in conjunction with CNN and LSTM models. Two variations of pre-trained multilingual transformer language models mBERT and IndicBERT are used. We show that the performance of BERT based models is best. Moreover, CNN and LSTM models also perform competitively with BERT based models.

Via

Access Paper or Ask Questions

Domain Adaptation of NMT models for English-Hindi Machine Translation Task at AdapMT ICON 2020

Dec 23, 2020

Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi

Figure 1 for Domain Adaptation of NMT models for English-Hindi Machine Translation Task at AdapMT ICON 2020

Figure 2 for Domain Adaptation of NMT models for English-Hindi Machine Translation Task at AdapMT ICON 2020

Figure 3 for Domain Adaptation of NMT models for English-Hindi Machine Translation Task at AdapMT ICON 2020

Figure 4 for Domain Adaptation of NMT models for English-Hindi Machine Translation Task at AdapMT ICON 2020

Abstract:Recent advancements in Neural Machine Translation (NMT) models have proved to produce a state of the art results on machine translation for low resource Indian languages. This paper describes the neural machine translation systems for the English-Hindi language presented in AdapMT Shared Task ICON 2020. The shared task aims to build a translation system for Indian languages in specific domains like Artificial Intelligence (AI) and Chemistry using a small in-domain parallel corpus. We evaluated the effectiveness of two popular NMT models i.e, LSTM, and Transformer architectures for the English-Hindi machine translation task based on BLEU scores. We train these models primarily using the out of domain data and employ simple domain adaptation techniques based on the characteristics of the in-domain dataset. The fine-tuning and mixed-domain data approaches are used for domain adaptation. Our team was ranked first in the chemistry and general domain En-Hi translation task and second in the AI domain En-Hi translation task.

* Accepted at AdaptMT ICON 2020

Via

Access Paper or Ask Questions

Evaluating Input Representation for Language Identification in Hindi-English Code Mixed Text

Nov 25, 2020

Ramchandra Joshi, Raviraj Joshi

Figure 1 for Evaluating Input Representation for Language Identification in Hindi-English Code Mixed Text

Figure 2 for Evaluating Input Representation for Language Identification in Hindi-English Code Mixed Text

Abstract:Natural language processing (NLP) techniques have become mainstream in the recent decade. Most of these advances are attributed to the processing of a single language. More recently, with the extensive growth of social media platforms focus has shifted to code-mixed text. The code-mixed text comprises text written in more than one language. People naturally tend to combine local language with global languages like English. To process such texts, current NLP techniques are not sufficient. As a first step, the text is processed to identify the language of the words in the text. In this work, we focus on language identification in code-mixed sentences for Hindi-English mixed text. The task of language identification is formulated as a token classification task. In the supervised setting, each word in the sentence has an associated language label. We evaluate different deep learning models and input representation combinations for this task. Mainly, character, sub-word, and word embeddings are considered in combination with CNN and LSTM based models. We show that sub-word representation along with the LSTM model gives the best results. In general sub-word representations perform significantly better than other input representations. We report the best accuracy of 94.52% using a single layer LSTM model on the standard SAIL ICON 2017 test set.

* Accepted at ICDSMLA 2020

Via

Access Paper or Ask Questions

Deep Learning for Hindi Text Classification: A Comparison

Jan 19, 2020

Ramchandra Joshi, Purvi Goel, Raviraj Joshi

Figure 1 for Deep Learning for Hindi Text Classification: A Comparison

Abstract:Natural Language Processing (NLP) and especially natural language text analysis have seen great advances in recent times. Usage of deep learning in text processing has revolutionized the techniques for text processing and achieved remarkable results. Different deep learning architectures like CNN, LSTM, and very recent Transformer have been used to achieve state of the art results variety on NLP tasks. In this work, we survey a host of deep learning architectures for text classification tasks. The work is specifically concerned with the classification of Hindi text. The research in the classification of morphologically rich and low resource Hindi language written in Devanagari script has been limited due to the absence of large labeled corpus. In this work, we used translated versions of English data-sets to evaluate models based on CNN, LSTM and Attention. Multilingual pre-trained sentence embeddings based on BERT and LASER are also compared to evaluate their effectiveness for the Hindi language. The paper also serves as a tutorial for popular text classification techniques.

* Accepted at International Conference on Intelligent Human Computer Interaction(IHCI) 2019

Via

Access Paper or Ask Questions