Prakash B. Pimpale

Machine Translation in Indian Languages: Challenges and Resolution

Aug 01, 2018

Raj Nath Patel, Prakash B. Pimpale, M Sasikumar

Abstract: English to Indian language machine translation poses the challenge of structural and morphological divergence. This paper describes English to Indian language statistical machine translation using pre-ordering and suffix separation. The pre-ordering uses rules to restructure the source sentences prior to training and translation. This syntactic restructuring helps statistical machine translation tackle the structural divergence and hence improves translation quality. Suffix separation is used to tackle the morphological divergence between English and highly agglutinative Indian languages. We demonstrate that pre-ordering and suffix separation improve the quality of English to Indian language machine translation.

* 11 pages journal paper
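
To make the pre-ordering idea in the abstract above concrete, here is a minimal sketch of a single reordering rule that moves the verb group of an English clause to the end, approximating subject-object-verb order. It is a toy over flat POS tags, not the paper's rule set, which the abstract does not spell out. The function name `preorder_svo_to_sov` and the coarse tag names are assumptions made for illustration.

```python
# Toy pre-ordering rule: move the first contiguous verb group to the end
# of the clause (SVO -> SOV). Tag names are hypothetical coarse POS tags.
VERB_TAGS = {"VERB", "AUX"}


def preorder_svo_to_sov(tagged):
    """Return the words of `tagged` with the first verb group moved to the end.

    `tagged` is a list of (word, tag) pairs for a single clause.
    Sentence-final punctuation stays in its final position.
    """
    body = list(tagged)
    tail = []
    if body and body[-1][1] == "PUNCT":
        tail = [body.pop()]

    # Locate the first verb or auxiliary.
    start = next((i for i, (_, t) in enumerate(body) if t in VERB_TAGS), None)
    if start is None:
        return [w for w, _ in tagged]  # no verb: leave the order unchanged

    # Extend over the contiguous verb group, e.g. "is eating".
    end = start
    while end < len(body) and body[end][1] in VERB_TAGS:
        end += 1

    reordered = body[:start] + body[end:] + body[start:end] + tail
    return [w for w, _ in reordered]


sentence = [("Ram", "NOUN"), ("is", "AUX"), ("eating", "VERB"),
            ("an", "DET"), ("apple", "NOUN"), (".", "PUNCT")]
print(" ".join(preorder_svo_to_sov(sentence)))  # Ram an apple is eating .
```

In a pipeline of this kind, the same rule would be applied to the source side of the training corpus and to every test sentence, so that the translation model is trained and used on source text in the same order.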

Recurrent Neural Network based Part-of-Speech Tagger for Code-Mixed Social Media Text

Nov 16, 2016

Raj Nath Patel, Prakash B. Pimpale, M Sasikumar

Abstract: This paper describes the Centre for Development of Advanced Computing's (CDACM) submission to the shared task 'Tool Contest on POS tagging for Code-Mixed Indian Social Media (Facebook, Twitter, and Whatsapp) Text', collocated with ICON-2016. The shared task was to predict the Part of Speech (POS) tag at the word level for a given text. Code-mixed text is generated mostly on social media by multilingual users. The presence of multilingual words, transliterations, and spelling variations makes such content linguistically complex. In this paper, we propose an approach to POS tagging code-mixed social media text using a Recurrent Neural Network Language Model (RNN-LM) architecture. We submitted results for Hindi-English (hi-en), Bengali-English (bn-en), and Telugu-English (te-en) code-mixed data.

* In Proceedings of the Tool Contest on POS tagging for Indian Social Media Text, ICON 2016
* 7 pages, Published at the Tool Contest on POS tagging for Indian Social Media Text, ICON 2016

Experiments with POS Tagging Code-mixed Indian Social Media Text

Oct 31, 2016

Prakash B. Pimpale, Raj Nath Patel

Abstract: This paper presents the Centre for Development of Advanced Computing Mumbai's (CDACM) submission to the NLP Tools Contest on Part-Of-Speech (POS) Tagging For Code-mixed Indian Social Media Text (POSCMISMT) 2015 (collocated with ICON 2015). We submitted results for Hindi (hi), Bengali (bn), and Telugu (te) languages mixed with English (en). In this paper, we describe the POS tagging approaches we explored for this task. Machine learning has been used to POS tag the mixed-language text. For POS tagging, we tried distributed representations of words in vector space (word2vec) for feature extraction, together with log-linear models. We report our work on all three languages, hi, bn, and te, mixed with en.

* In the Proceedings of the 12th International Conference on Natural Language Processing (ICON 2015)
* 3 Pages, Published in the Proceedings of the Tool Contest on POS Tagging for Code-mixed Indian Social Media (Facebook, Twitter, and Whatsapp) Text

Statistical Machine Translation for Indian Languages: Mission Hindi 2

Oct 25, 2016

Raj Nath Patel, Prakash B. Pimpale

Abstract: This paper presents the Centre for Development of Advanced Computing Mumbai's (CDACM) submission to the NLP Tools Contest on Statistical Machine Translation in Indian Languages (ILSMT) 2015 (collocated with ICON 2015). The aim of the contest was to collectively explore the effectiveness of Statistical Machine Translation (SMT) when translating within Indian languages and between English and Indian languages. In this paper, we report our work on all five language pairs, namely Bengali-Hindi (bn-hi), Marathi-Hindi (mr-hi), Tamil-Hindi (ta-hi), Telugu-Hindi (te-hi), and English-Hindi (en-hi), for the Health, Tourism, and General domains. We used suffix separation, compound splitting, and preordering prior to SMT training and testing.

* In the Proceedings of the 12th International Conference on Natural Language Processing (ICON 2015)
* 4 pages, Published in the Proceedings of NLP Tools Contest: Statistical Machine Translation in Indian Languages

Reordering rules for English-Hindi SMT

Oct 24, 2016

Raj Nath Patel, Rohit Gupta, Prakash B. Pimpale, Sasikumar M

Abstract: Reordering is a preprocessing stage for a Statistical Machine Translation (SMT) system in which the words of the source sentence are reordered as per the syntax of the target language. We propose a rich set of rules for better reordering. The idea is to facilitate the training process through better alignments and parallel phrase extraction for a phrase-based SMT system. Reordering also helps the decoding process and hence improves machine translation quality. We observed significant improvements in translation quality using our approach over the baseline SMT. We used BLEU, NIST, multi-reference word error rate, and multi-reference position-independent error rate to judge the improvements. We used the open source SMT toolkit MOSES to develop the system.

* 8 pages, Published at the Second Workshop on Hybrid Approaches to Translation, ACL 2013
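
The two error-rate metrics named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation script: word error rate is taken as word-level edit distance divided by reference length, position-independent error rate uses one common bag-of-words formulation, and the multi-reference variant takes the lowest score over the available references. The function names are hypothetical, and references are assumed to be non-empty.

```python
from collections import Counter


def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(hyp, ref):
    """Word error rate against a single non-empty reference."""
    return edit_distance(hyp, ref) / len(ref)


def per(hyp, ref):
    """Position-independent error rate: compares bags of words, ignoring order."""
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return (max(len(hyp), len(ref)) - matches) / len(ref)


def multi_reference(metric, hyp, refs):
    """Assumed convention: score against each reference and keep the best."""
    return min(metric(hyp, ref) for ref in refs)


hyp = "he the book reads".split()
refs = ["he reads the book".split(), "he is reading the book".split()]
print(multi_reference(wer, hyp, refs))  # 0.5
print(multi_reference(per, hyp, refs))  # 0.0
```

The example shows why the two metrics are reported together: the hypothesis contains exactly the words of the first reference in a different order, so it is penalized by the word error rate but not by the position-independent error rate.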

Statistical Machine Translation for Indian Languages: Mission Hindi

Oct 24, 2016

Raj Nath Patel, Prakash B. Pimpale, Sasikumar M

Abstract: This paper discusses the Centre for Development of Advanced Computing Mumbai's (CDACM) submission to the NLP Tools Contest on Statistical Machine Translation in Indian Languages (ILSMT) 2014 (collocated with ICON 2014). The objective of the contest was to explore the effectiveness of Statistical Machine Translation (SMT) for Indian language to Indian language and English-Hindi machine translation. In this paper, we propose suffix separation and word splitting for SMT from agglutinative languages to Hindi and show that they significantly improve over the baseline (BL). We also show that the factored model with reordering outperforms phrase-based SMT for English-Hindi (en-hi). We report our work on all five language pairs, namely Bengali-Hindi (bn-hi), Marathi-Hindi (mr-hi), Tamil-Hindi (ta-hi), Telugu-Hindi (te-hi), and en-hi, for the Health, Tourism, and General domains.

* 5 pages, Published at NLP Tools Contest: Statistical Machine Translation in Indian Languages, ICON-2015
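
As a rough illustration of the suffix separation mentioned in the abstract above, the sketch below splits a word into a stem and a marked suffix by longest match against a suffix list. The abstract does not give the actual algorithm or suffix inventory, so the function names, the `+` marker, the minimum stem length, and the romanized suffix list are all hypothetical placeholders.

```python
def separate_suffix(word, suffixes, min_stem=2, marker="+"):
    """Split `word` into [stem, marker + suffix] using the longest matching suffix.

    Returns [word] unchanged if no suffix matches or the stem would be
    shorter than `min_stem` characters.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= min_stem:
            return [stem, marker + suffix]
    return [word]


def separate_sentence(tokens, suffixes):
    """Apply suffix separation to every token of a sentence."""
    out = []
    for tok in tokens:
        out.extend(separate_suffix(tok, suffixes))
    return out


# Hypothetical romanized suffix list, for illustration only.
SUFFIXES = ["la", "ne", "madhye"]
print(separate_sentence(["gharamadhye", "mulane", "la"], SUFFIXES))
# ['ghara', '+madhye', 'mula', '+ne', 'la']
```

Splitting suffixes into separate tokens gives the word aligner smaller units to match against Hindi postpositions, which is the motivation the abstract gives for applying it to agglutinative source languages.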
