M. Anand Kumar

A new TAG Formalism for Tamil and Parser Analytics

Apr 05, 2016

Vijay Krishna Menon, S. Rajendran, M. Anand Kumar, K. P. Soman

Abstract: Tree adjoining grammar (TAG) is particularly well suited to morphologically rich, agglutinative languages like Tamil because of its psycholinguistic features and its parse-time dependency and morphological resolution. Though the TAG and LTAG formalisms have been known for about three decades, efforts to design a TAG syntax for Tamil have not been entirely successful, owing to the complexity of its specification and the rich morphology of the Tamil language. In this paper we present a minimalistic TAG for Tamil without much morphological consideration, and also introduce a parser implementation with some obvious variations from the XTAG system.

* International Symposium for Dravidian Languages (iDravidian), co-located with ICON2014, Goa University, Dec 2014

Improving the Performance of English-Tamil Statistical Machine Translation System using Source-Side Pre-Processing

Sep 29, 2014

M. Anand Kumar, V. Dhanalakshmi, K. P. Soman, V. Sharmiladevi

Abstract: Machine Translation is one of the oldest and most active research areas in Natural Language Processing. Currently, Statistical Machine Translation (SMT) dominates Machine Translation research. SMT is an approach to Machine Translation that uses models to learn translation patterns directly from data and generalizes them to translate new, unseen text. The SMT approach is largely language independent, i.e. the models can be applied to any language pair. SMT attempts to generate translations using statistical methods based on bilingual text corpora. Where such corpora are available, excellent results can be attained when translating similar texts, but such corpora are still not available for many language pairs. SMT systems, in general, have difficulty handling morphology on the source or the target side, especially for morphologically rich languages. Errors in morphology or syntax in the target language can have severe consequences for the meaning of the sentence: they change the grammatical function of words, or the understanding of the sentence through incorrect tense information in the verb. The baseline SMT system, also known as Phrase-Based Statistical Machine Translation (PBSMT), does not use any linguistic information and operates only on surface word forms. Recent research has shown that adding linguistic information helps to improve translation accuracy with a smaller amount of bilingual corpora. Linguistic information can be added using the Factored Statistical Machine Translation system through pre-processing steps. This paper investigates how English-side pre-processing is used to improve the accuracy of an English-Tamil SMT system.

* Proc. of Int. Conf. on Advances in Computer Science, AETACS - 2013
