Abstract:This paper presents a mechanism of resolving unidentified lexical units in Text-based Machine Translation (TBMT). In a Machine Translation (MT) system it is unlikely to have a complete lexicon and hence there is intense need of a new mechanism to handle the problem of unidentified words. These unknown words could be abbreviations, names, acronyms and newly introduced terms. We have proposed an algorithm for the resolution of the unidentified words. This algorithm takes discourse unit (primitive discourse) as a unit of analysis and provides real time updates to the lexicon. We have manually applied the algorithm to news paper fragments. Along with anaphora and cataphora resolution, many unknown words especially names and abbreviations were updated to the lexicon.
Abstract:This paper presents a theoretical research based approach to ellipsis resolution in machine translation. The formula of discourse is applied in order to resolve ellipses. The validity of the discourse formula is analyzed by applying it to the real world text, i.e., newspaper fragments. The source text is converted into mono-sentential discourses where complex discourses require further dissection either directly into primitive discourses or first into compound discourses and later into primitive ones. The procedure of dissection needs further improvement, i.e., discovering as many primitive discourse forms as possible. An attempt has been made to investigate new primitive discourses or patterns from the given text.