Richard Yue

On Translating Technical Terminology: A Translation Workflow for Machine-Translated Acronyms

Sep 26, 2024

Richard Yue, John E. Ortega, Kenneth Ward Church

Abstract: The typical workflow a professional translator follows to translate a document from its source language (SL) into a target language (TL) is not always focused on what many language models in natural language processing (NLP) do: predict the next word in a sequence of words. While high-resource languages such as English and French are reported to reach near-human parity on common metrics such as BLEU and COMET, we find that an important step is being missed: the translation of technical terms, specifically acronyms. Some publicly available state-of-the-art machine translation (MT) systems, such as Google Translate, can mistranslate acronyms, with an error rate as high as 50% in our findings. This article addresses acronym disambiguation for MT systems by proposing an additional step in the SL-TL (FR-EN) translation workflow: we first offer a new acronym corpus for public use and then experiment with a search-based thresholding algorithm that achieves a nearly 10% increase compared to Google Translate and OpusMT.
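The abstract does not spell out the thresholding algorithm, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes each candidate English rendering of a French acronym comes with a search-derived count of supporting evidence (the `candidates` map, the counts, and the `threshold` value are all assumptions), and it keeps the MT output unless one candidate holds a large enough share of that evidence.

```python
from typing import Dict


def choose_acronym_translation(candidates: Dict[str, int],
                               mt_output: str,
                               threshold: float = 0.5) -> str:
    """Hypothetical search-based thresholding for one French acronym.

    candidates: assumed map from each candidate English rendering to a
        search-derived count (how much evidence supports it).
    mt_output: the acronym as rendered by the MT system; kept as the fallback.
    threshold: assumed minimum share of all evidence the top candidate
        must hold before it replaces the MT output.
    """
    total = sum(candidates.values())
    if total == 0:
        return mt_output  # no evidence at all: trust the MT system
    # Candidate with the most supporting evidence (first one wins on ties).
    best, hits = max(candidates.items(), key=lambda item: item[1])
    return best if hits / total >= threshold else mt_output


# Illustrative, made-up counts for the French acronym OTAN (English: NATO).
print(choose_acronym_translation({"NATO": 950, "OTAN": 50}, mt_output="OTAN"))  # NATO
# Evidence split too evenly (top share 0.4 < 0.5): the MT output is kept.
print(choose_acronym_translation({"UN": 40, "ONU": 35, "NU": 25}, mt_output="ONU"))  # ONU
```

The fallback to the MT output is a design choice of this sketch: when the search evidence is weak or absent, it leaves the MT system's rendering untouched instead of guessing.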

* AMTA 2024 - The Association for Machine Translation in the Americas organizes biennial conferences devoted to researchers, commercial users, and governmental and NGO users

Predicting Anchored Text from Translation Memories for Machine Translation Using Deep Learning Methods

Sep 26, 2024

Richard Yue, John E. Ortega

Abstract: Translation memories (TMs) are the backbone of the professional translation tools known as computer-aided translation (CAT) tools. To translate with a CAT tool, a translator uses the TM to gather translations of segments similar to the segment to be translated (s'). Many CAT tools offer a fuzzy-match algorithm to locate segments (s) in the TM that are close in distance to s'. Once a segment similar to s' is located, the CAT tool presents the parallel segments (s, t), consisting of a segment in the source language and its translation in the target language. Additionally, CAT tools contain fuzzy-match repair (FMR) techniques that automatically use the parallel segments from the TM to create new TM entries containing a modified version of the original, intended to serve as the translation of s'. Most FMR techniques use machine translation as a way of "repairing" the words that have to be modified. In this article, we show that for a large share of those words, namely the anchored ones, we can use other techniques based on machine learning approaches such as Word2Vec, BERT, and even ChatGPT. Specifically, we show that for anchored words that follow the continuous bag-of-words (CBOW) paradigm, Word2Vec, BERT, and GPT-4 can achieve results similar to, and in some cases better than, neural machine translation for translating anchored words from French to English.
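Two ingredients of the workflow described above can be sketched in Python under stated assumptions; this is not the authors' implementation. Retrieval uses one common word-level fuzzy-match score, FMS(s, s') = 1 - ED(s, s') / max(|s|, |s'|), where ED is the Levenshtein distance over tokens; individual CAT tools may compute it differently, and the 0.7 threshold is an arbitrary choice. The `cbow_predict` helper illustrates the continuous bag-of-words idea behind Word2Vec: average the vectors of the words around a gap and pick the candidate whose output vector scores highest. The two-dimensional vectors and the example segments are hypothetical stand-ins for a trained model and a real TM.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Segment = List[str]
TMEntry = Tuple[Segment, Segment]  # (source segment s, target segment t)


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level edit distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete word_a
                curr[j - 1] + 1,                   # insert word_b
                prev[j - 1] + (word_a != word_b),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match_score(s: Segment, s_new: Segment) -> float:
    """FMS = 1 - ED(s, s') / max(|s|, |s'|); 1.0 means identical segments."""
    longest = max(len(s), len(s_new)) or 1  # avoid dividing by zero
    return 1.0 - levenshtein(s, s_new) / longest


def best_tm_match(tm: List[TMEntry], s_new: Segment,
                  threshold: float = 0.7) -> Optional[TMEntry]:
    """Return the (s, t) pair whose source is closest to s', or None."""
    if not tm:
        return None
    score, entry = max(((fuzzy_match_score(s, s_new), (s, t)) for s, t in tm),
                       key=lambda pair: pair[0])
    return entry if score >= threshold else None


def cbow_predict(context: List[str],
                 in_vecs: Dict[str, List[float]],
                 out_vecs: Dict[str, List[float]]) -> str:
    """CBOW-style prediction: average the context (input) vectors, then
    return the candidate whose output vector has the largest dot product.
    Softmax is skipped because it does not change the argmax."""
    ctx = [in_vecs[w] for w in context if w in in_vecs]
    if not ctx:
        raise ValueError("no known context words")
    h = [sum(dims) / len(ctx) for dims in zip(*ctx)]  # mean context vector
    return max(out_vecs,
               key=lambda w: sum(x * y for x, y in zip(h, out_vecs[w])))


# Hypothetical TM: s' differs from the first source segment by one word.
tm = [(["le", "chat", "noir", "dort"], ["the", "black", "cat", "sleeps"]),
      (["un", "grand", "chien"], ["a", "big", "dog"])]
print(best_tm_match(tm, ["le", "chat", "blanc", "dort"]))  # first pair, FMS = 0.75

# Hypothetical embeddings: predict the word between "the" and "sleeps".
in_vecs = {"the": [1.0, 0.0], "sleeps": [0.0, 1.0]}
out_vecs = {"cat": [1.0, 1.0], "dog": [1.0, -1.0], "runs": [-1.0, 1.0]}
print(cbow_predict(["the", "sleeps"], in_vecs, out_vecs))  # cat
```

In this toy case the retrieved target t still contains "black", the word a repair step would have to change to match s'; how the paper fills such anchored words with Word2Vec, BERT, or GPT-4 is not reproduced here.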

* AMTA 2024 - The Association for Machine Translation in the Americas organizes biennial conferences devoted to researchers, commercial users, and governmental and NGO users